Pygments Needs a PHP Start Tag to Work
When using Pygments to highlight PHP source code, a PHP start-tag is needed.
By. Jacob
Edited: 2020-01-10 04:16
Pygments is, so far, the best syntax highlighter I have tried, and what I like most about it is that it runs server-side!
It does have a problem with PHP, though: it only highlights the code if the PHP opening tag is present, which is rarely the case in code examples.
The way I solved the problem was to simply insert the tag before passing the code on to Pygments. This might seem a bit impractical, but it works wonderfully, and it is such a small hack that I am comfortable using it. If Pygments ever fixes its lexer, I can update my code in five minutes. If they do decide to fix it, they should probably add the fix as an option, since people now rely on the bugged behavior.
The code I am using:
<?php
/**
 * Function to fix certain language-specific problems
 *
 * Pygments requires "<?php" to exist in the passed string to work;
 * if it is not present, we add it manually and remove it when done
 */
function lang_fixes(string $content, string $lang)
{
    switch ($lang) {
        case "php":
            // strpos() returns 0 for a tag at the very start, so compare strictly
            if (strpos($content, '<?') === false) {
                $content = '<?php ' . $content;
            }
            break;
    }
    return $content;
}
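/**
 * Function to undo previous fixes
 *
 * Note: this removes every highlighted "<?php" tag from the output,
 * including one that was already present in the original code
 */
function lang_fixes_undo(string $content, string $lang)
{
    switch ($lang) {
        case "php":
            // Pygments' HTML output escapes "<" as "&lt;", so accept both forms
            $content = preg_replace('~<span class="cp">(?:&lt;|<)\?php</span>\s*~u', '', $content);
            break;
    }
    return $content;
}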
You will also need to find and replace all of the "<pre>" elements in your content, but that should be relatively straightforward. Personally, I use this neat regular expression:
|<pre[^>]*>(.+?)</pre>|su
This matches all of the pre elements so they can be stored in a variable. The "s" at the end of the regex causes "." to match everything, including newlines, which is needed to match pre elements whose content spans multiple lines.
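As a quick check of what the "s" modifier changes, here is a minimal sketch. The sample string is made up for illustration and is not from my real content:

```php
<?php
// Hypothetical sample content: one <pre> element spanning several lines.
$content = "<p>Intro</p>\n<pre class=\"php\">echo 1;\necho 2;</pre>\n<p>Outro</p>";

// Without "s", "." stops at the first newline, so the multi-line element is missed.
$without_s = preg_match('|<pre[^>]*>(.+?)</pre>|u', $content, $m1);

// With "s", "." also matches newlines and the whole element is captured.
$with_s = preg_match('|<pre[^>]*>(.+?)</pre>|su', $content, $m2);

var_dump($without_s); // int(0)
var_dump($with_s);    // int(1)
echo $m2[1], "\n";    // the two lines of code between the tags
```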
With preg_match_all, you would use the expression like this:
preg_match_all("|<pre[^>]*>(.+?)</pre>|su", $content, $out, PREG_PATTERN_ORDER);
Now you can easily loop through each pre element in your content ($out[0] holds the full matches):
foreach ($out[0] as $pre) {
    $only_code = trim(preg_replace("|</?pre[^>]*>|u", '', $pre));
    // echo $only_code;
}
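If the goal is to replace each pre element in place rather than just loop over them, preg_replace_callback can do both steps at once. The following is only a sketch: fake_highlight() and highlight_all_pre() are hypothetical names, and fake_highlight() merely stands in for the real Pygments call so the example runs without Pygments installed.

```php
<?php
/**
 * Hypothetical stand-in for the real Pygments call; it only escapes the code
 * and wraps it, so no actual highlighting happens here.
 */
function fake_highlight(string $code): string
{
    return '<div class="highlight"><pre>' . htmlspecialchars($code) . '</pre></div>';
}

/**
 * Replace every <pre> element in $content with its "highlighted" version.
 */
function highlight_all_pre(string $content): string
{
    $result = preg_replace_callback(
        '|<pre[^>]*>(.+?)</pre>|su',
        function (array $m): string {
            // $m[1] is the text between the opening and closing tags
            return fake_highlight(trim($m[1]));
        },
        $content
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input
    return $result ?? $content;
}

echo highlight_all_pre("<p>Hi</p>\n<pre>\n1 < 2\n</pre>"), "\n";
```

In real use, the body of the callback is where lang_fixes(), the Pygments call, and lang_fixes_undo() would go.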
Tell us what you think:
I guess `startinline=True` as an option to the PHP lexer was not available when you wrote this. It tells the lexer to highlight inline PHP (i.e., without the need for a starting <?php tag) and is probably more robust than using this hack.
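For anyone who wants to try the option from PHP, here is a rough sketch. It assumes Pygments is called through the pygmentize command-line tool and that the tool is on the PATH; the article does not say how Pygments is invoked, and both function names are hypothetical.

```php
<?php
/**
 * Build a pygmentize command line (hypothetical helper, not from the article).
 * -l selects the lexer, -f the formatter, -O passes lexer/formatter options.
 */
function build_pygmentize_command(string $lang): string
{
    $cmd = 'pygmentize -f html -l ' . escapeshellarg($lang);
    if ($lang === 'php') {
        // startinline=True makes the PHP lexer start in PHP mode without a start tag
        $cmd .= ' -O ' . escapeshellarg('startinline=True');
    }
    return $cmd;
}

/**
 * Pipe $code through pygmentize and return the HTML, or null on failure.
 * Assumes pygmentize is installed; it reads from stdin when no file is given.
 */
function highlight_with_pygmentize(string $code, string $lang): ?string
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open(build_pygmentize_command($lang), $spec, $pipes);
    if (!is_resource($proc)) {
        return null;
    }
    fwrite($pipes[0], $code);
    fclose($pipes[0]);
    $html = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    return proc_close($proc) === 0 ? $html : null;
}

echo build_pygmentize_command('php'), "\n";
```

With this option set, a call such as highlight_with_pygmentize('echo "hi";', 'php') should no longer need the start tag to be added and removed around it.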