Pygments Needs a PHP Start Tag to Work

When using Pygments to highlight PHP source code, a PHP start-tag is needed.

Edited: 2020-01-10 04:16

PHP Highlighting in Pygments

Pygments is, so far, the best syntax highlighter that I have tried, and what I like most about it is that it runs server-side!

It does have a problem with PHP, though: it only highlights PHP code that follows the PHP opening tag, and code examples rarely include one.

The way I solved the problem was to simply insert the tag before passing the code on to Pygments, and to strip it from the output afterwards. This might seem a bit impractical, but it works wonderfully, and it is such a small hack that I am comfortable using it. If Pygments ever changes its lexer, I can update my code in five minutes. If they do decide to change it, they should probably make it an option, since people now rely on the current behavior.

The code I am using:

<?php

/**
 *  Function to fix certain language specific problems
 *
 *  Pygments requires "<?php" to exist in the passed string to work;
 *  if it is not present, we add it manually and remove it when done.
 */
function lang_fixes(string $content, string $lang)
{
    switch ($lang) {
        case "php":
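            // strpos() returns 0 when the tag is at the very start, so compare
            // strictly against false instead of using a loose "==" check
            if (strpos($content, '<?') === false) {
                $content = '<?php ' . $content;
            }
            break;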
    }
    return $content;
}
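/**
 *  Function to undo previous fixes
 *
 *  Strips the highlighted start tag from the Pygments output. Note that this
 *  also removes a start tag that was part of the original snippet.
 */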
function lang_fixes_undo(string $content, string $lang)
{
    switch ($lang) {
        case "php":
            // Pygments' HTML formatter normally escapes "<" as "&lt;", so accept both forms
            $content = preg_replace('~<span class="cp">(?:<|&lt;)\?php</span>\s*~u', '', $content);
            break;
    }
    return $content;
}
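
To show how the two helpers fit around the highlighter, here is a minimal sketch of the round trip. It is self-contained, so it repeats the prepend and strip steps inline rather than calling the functions above. The names highlight_php_snippet and $fake_highlighter are hypothetical, and the closure only imitates the kind of markup Pygments produces for the start tag; in a real setup you would call Pygments in its place.

```php
<?php

/**
 * Hypothetical wrapper: add the start tag, highlight, then strip the tag again.
 * $highlighter is any callable that takes source code and returns HTML.
 */
function highlight_php_snippet(string $code, callable $highlighter): string
{
    $tag_added = false;
    if (strpos($code, '<?') === false) {
        $code = '<?php ' . $code;
        $tag_added = true;
    }

    $html = $highlighter($code);

    // Only strip the tag when we added it ourselves, and only its first occurrence
    if ($tag_added) {
        $html = preg_replace('~<span class="cp">(?:<|&lt;)\?php</span>\s*~u', '', $html, 1);
    }
    return $html;
}

// Stand-in for Pygments (assumption): wraps the start tag in a "cp" span
// and escapes the rest, which is enough to exercise the wrapper.
$fake_highlighter = function (string $code): string {
    $escaped = htmlspecialchars($code, ENT_QUOTES);
    return preg_replace('~^&lt;\?php~', '<span class="cp">&lt;?php</span>', $escaped);
};

$html = highlight_php_snippet('echo "hi";', $fake_highlighter);
echo $html, PHP_EOL;
```

Unlike the undo function above, this variant remembers whether it added the tag, so a snippet that ships with its own start tag keeps it.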

You will also need to extract the "<pre>" elements from your content with preg_match_all, but that should be relatively straightforward. Personally, I use this neat regular expression:

|<pre[^>]*>(.+?)</pre>|su

This matches all of the pre elements so they can be stored in a variable. The "s" at the end of the expression makes "." match everything, including newlines, which is needed because the code (and any HTML) inside the pre elements usually spans multiple lines.
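
A small sketch of what the "s" modifier changes, using a made-up two-line pre element: without it, "." stops at the newline and the element is not matched at all.

```php
<?php

// Made-up sample content with a pre element that spans two lines
$content = "<p>Intro</p>\n<pre class=\"php\">echo 1;\necho 2;</pre>";

// Without "s", "." does not match the newline inside the pre element
$without_s = preg_match_all('|<pre[^>]*>(.+?)</pre>|u', $content, $m1);

// With "s", "." also matches newlines, so the whole element is captured
$with_s = preg_match_all('|<pre[^>]*>(.+?)</pre>|su', $content, $m2);

var_dump($without_s); // int(0)
var_dump($with_s);    // int(1)
var_dump($m2[1][0]);  // the two lines between the pre tags
```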

With preg_match_all, you would use the expression like this:

preg_match_all("|<pre[^>]*>(.+?)</pre>|su", $content, $out, PREG_PATTERN_ORDER);

Now you can easily loop through each pre element in your content:

// With PREG_PATTERN_ORDER, $out[0] holds the full matches, including the pre tags
foreach ($out[0] as $pre) {
  $only_code = trim(preg_replace("|</?pre[^>]*>|u", '', $pre));
  // echo $only_code;
}
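
If the goal is to put the highlighted markup back into the content, preg_replace_callback can do the matching and the replacing in one pass. This is a sketch under assumptions, not the code used on this site: highlight_all_pre is a hypothetical helper, and the callable passed in stands in for the actual Pygments call.

```php
<?php

/**
 * Hypothetical helper: run $highlighter on the content of every pre element
 * and put the result back in place of the original element.
 */
function highlight_all_pre(string $content, callable $highlighter): string
{
    return preg_replace_callback(
        '|<pre[^>]*>(.+?)</pre>|su',
        function (array $match) use ($highlighter): string {
            // $match[1] is the text between the pre tags
            return '<pre>' . $highlighter(trim($match[1])) . '</pre>';
        },
        $content
    );
}

// Stand-in for Pygments (assumption): upper-cases the code so the effect is visible
$result = highlight_all_pre("<p>a</p><pre class=\"php\">\necho 1;\n</pre>", 'strtoupper');
echo $result, PHP_EOL; // <p>a</p><pre>ECHO 1;</pre>
```

Note that this sketch drops any attributes on the original pre tag; if you need them, capture them in a second group and write them back in the callback.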
