Pygments Needs a PHP Start Tag to Work

When using Pygments to highlight PHP source code, a PHP start-tag is needed.


By Jacob

Edited: 2020-01-10 04:16

PHP Highlighting in Pygments

Pygments is, so far, the best syntax highlighter I have tried, and what I like most about it is that it runs server-side!

It does have a problem with PHP, though: it only highlights the code if the PHP opening tag (<?php) is present, which is rarely the case in code examples.

I solved the problem by simply inserting the tag before passing the code on to Pygments, and removing it again from the highlighted output. This might seem a bit impractical, but it works wonderfully, and it is such a small hack that I am comfortable using it. If Pygments ever fixes its lexer, I can update my code in five minutes. If they do decide to fix it, they should probably add it as an option, since people now rely on the bugged behavior.

The code I am using:

<?php

/**
 *  Function to fix certain language specific problems
 *
 *  Pygments requires "<?php" to exist in the passed string to work;
 *  if it is not present, we add it manually and remove it when done
 */
function lang_fixes(string $content, string $lang)
{
    switch ($lang) {
        case "php":
            // strpos() returns 0 for a match at the very start, so compare with ===
            if (strpos($content, '<?') === false) {
                $content = '<?php ' . $content;
            }
            break;
    }
    return $content;
}

/**
 *  Function to undo previous fixes
 *
 *  Removes the "<?php" start tag again from the highlighted HTML
 */
function lang_fixes_undo(string $content, string $lang)
{
    switch ($lang) {
        case "php":
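            // The HTML output normally contains the tag escaped as "&lt;?php";
            // accept both forms and strip only the first occurrence
            $content = preg_replace('~<span class="cp">(?:&lt;|<)\?php</span>\s*~u', '', $content, 1);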
            break;
    }
    return $content;
}

You will also need to find the "<pre>" elements in your content (for example with preg_match_all) so they can be replaced, but that should be relatively straightforward. Personally, I use this neat regular expression:

|<pre[^>]*>(.+?)</pre>|su

This matches each pre element, with the capture group holding its content. The "s" modifier at the end of the RegEx causes "." to match everything, including newlines, which is needed because the content of a pre element usually spans several lines.

With preg_match_all, you would use the expression like this:

preg_match_all("|<pre[^>]*>(.+?)</pre>|su", $content, $out, PREG_PATTERN_ORDER);
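To see what ends up in $out, and why the "s" modifier matters, here is a small sketch run against made-up sample HTML. The sample content and the variable names other than $out are assumptions for illustration only.

```php
<?php
// Made-up sample content with one multi-line and one single-line pre element.
$content = "<p>Intro</p>\n"
    . "<pre class=\"php\">echo 'a';\necho 'b';</pre>\n"
    . "<pre>ls -l</pre>";

// With the "s" modifier, "." also matches newlines, so both elements are found.
$with_s = preg_match_all('|<pre[^>]*>(.+?)</pre>|su', $content, $out, PREG_PATTERN_ORDER);

// Without it, the multi-line element is skipped.
$without_s = preg_match_all('|<pre[^>]*>(.+?)</pre>|u', $content, $unused);

// $out[0] holds the complete <pre>...</pre> elements,
// $out[1] holds only the text captured by (.+?).
printf("with s: %d, without s: %d\n", $with_s, $without_s);
print_r($out[1]);
```

With PREG_PATTERN_ORDER, the whole elements and the captured contents end up in two parallel arrays, so the same index refers to the same pre element in both.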

Now you can easily loop through each pre element in your content:

foreach ($out[0] as $pre) {
  // Strip the surrounding <pre> tags, leaving only the code
  $only_code = trim(preg_replace("|</?pre[^>]*>|u", '', $pre));
  // echo $only_code;
}
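If the goal is to swap each pre element for its highlighted version, one possible approach is preg_replace_callback, which matches and replaces in a single pass. The following is only a sketch: fake_highlight() is a hypothetical stand-in for however you actually call Pygments, and highlight_pre_elements() is a made-up name, not the code I run on this site.

```php
<?php
// Hypothetical stand-in for the real Pygments call; it only wraps the code
// in a div so that the example runs without Pygments installed.
function fake_highlight(string $code, string $lang): string
{
    return '<div class="highlight">' . $code . '</div>';
}

// Replace every <pre> element in $content with its "highlighted" version.
function highlight_pre_elements(string $content, string $lang): string
{
    return preg_replace_callback(
        '|<pre[^>]*>(.+?)</pre>|su',
        function (array $m) use ($lang): string {
            // $m[1] is the text between the opening and closing pre tags.
            $code = trim($m[1]);
            // A start-tag fix and its undo step would wrap this call.
            return fake_highlight($code, $lang);
        },
        $content
    );
}

$html = "<p>x</p><pre>\n  echo 1;\n</pre><p>y</p>";
echo highlight_pre_elements($html, 'php'), PHP_EOL;
```

The callback receives the same match array that preg_match_all would produce for a single element, so the regular expression does not need to change.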

Tell us what you think:

Jonathan

I guess `startinline=True` as an option to the PHP lexer was not available when you wrote this. It tells the lexer to highlight inline PHP (i.e. without needing a starting <?php tag) and is probably more robust than using this hack.
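For readers who call Pygments through its command-line tool, the option mentioned above could be passed along roughly like this. This is only a sketch: it assumes the pygmentize command is installed and on the PATH, and build_pygmentize_command() is a hypothetical helper, not something from the article.

```php
<?php
// Build a pygmentize command line for the given language.
// Assumption: Pygments is invoked through its "pygmentize" CLI.
function build_pygmentize_command(string $lang): string
{
    $cmd = 'pygmentize -f html -l ' . escapeshellarg($lang);
    if ($lang === 'php') {
        // -P passes a single name=value option; startinline lets the PHP
        // lexer highlight code that has no opening tag.
        $cmd .= ' -P ' . escapeshellarg('startinline=True');
    }
    return $cmd;
}

echo build_pygmentize_command('php'), PHP_EOL;
```

The code to highlight would then be sent to the command's standard input, for example with proc_open, and the start-tag workaround would no longer be needed.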
