Rather than using an existing wrapper, I wrote my own Pygments wrapper script to highlight code examples on Beamtic. However, I recently noticed that it failed to highlight text containing the Danish (Scandinavian) characters Æ, Ø, and Å, even though they are properly UTF-8 encoded in the database.
This was a problem because I also write tutorials in Danish occasionally.
Now, I have taken great care to make sure everything is UTF-8, so I was fairly sure the problem was not with my CMS. Both my database and the CMS itself are set up to use Unicode.
After some Googling, I realized the problem was with Pygments. Luckily, the solution was simple: adding the encoding='utf-8' option to the HtmlFormatter constructor appears to solve the problem:
print(highlight(code, lexer, HtmlFormatter(encoding='utf-8')))
With this option set, Pygments encodes the highlighted HTML as UTF-8 before returning it.
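To make the effect of the option visible, here is a minimal sketch, assuming Python 3 and an installed Pygments. The sample string and the variable names are my own illustration and are not part of the wrapper. It compares the return type of highlight() with and without the encoding option:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# Hypothetical sample code containing Danish characters
code = 'print("Æ Ø Å")\n'
lexer = get_lexer_by_name("python")

# Without the option, highlight() returns text (str)
as_text = highlight(code, lexer, HtmlFormatter())

# With encoding='utf-8', highlight() returns UTF-8 encoded bytes
as_bytes = highlight(code, lexer, HtmlFormatter(encoding="utf-8"))

# Show both return types, then confirm the characters survive decoding
print(type(as_text).__name__, type(as_bytes).__name__)
print("Æ" in as_bytes.decode("utf-8"))
```

Under Python 3, passing a bytes object straight to print() shows its b'...' representation, so there it may be a better fit to decode the result first or to write it to sys.stdout.buffer.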
The Pygments wrapper script
My Pygments wrapper script is included below:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Author JacobSeated

# To generate a stylesheet:
#  pygmentize -S default -f html -a .highlight > default.css

# using argparse to enable arguments I.e.:
# print(sys.argv)

from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer, guess_lexer, get_lexer_by_name
from pygments import highlight
import argparse

# Parse CLI arguments
parser = argparse.ArgumentParser()
parser.add_argument(
    "--file",
    help="Path for file to highlight. The file should only contain code.",
    type=str, required=True)
parser.add_argument(
    "--lang",
    help="Language to highlight I.e: php, html, css",
    type=str)
args = parser.parse_args()

# Check if file was provided
try:
    f = open(args.file, 'r')
except Exception as e:
    print(0)
    exit()
else:
    with f:
        file_contents = f.read()

# Check if lang was provided
if args.lang:
    # print('Using provided language')
    lexer = get_lexer_by_name(args.lang)
else:
    # print('Trying to guess the language')
    lexer = guess_lexer(file_contents)

code = file_contents

print(highlight(code, lexer, HtmlFormatter(encoding='utf-8')))
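If the script is run under Python 3, the file read is another place where the encoding can matter, because open() in text mode falls back to a platform-dependent default encoding when none is given. The following is only a sketch under that assumption; read_source is a hypothetical helper of my own and is not part of the script above:

```python
import io
import os
import tempfile


def read_source(path):
    """Return the file's contents decoded as UTF-8 (hypothetical helper)."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Write a temporary file containing Danish characters as UTF-8 bytes
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("# Æ Ø Å\n".encode("utf-8"))

# Read it back through the helper, then clean up the temporary file
try:
    contents = read_source(tmp.name)
finally:
    os.remove(tmp.name)

# Check that the characters came back unchanged
print(contents == "# Æ Ø Å\n")
```

Passing the encoding explicitly keeps the result independent of the server's locale settings, which may differ between a terminal session and the environment the CMS uses to call the script.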