Syntax highlighting engines: clean TeX output

The vim module uses the vim editor to syntax highlight code snippets in ConTeXt. I thought that it would be straightforward to support other syntax highlighting engines: source-highlight, pygments, HsColor, etc. Unfortunately, that is not the case. None of these syntax highlighting engines were written with reuse in mind.

For example, consider a simple TeX file:

\definestartstop[important]
                [color=red,
                 style=\italic]
\starttext
This is an \important{important} text
\stoptext

Let's compare the TeX files generated by various syntax highlighters:

source-highlight -f latex gives

% Generator: GNU source-highlight, by Lorenzo Bettini, http://www.gnu.org/software/src-highlite
\noindent
\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}definestartstop}}\textcolor{Purple}{[important]} \\

\mbox{}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [color=red, \\

\mbox{}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ style=\textbf{\textcolor{Blue}{\textbackslash{}italic}}] \\

\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}starttext}} \\

\mbox{}This\ is\ an\ \textbf{\textcolor{Blue}{\textbackslash{}important}}\textcolor{ForestGreen}{\{important\}}\ text \\

\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}stoptext}} \\

\mbox{}

pygmentize -f latex gives

\begin{Verbatim}[commandchars=\\\{\}]
\PY{k}{\PYZbs{}definestartstop}\PY{n+na}{[important]}
                [color=red,
                 style=\PY{k}{\PYZbs{}italic}]
\PY{k}{\PYZbs{}starttext}
This is an \PY{k}{\PYZbs{}important}\PY{n+nb}{\PYZob{}}important\PY{n+nb}{\PYZcb{}} text
\PY{k}{\PYZbs{}stoptext}
\end{Verbatim}

HsColor-latex -partial gives

\textcolor{red}{$\backslash$}{\rm{}definestartstop}\textcolor{red}{[}{\rm{}important}\textcolor{red}{]}\\
\hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \textcolor{red}{[}{\rm{}color}\textcolor{red}{=}{\rm{}red}\textcolor{cyan}{,}\\
\hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace {\rm{}style}\textcolor{cyan}{=$\backslash$}{\rm{}italic}\textcolor{red}{]}\\
\textcolor{red}{$\backslash$}{\rm{}starttext}\\
{\rm{}This}\hsspace {\rm{}is}\hsspace {\rm{}an}\hsspace \textcolor{red}{$\backslash$}{\rm{}important}\textcolor{cyan}{\{}{\rm{}important}\textcolor{cyan}{\}}\hsspace {\rm{}text}\\
\textcolor{red}{$\backslash$}{\rm{}stoptext}\\

HsColor and source-highlight use explicit LaTeX commands for spacing and formatting. Ouch! Pygments uses logical markup, but with cryptic command names. However, from the point of view of using pygments output in ConTeXt, the \begin{Verbatim} and \end{Verbatim} lines are a showstopper. (OK, not really: they can be bypassed with some effort.)

Based on my experience, I decided to clean up the output generated by 2context.vim:

\SYN[Identifier]{\\definestartstop}[important]
                [color=red,
                 style=\SYN[Identifier]{\\italic}]
\SYN[Statement]{\\starttext}
This is an \SYN[Identifier]{\\important}\{important\} text
\SYN[Statement]{\\stoptext}

I assume that only four TeX commands are defined: \\, \{, and \} for backslash, open brace, and close brace; and \SYN[...]{...} for syntax highlighting. Thus, anyone who wants to reuse 2context in plain TeX, LaTeX, or a yet-to-be-written macro package does not need to modify the output at all; defining these four commands is enough. I wish the other syntax highlighting programs did the same.
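
To give an idea of what defining the four commands could look like outside ConTeXt, here is a minimal plain TeX sketch. It is not what the vim module itself does. The names \sltt, \SYNIdentifier and \SYNStatement are made up for this illustration, and the slanted typewriter font is only a stand-in for real highlighting such as color.

```tex
% Plain TeX sketch. \sltt, \SYNIdentifier and \SYNStatement are hypothetical names.
\font\sltt=cmsltt10   % slanted typewriter font with the same layout as cmtt10

% Escaped characters: each one prints the glyph found at that slot of the
% current font, so the surrounding text must be set in a typewriter font.
\chardef\\=`\\   % backslash
\chardef\{=`\{   % open brace
\chardef\}=`\}   % close brace

% \SYN[Group]{text} switches to the style stored in \SYNGroup. For a group
% without a style, \csname yields \relax and the text is typeset unchanged.
\def\SYN[#1]#2{{\csname SYN#1\endcsname #2}}
\def\SYNIdentifier{\sltt}
\def\SYNStatement{\sltt}

{\tt \SYN[Statement]{\\starttext}\par
 This is an \SYN[Identifier]{\\important}\{important\} text\par}
\bye
```

The sketch leaves out line and space handling (something like \obeylines would still be needed for a full listing). The point is only that the generated file itself stays untouched.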

Can I borrow your highlighter please?

Are you one of those people who always have a highlighter ready when reading a book? Miss that functionality in TeX? Don’t worry, it’s easy to highlight text in ConTeXt:

\definebar[highlight]
          [order=background,
           rulethickness=2.5,
           offset=1.25,
           continue=yes,
           color=yellow]

And then

\highlight{word or sentences}

highlights text. This is based on the new mechanism in MkIV that is used to define underbars, overbars, and overstrikes. Tweaking it a little bit gives us a highlighter, which after all is just a fancy line. (Aside: in TeX parlance, a line is called a rule. The ConTeXt mechanism that I am using is called a bar. I’ll just call them lines, ’cause that’s what they are.)

The option order tells whether the line should go in the foreground or the background. Obviously, I chose background. rulethickness is, well, the thickness of the rule (err, I mean line). The default units are ex, and I chose the thickness to be 2.5ex (I don’t know why the units and the numerical value of the thickness are specified separately). By default, the line is centered on the bottom of the text line. The offset=1.25 moves it 1.25ex up, so that the line appears centered on the text. continue=yes tells ConTeXt that the line should be drawn continuously. The default is continue=no, which breaks the line after each word. The option color simply specifies the color.
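
To see the effect of continue, the two settings can be put side by side. The following is an untested sketch that uses only the keys described above; wordhighlight is a hypothetical name chosen for this example.

```tex
% ConTeXt MkIV sketch; wordhighlight is a hypothetical name.
\definebar[highlight]
          [order=background,
           rulethickness=2.5,
           offset=1.25,
           continue=yes,
           color=yellow]

% Same settings, except that the line is interrupted between words.
\definebar[wordhighlight]
          [order=background,
           rulethickness=2.5,
           offset=1.25,
           continue=no,
           color=yellow]

\starttext
\highlight{one continuous line} and \wordhighlight{one line per word}
\stoptext
```

If the options behave as described, the first phrase gets a single yellow band and the second gets a separate band under each word.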

Why do I care about such highlighting? It may be useful for presentations and such, but my interest is in a proper mechanism for universal source highlighting. The idea is that I should be able to define a source highlighting style and use any syntax highlighting program: vim, pygments, source-highlight, HsColor, and so on. Each of these programs just marks the syntax regions and leaves the highlighting of each region to TeX. For a universal syntax highlighter, I can define the syntax highlighting rules and just map them to the output of the different programs.

Source highlighting normally involves changing font style (bold, italic, etc.), changing font color, underlining, overstriking, changing background color, and so on. At first I thought that I would use \framed for that: it supports all these features by default. However, \framed does not split across lines. The next choice was \textbackground, which is extremely versatile: it supports all the features of \framed and, at the same time, splits across lines and pages. But \textbackground seemed like overkill. I think that the \underbar mechanism is just right. It can take care of underline, overstrike, and background colors, while the usual \dostartattributes is sufficient to take care of font style and color. More on this later…