Why all the fuss about luatex

The LuaTeX project aims to extend the TeX engine, using Lua as a scripting language. Someone who has not tried to program in TeX may not realize why having access to a real programming language in TeX is a boon. Let me try to explain this by means of an example. A few years ago, Mojca Miklavec posted a request on the ConTeXt mailing list asking for a macro \molecule that would transform \molecule{H_2O} into H\low{2}O and \molecule{H_2SO_4^-} into H\low{2}SO\lohi{4}{-} (some additional features were requested, but I will not go through them here). Algorithmically, this is straightforward. Writing a TeX macro that does this is anything but straightforward. Below is a simplified version of a solution that Taco Hoekwater provided (this code only works in pdfTeX, not in LuaTeX).

\newbox\chemlowbox
\def\chemlow#1%
{\setbox\chemlowbox\hbox{{\switchtobodyfont[small]#1}}}

\def\chemhigh#1%
{\ifvoid\chemlowbox \high{{\switchtobodyfont[small]#1}}%
\else\lohi{\box\chemlowbox}{{\switchtobodyfont[small]#1}}\fi }

\def\finishchem%
{\ifvoid\chemlowbox \else
\low{\box\chemlowbox}\fi}

\unexpanded\def\molecule%
{\bgroup
\catcode`\_=\active \uccode`\~=`\_ \uppercase{\let~\chemlow}%
\catcode`\^=\active \uccode`\~=`\^ \uppercase{\let~\chemhigh}%
\dostepwiserecurse {65}{90}{1}
{\catcode \recurselevel = \active
\uccode`\~=\recurselevel
\uppercase{\edef~{\noexpand\finishchem
\rawcharacter{\recurselevel}}}}%
\domolecule}%

\def\domolecule#1%
{\scantokens{#1\finishchem}\egroup}

Ugh. After staring at the code for an eternity, I finally understand how it works. First, it makes _ and ^ active and defines them to be equal to \chemlow and \chemhigh, respectively. For _, this is done by

   \catcode`\_=\active \uccode`\~=`\_ \uppercase{\let~\chemlow}% 

Setting the \uccode of a dummy active character (here ~) to the target character, and then making the definition inside \uppercase so that ~ turns into the active version of that character, is what TeX wizards have for breakfast.

The macro then makes each uppercase letter active and defines it to be equal to \finishchem <letter>. Together with \chemhigh, \finishchem checks whether the previous letter had any subscripts or superscripts, and places them as \low, \high or \lohi. Finally, there is some rescanning trickery to ensure that the correct catcodes are used. OK, so I can convince myself that the code makes sense. But there is no way I could have written something like this on my own. Ever.
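
To make the bookkeeping described above easier to follow, here is a minimal sketch of the same idea in plain Lua. It is not a translation of Taco's macros: the names molecule, pending_low and finish are made up for illustration, every script is assumed to be a single character, and any character other than _ or ^ (not just an uppercase letter) flushes a pending subscript.

```lua
-- Plain-Lua imitation of the TeX bookkeeping: a subscript is held back
-- (like \chemlowbox) until we know whether a superscript follows it.
local function molecule(input)
  local out = {}
  local pending_low = nil  -- plays the role of \chemlowbox

  -- Like \finishchem: emit a subscript that never got a superscript.
  local function finish()
    if pending_low then
      out[#out + 1] = "\\low{" .. pending_low .. "}"
      pending_low = nil
    end
  end

  local i = 1
  while i <= #input do
    local c = input:sub(i, i)
    if c == "_" then
      -- Like \chemlow: store the next character, emit nothing yet.
      pending_low = input:sub(i + 1, i + 1)
      i = i + 2
    elseif c == "^" then
      -- Like \chemhigh: combine with a stored subscript if there is one.
      local high = input:sub(i + 1, i + 1)
      if pending_low then
        out[#out + 1] = "\\lohi{" .. pending_low .. "}{" .. high .. "}"
        pending_low = nil
      else
        out[#out + 1] = "\\high{" .. high .. "}"
      end
      i = i + 2
    else
      -- Any other character closes the current script group.
      finish()
      out[#out + 1] = c
      i = i + 1
    end
  end
  finish()
  return table.concat(out)
end

print(molecule("H_2O"))
print(molecule("H_2SO_4^-"))
```

The point of the sketch is only that the logic is a few lines of ordinary control flow once catcodes are out of the picture.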

This is where LuaTeX makes life so easy. It brings the features of a real programming language to TeX, and one of those features is the ability to write parsers. Lua has a parsing library called lpeg, which is bundled with LuaTeX. So, it is easy to parse the argument to \molecule and transform it to some other form. Below is the code by Wolfgang Schuster:

\startluacode

do

thirddata = thirddata or { }

local molecule = { }

-- Each helper prints the base text followed by the matching script macro.
function molecule.text(one)
    tex.sprint(string.format("%s",one))
end

function molecule.low(one,two)
    tex.sprint(string.format("%s\\low{%s}",one,two))
end

function molecule.high(one,two)
    tex.sprint(string.format("%s\\high{%s}",one,two))
end

function molecule.lowhigh(one,two,three)
    tex.sprint(string.format("%s\\lohi{%s}{%s}",one,two,three))
end

-- Here the superscript (two) is captured before the subscript (three),
-- but \lohi expects the subscript first, so the arguments are swapped.
function molecule.highlow(one,two,three)
    tex.sprint(string.format("%s\\lohi{%s}{%s}",one,three,two))
end

local plus         = lpeg.P("+")
local minus        = lpeg.P("-")
local lowercase    = lpeg.R("az")
local uppercase    = lpeg.R("AZ")
local number       = lpeg.R("09")
local subscript    = lpeg.P("_")
local superscript  = lpeg.P("^")
local leftbrace    = lpeg.P("{")
local rightbrace   = lpeg.P("}")

local single    = lowercase + number + plus + minus
local multiple  = leftbrace * single^1 * rightbrace
local content   = single + multiple

local text    = lpeg.C(uppercase^1)                                                                 / molecule.text
local low     = lpeg.C(uppercase^1) * subscript   * lpeg.C(content)                                 / molecule.low
local high    = lpeg.C(uppercase^1) * superscript * lpeg.C(content)                                 / molecule.high
local lowhigh = lpeg.C(uppercase^1) * subscript   * lpeg.C(content) * superscript * lpeg.C(content) / molecule.lowhigh
local highlow = lpeg.C(uppercase^1) * superscript * lpeg.C(content) * subscript   * lpeg.C(content) / molecule.highlow

local parser = (lowhigh + highlow + low + high + text)^0

function thirddata.molecule(str)
    parser:match(str)
end

end

\stopluacode

\def\molecule#1{\ctxlua{thirddata.molecule("#1")}}

Even though I do not understand lpeg very well, I can read the above code and understand what it does without having to stare at it for an eternity. Further, I am confident that if I went through the lpeg manual, I could write the above code on my own. Hurray for LuaTeX.
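
For anyone who wants to experiment with the grammar outside a TeX run, here is a hedged variation on the same idea. It is my own sketch, not Wolfgang's code: instead of calling tex.sprint, it uses lpeg's substitution capture (lpeg.Cs) to return the converted string, and the names to_tex and convert are made up for illustration. It assumes lpeg is either already a global (as in LuaTeX) or installed as a module for standalone Lua.

```lua
-- In LuaTeX, lpeg is already a global; elsewhere it has to be loaded.
local lpeg = lpeg or require("lpeg")
local P, R, C, Cs = lpeg.P, lpeg.R, lpeg.C, lpeg.Cs

local upper   = R("AZ")
local single  = R("az", "09") + P("+") + P("-")
local content = single + P("{") * single^1 * P("}")

-- A subscript or superscript marker followed by its captured content.
local sub = P("_") * C(content)
local sup = P("^") * C(content)

-- \lohi takes the subscript first, whichever order the input used.
local function lohi(lo, hi)
  return "\\lohi{" .. lo .. "}{" .. hi .. "}"
end

local lowhigh = (sub * sup) / lohi
local highlow = (sup * sub) / function(hi, lo) return lohi(lo, hi) end
local low     = sub / "\\low{%1}"
local high    = sup / "\\high{%1}"

-- Cs copies the input, replacing each matched script by its TeX form.
local to_tex = Cs((lowhigh + highlow + low + high + upper)^0)

local function convert(formula)
  return to_tex:match(formula)
end

print(convert("H_2SO_4^-"))
```

Like the code above, this keeps the braces of a braced script in the output and stops at the first character it does not recognise, for example the lowercase letter in Na.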

2 thoughts on “Why all the fuss about luatex”

  1. If you think that is complicated, look at how the LaTeX package “mhchem” does the same thing (with more bells and whistles). There are about 2000 lines :-)
