The question of translating TeX to (X)HTML arises frequently. Almost everyone wants it. After all, on the web, (X)HTML is the de facto standard markup; PDF, with all its hyperlink abilities, is clumsier to use. On the other hand, for print, PDF, especially TeX-generated PDF, is the de facto standard (at least in math-heavy fields); (X)HTML, with all its print CSS abilities, is clumsier to use. Often, you want both an (X)HTML version and a PDF version of a document. With the popularity of e-ink devices, EPUB (which is essentially a zipped collection of (X)HTML files) is also becoming popular. Generating these multiple output formats from the same source is tricky.


The easiest solution, of course, is to use an ASCII markup language like Markdown, reStructuredText, AsciiDoc, etc., whose tools generate (X)HTML and TeX output from the same source. For simple tasks these are great: the syntax is fairly intuitive and the various translation tools are fairly robust. But these simple markups lack an important ability of TeX: programmable macros. To give an example, suppose I need to write n-tuples $(a_1, \dots, a_n)$ fairly often in a document. When using TeX, I am in the habit of defining a macro

\def\TUPLE#1{(#1_1,\dots,#1_n)}

and then using $\TUPLE a$. This saves typing and prevents typos. ASCII markup languages lack this ability.
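To see why programmability matters, here is a slightly more general variant as a minimal plain TeX sketch. The name \TUPLEUPTO and its second argument are hypothetical and used only for illustration; they are not part of the macro above.

```tex
% Plain TeX sketch; \TUPLEUPTO is a hypothetical name.
% #1 is the symbol, #2 is the last index of the tuple.
\def\TUPLEUPTO#1#2{(#1_1,\dots,#1_{#2})}

The tuples $\TUPLEUPTO{a}{n}$ and $\TUPLEUPTO{x}{k+1}$
are typed once as a pattern and reused everywhere.
\bye
```

Changing the notation for every tuple in the document then means editing one definition, which is exactly what the simple markup languages cannot offer.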

A more robust solution is to use XML as the input language and then use an XML toolchain to translate the input to both HTML and TeX. (There are various TeX-based solutions for parsing XML directly, both in LaTeX and ConTeXt.) However, typing XML is a pain, especially when it comes to typing MathML.

A solution that meets these needs midway is to input text in (X)HTML and math in TeX format, then use MathJax to render the HTML and one of the TeX XML-processing packages to convert to PDF. I think that until browsers become mature in displaying MathML, MathJax is an excellent ad hoc solution. In addition, it does parse simple TeX macros. However, you do lose the ability to write complicated TeX macros.

Of course, if you want to parse TeX macros, the only real solution is to use TeX to parse TeX. Why? Because TeX has the amazing ability to change its grammar on the fly. Consider, for example, the following ConTeXt code:

\newcatcodetable \weirdcatcodes
\startcatcodetable \weirdcatcodes
    \catcode`\@ = 0
    \catcode`\( = 1
    \catcode`\) = 2
\stopcatcodetable
\setcatcodetable \weirdcatcodes

After this, @ takes the usual meaning of \, ( of {, and ) of }. So, to start a section, you have to use

@section (A section)
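For readers outside ConTeXt, the same idea can be sketched in plain TeX with direct \catcode assignments. This is an illustrative sketch rather than part of the example above; the macro name \greet is hypothetical, and the group is there only to keep the catcode changes local.

```tex
% Plain TeX sketch: make @ ( ) behave like \ { } inside a group.
\begingroup
  \catcode`\@=0  % escape character, same role as the backslash
  \catcode`\(=1  % begin-group character
  \catcode`\)=2  % end-group character
  @def@greet#1(Hello, #1!)  % defines the hypothetical macro \greet
  @greet(world)
@endgroup
\bye
```

Inside the group, both \ and @ start control sequences, so @endgroup closes the group and the usual catcodes are restored.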

This ability to change catcodes is best illustrated by David Carlisle’s xii.tex:

\let~\catcode~`76~`A13~`F1~`j00~`P2jdefA71F~`7113jdefPALLF
PA''FwPA;;FPAZZFLaLPA//71F71iPAHHFLPAzzFenPASSFthP;A$$FevP
A@@FfPARR717273F737271P;ADDFRgniPAWW71FPATTFvePA**FstRsamP
AGGFRruoPAqq71.72.F717271PAYY7172F727171PA??Fi*LmPA&&71jfi
Fjfi71PAVVFjbigskipRPWGAUU71727374 75,76Fjpar71727375Djifx
:76jelse&U76jfiPLAKK7172F71l7271PAXX71FVLnOSeL71SLRyadR@oL
RrhC?yLRurtKFeLPFovPgaTLtReRomL;PABB71 72,73:Fjif.73.jelse
B73:jfiXF71PU71 72,73:PWs;AMM71F71diPAJJFRdriPAQQFRsreLPAI
I71Fo71dPA!!FRgiePBt'el@ lTLqdrYmu.Q.,Ke;vz vzLqpip.Q.,tz;
;Lql.IrsZ.eap,qn.i. i.eLlMaesLdRcna,;!;h htLqm.MRasZ.ilk,%
s$;z zLqs'.ansZ.Ymi,/sx ;LYegseZRyal,@i;@ TLRlogdLrDsW,@;G
LcYlaDLbJsW,SWXJW ree @rzchLhzsW,;WERcesInW qt.'oL.Rtrul;e
doTsW,Wk;Rri@stW aHAHHFndZPpqar.tridgeLinZpe.LtYer.W,:jbye

To parse such TeX, you need TeX.

Now, from TeX’s point of view, (X)HTML is no different from any other backend like DVI or PDF. TeX just needs to write the output in a specific format to a file. LuaTeX makes this easy, and ConTeXt MkIV now supports an XHTML backend. Simply add

\setupbackend[export=yes, xhtml=yes]

to your preamble. This creates a \jobname.xhtml file (and a \jobname.export XML file). For example,

\setupbackend[export=yes, xhtml=yes]

\starttext
\input ward
\stoptext

gives

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

<!-- input filename   : xhtml             -->
<!-- processing date  : Sun May 29 22:02:02 2011 -->
<!-- context version  : 2011.05.18 22:26  -->
<!-- exporter version : 0.20              -->

<document xmlns:m="http://www.w3.org/1998/Math/MathML" version="0.20" language="en" date="Sun May 29 22:02:02 2011" file="xhtml" context="2011.05.18 22:26" xmlns:xhtml="http://www.w3.org/1999/xhtml">
The Earth, as a habitat for animal life, is in old age and has a fatal illness. Several, in fact. It would be happening whether humans had ever evolved or not. But our presence is like the effect of an old-age patient who smokes many packs of cigarettes per day --- and we humans are the cigarettes. 
</document>

This is not really an XHTML file; it is just an XML file. With a proper stylesheet, most browsers will be able to display it. Creating such a stylesheet is easy. A simple example of such a stylesheet is here. To use this stylesheet, save it as, say, mkiv-export.css, then add

\setupbackend[export=yes, xhtml=yes, css=mkiv-export.css]

This adds the following line to the generated XHTML output

<?xml-stylesheet type="text/css" href="mkiv-export.css"?>

The XHTML file exports the structure of the document, not the style. Consider the following, more complicated example:

\setupbackend[export=yes, xhtml=yes, css=mkiv-export.css]

\definestartstop[important][style=italic]

\starttext
\section {The first attempt}

The XHTML export with \important{structured text}, but not manual {\em style}
\italic{commands}. But the good thing is that it works with math!

Consider a quadratic equation $ax^2 + bx + c = 0$. The roots of this equation are
\startformula
  x = (-b ± \sqrt{b^2 - 4ac})/2a
\stopformula
\stoptext

This gives

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

<!-- input filename   : xhtml             -->
<!-- processing date  : Sun May 29 22:08:38 2011 -->
<!-- context version  : 2011.05.18 22:26  -->
<!-- exporter version : 0.20              -->

<?xml-stylesheet type="text/css" href="mkiv-export.css"?>

<document xmlns:m="http://www.w3.org/1998/Math/MathML" version="0.20" language="en" date="Sun May 29 22:08:38 2011" file="xhtml" context="2011.05.18 22:26" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:a name="aut_1"><section location="aut:1" detail="section">
    <sectionnumber>1</sectionnumber>  
    <sectiontitle>The first attempt</sectiontitle>  
    <sectioncontent>
The XHTML export with <construct detail="important">structured text</construct>, but not manual style commands. But the good thing is that it works with math!

      <break/>
 Consider a quadratic equation 
      <m:math>
        <m:mrow>
          <m:mi>𝑎</m:mi>
          <m:msup>
            <m:mi>𝑥</m:mi>
            <m:mn>2</m:mn>
          </m:msup>
          <m:mo>+</m:mo>
          <m:mi>𝑏</m:mi>
          <m:mi>𝑥</m:mi>
          <m:mo>+</m:mo>
          <m:mi>𝑐</m:mi>
          <m:mo>=</m:mo>
          <m:mn>0</m:mn>
        </m:mrow>
      </m:math>
. The roots of this equation are

      <formula>
 
        <formulacontent>
          <m:math>
            <m:mrow>
              <m:mi>𝑥</m:mi>
              <m:mo>=</m:mo>
              <m:mo>(</m:mo>
              <m:mo>−</m:mo>
              <m:mi>𝑏</m:mi>
              <m:mo>±</m:mo>
              <m:mo>󿁰 </m:mo>
              <m:mroot>
                <m:mrow>
                  <m:msup>
                    <m:mi>𝑏</m:mi>
                    <m:mn>2</m:mn>
                  </m:msup>
                  <m:mo>−</m:mo>
                  <m:mn>4</m:mn>
                  <m:mi>𝑎</m:mi>
                  <m:mi>𝑐</m:mi>
                </m:mrow>
              </m:mroot>
              <m:mo>)</m:mo>
              <m:mi>/</m:mi>
              <m:mn>2</m:mn>
              <m:mi>𝑎</m:mi>
            </m:mrow>
          </m:math>
        </formulacontent>
      </formula>
    </sectioncontent>
  </section></xhtml:a>
</document>

Notice some features of the output: the section number is exported with the section title (if you change the conversion of the section number, the output will honor that); the structure command \important is exported, but the style commands \em and \italic are not; math is exported as MathML with Unicode symbols! There are some interesting features that we are experimenting with. I’ll post more about the MathML export in the future.
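To try the point about section number conversion, one could switch the section numbers to Roman numerals and compare the exported sectionnumber element. This is a minimal sketch: it assumes that the conversion key of \setuphead is available in your ConTeXt version, and the body text is a placeholder. The exported number should then follow the chosen conversion, though the exact markup may differ between exporter versions.

```tex
\setupbackend[export=yes, xhtml=yes, css=mkiv-export.css]

% Assumption: number sections with uppercase Roman numerals
\setuphead[section][conversion=Romannumerals]

\starttext
\section {The first attempt}

Some placeholder text.
\stoptext
```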

So, if you are interested in XHTML output for TeX sources, play around with the new export feature. It is not perfect … yet. But with MathML and SVG (remember, MetaPost has an SVG backend), it is possible to get fully working XHTML output for TeX input, generated by TeX itself rather than by a pre- or post-processor. After all, only TeX can parse TeX:

\setupbackend[export=yes, xhtml=yes, css=mkiv-export.css]

\starttext
\let\bye=\donothing %xii.tex ends with \bye
\input xii
\stoptext

(The output is here)
