How I stopped worrying and started using Markdown like TeX

These days I type most of my simple documents (short articles, blog entries, course notes) in Markdown. Markdown provides only the basic structural elements (sections, emphasis, URLs, lists, footnotes, syntax highlighting, simple tables and figures), which makes it easy to transform the input into multiple output formats. Most of the time I still want PDF output, and for that I use pandoc to convert Markdown to ConTeXt. At the same time, I have the peace of mind that if I need HTML or DOC output, I’ll be able to get that easily.

For most of the last decade, I have almost exclusively used LaTeX/ConTeXt for writing all my documents. After moving to Markdown, I miss three features of TeX: separation of content and presentation; conditional inclusion of content; and including external documents. In this post, I’ll explain how to get these with Markdown.

Separation of content and presentation

TeX gives you a lot of control for creating new structural elements. Let’s take a simple example. Suppose I want to write a file name in a document. Normally, I want the filename to appear in typewriter font. In LaTeX, I could type it as

\texttt{src/hello.c}

but it is better to define a custom macro \filename and use

\filename{src/hello.c}

The advantage is two-fold. Firstly, while writing the file, I am thinking in terms of content (filename) rather than presentation (typewriter font). Secondly, in the future, if I want to change how a filename is displayed (perhaps as a hyperlink to the file), all I need to do is change the definition of the macro. Markdown, with its simplistic structure, lacks the ability to define custom macros.

Conditional compilation

TeX also makes it trivial to generate multiple versions of a document from the same source. Again, let’s take an example. Suppose I am writing notes for a class. Normally, I like to include a short bullet list on my lecture slides, but include a detailed description in the lecture handout. In ConTeXt I can use modes as follows (LaTeX has a similar feature using the comment package):

Feature of the solution
\startitemize[n] 
   \item Feature 1 

     \startmode[handout] 
       Explanation of the feature ... 
     \stopmode 

   \item Feature 2 

     \startmode[handout]
       Explanation of the feature ... 
     \stopmode
\stopitemize

To generate the slides version of my lecture notes, I compile them using

context --mode=slides --result=slides <filename>

This version just contains the bullet list. Since the handout mode is not set, the content between \startmode[handout] ... \stopmode is omitted.

To generate the handout version of my lecture notes, I compile them using

context --mode=handout --result=handout <filename>

Since the handout mode is set, the content between \startmode[handout] ... \stopmode is included.

Such conditional compilation is extremely useful for keeping the slides and handouts in sync. Again, Markdown, with its simplistic feature set, lacks the ability to compile conditionally, and Pandoc does not add this feature either.

Including external documents

TeX makes it easy to include external documents. This is really important when you want to include source code in your documents. I teach an introductory programming class, and want to make sure that the example code included in my notes is correct. I write the code in a separate file, write the corresponding test files to ensure that the code works correctly, and then include it in my notes using

\typeJAVAfile[src/FactoryExample.java]

which gives me syntax-highlighted source code. Pandoc does generate syntax-highlighted source code, but does not provide any means to include external source code. So, I have to copy and paste the code from the actual source file into the Markdown document, which is an error-prone process.


If I only cared about PDF output (via the LaTeX/ConTeXt backend), I could simply use the same TeX macros in the Markdown document. Pandoc passes the TeX macros unchanged to the LaTeX/ConTeXt backend, so I would get a TeX document with all the bells and whistles. But if I tried to generate HTML or DOC output, these TeX macros would be omitted, and I’d get a broken document. One of my reasons for switching to Markdown was the peace of mind that I can generate HTML or DOC output if needed. Using TeX macros in the source takes away that advantage.

So, I started looking for possible solutions and found gpp, the generic preprocessor. It is similar to the C preprocessor (which handles #include, #define, and so on in C/C++) but provides many configuration options. I use it with the -H option, which requires macros to be specified in an HTML-like mode:

<#include "file">
<#define MACRO|value>
Use <#MACRO>

Normally the <#...> does not appear in a document, so using gpp is safe.
See the gpp documentation for complete details. I’ll show how to get the three features that I miss from TeX:

  1. Separation of content and presentation

    With gpp I can define new macros that denote new structural elements, e.g.,

    <#define filename|`#1`>
    The source is included in <#filename src/hello.c>

    When I compile the document using gpp -H, I get

    The source is included in `src/hello.c`

    Sure, this requires more typing than simply using `...`, but that is the price that one has to pay for getting more structure. More importantly, I can define the #filename macro based on the output format:

    <#define filename|`#1`>
    <#ifdef HTML>
         <#define filename|<code class="filename">#1</code>> 
    <#endif>
    <#ifdef TEX> 
         <#define filename|\\filename{#1}> 
    <#endif> 
    The source is included in <#filename src/hello.c>

    Now, if I compile the document using gpp -H -DHTML=1, I get

    The source is included in <code class="filename">src/hello.c</code>

    and if I compile using gpp -H -DTEX=1, I get

    The source is included in \filename{src/hello.c}

    This ensures that the document structure is passed to the output as well.

    To make it easy to manage macros, create three files: macros.gpp containing all macros, html.gpp overriding some of the macros with HTML equivalents, and tex.gpp overriding some of the macros with TeX equivalents. End the macros.gpp file with

    ....
    <#ifdef HTML> 
        <#include "html.gpp"> 
    <#endif> 
    <#ifdef TEX> 
         <#include "tex.gpp"> 
    <#endif>

    and then preprocess the document using gpp -H -DTEX=1 --include macros.gpp <filename> (or -DHTML=1 for HTML output).

  2. Conditional compilation

    Actually, the previous example already shows how to get conditional compilation: use the -D command line switch and check the variable definition using #ifdef. Thus, the ConTeXt modes example above translates to:

    Feature of the solution
    
    1. Feature 1 
    
    <#ifdef HANDOUT> 
    Explanation of the feature ... 
    <#endif> 
    
    2. Feature 2 
    
    <#ifdef HANDOUT> 
    Explanation of the feature ... 
    <#endif>

    When I compile without -DHANDOUT=1, I get the slides version; when I compile with -DHANDOUT=1, I get the handout version.

  3. Including external documents

    External documents can be included with the #include directive. So, I can include an external file using

    ~~~ {.java} 
    <#include "src/Factory.java">
    ~~~

Putting it all together

All that is needed is to run the gpp preprocessor and then pass the output to pandoc.

gpp -H <options> <filename> | pandoc -f markdown -t <format> -o <outfile>

Hide this in a wrapper script or a shell function or a Makefile, and you have a markdown processor with the important features of TeX!
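
As one possible shape for such a wrapper, here is a minimal Python sketch. It only assembles the gpp and pandoc invocations shown above and pipes one into the other. The function names `build_commands` and `convert` are made up for illustration, and the sketch assumes that every variable is defined as `NAME=1` and that both programs are on the PATH.

```python
import subprocess


def build_commands(source, out_format, outfile, defines=(), macros_file=None):
    """Return the gpp and pandoc argument lists for one conversion."""
    gpp_cmd = ["gpp", "-H"]
    # Each name in `defines` becomes a -DNAME=1 switch, e.g. -DTEX=1.
    gpp_cmd += [f"-D{name}=1" for name in defines]
    if macros_file is not None:
        # Load the shared macro definitions before the document itself.
        gpp_cmd += ["--include", macros_file]
    gpp_cmd.append(source)
    pandoc_cmd = ["pandoc", "-f", "markdown", "-t", out_format, "-o", outfile]
    return gpp_cmd, pandoc_cmd


def convert(source, out_format, outfile, defines=(), macros_file=None):
    """Run gpp and pipe its output into pandoc."""
    gpp_cmd, pandoc_cmd = build_commands(
        source, out_format, outfile, defines, macros_file
    )
    gpp = subprocess.Popen(gpp_cmd, stdout=subprocess.PIPE)
    pandoc = subprocess.Popen(pandoc_cmd, stdin=gpp.stdout)
    gpp.stdout.close()  # let gpp receive SIGPIPE if pandoc exits early
    pandoc_status = pandoc.wait()
    gpp_status = gpp.wait()
    if gpp_status != 0 or pandoc_status != 0:
        raise RuntimeError("gpp or pandoc exited with an error")
```

With a hypothetical source file notes.md, calling `convert("notes.md", "context", "handout.tex", defines=["TEX", "HANDOUT"], macros_file="macros.gpp")` would produce the ConTeXt handout, and dropping "HANDOUT" from the list would produce the slides version.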

24 thoughts on “How I stopped worrying and started using Markdown like TeX”

  1. Excellent post. I’ve run into a problem that needs something like C preprocessing. Thanks for reminding me about standalone systems like GPP.

  2. Hi, I tried to install pandoc on Windows 7 to use the multiple export feature of Markdown. The required Haskell platform is almost 900 MB, and even after installing it in different locations, it doesn’t run the cabal command required to install and run pandoc.

    The online pandoc converter does not have the option of converting to beamer.

    Would you please suggest if there is any other online pandoc converter or the best way to export markdown in beamer?

    I was thinking of doing regex replace section by frame after converting online to Latex but it seems there has to be a better option than that.

  3. Pingback: TeXtalk: an interview with Aditya « Stack Exchange TeX Blog

  4. Thanks for the post, definitely something to think about.

    Just curious about how you deal with tables. This is where Markdown as the source/archival format breaks for me because I would often need to tweak the tex output quite a bit (e.g. format the header row, add booktabs rules, etc.). It is also where LaTeX is almost equally bad in separating content from presentation so I imagine it won’t be too easy to make it work with preprocessing the source. Am I missing something?

    • Luckily I do not have table heavy content :)

      ConTeXt does provide a table mechanism that makes it easy to separate content from presentation. See for example, these questions on TeX.SE

      - http://tex.stackexchange.com/a/4848/323
      - http://tex.stackexchange.com/a/13827/323
      - http://tex.stackexchange.com/a/69038/323

      However, pandoc translates tables to the old tabular mechanism rather than the new TABLE mechanism. I’ll post an example that shows how to separate content and presentation in tables.

      • Thanks!

        I actually remember reading the second question not too long ago – at least the LaTeX parts :) I haven’t looked into ConTeXt to be honest, but it may be worth a look just for that, although likely only for personal content. I doubt that publishers actually support it and my colleagues don’t even want to make the “giant leap” to XeLaTeX.

        Some time ago when I was looking at org-mode, Markdown, etc. I decided that I could use them for notes and for starting documents, but at some point I would need to convert to LaTeX and lose the ability to go back. I was also thinking whether LaTeX can be the source format, but converting an existing LaTeX document to anything else is also not great. Sometimes I wonder if I’m using it wrong.

        Going back to the tables, personally I wouldn’t mind to keep them separate from the main document just like we do most of the time for figures. Hopefully in a format, which will also provide information on how to format them properly in an implementation independent way. I want too much, right?

  5. Pretty good post! We were wondering (at work) how to simplify our process when writing new documents in a cross-platform way (we have Windows, Mac, Linux, and Word and PowerPoint documents do not like that!). Markdown seems to be a good option, but for things like text color there is nothing in Markdown or Pandoc. I was able to bypass this with LaTeX \color{red}, for example, and text, but there was no single solution to produce HTML and PDF from the same file.
    This one seems the best to take! Will try this deeper and come back later to give my first impressions.
    Thanks for the blog post!

  6. I like the post but there is a problem. GPP eats the @ that is used in citations, i.e., [@MyCitation] becomes [MyCitation] after running through gpp. Is there a way to tell gpp not to touch the @ symbol, or another workaround? Thanks.

  7. I noticed that you mentioned conversion into .doc format. I was just wondering what the syntax would be for such a conversion, and how that works. I’ve been looking around on Google for information on the docx format, but have had no luck so far.

    I guess my question is this: How do you go about creating .doc/.docx using macros and gpp in Markdown? Thanks.

      • Ah sorry. I just re-read what I wrote and realized how bad it was. I’m trying to use GPP to create the .docx compliant XML for .docx. When I try to convert more than once with pandoc (e.g. from .md to (.tex/.html) to .docx), there is always some sort of formatting that seems to be lost (some equations don’t convert properly to Office MathML, for example). I was wondering what your experience was with this. Thanks a bunch!

        • I only use simple in-line equations in my documents and they convert correctly to docx. If you have equations that don’t convert properly, perhaps it is best to report that either to the pandoc mailing list or to the issue page for pandoc on GitHub.

  8. When I initially commented I clicked the “Notify me when new comments are added” checkbox and now
    each time a comment is added I get four emails with the same comment.
    Is there any way you can remove me from that service?
    Thank you!


  9. Really nice post here!

    (If you want more people to find this, you should consider changing the post’s title to something containing better-matching keywords, like adding a sub-title which says “How to pre-process Markdown with ‘gpp’ to enable conditional compilation, include external documents and more…” or similar.)
