Removing multiple blank lines when typesetting code listings

The listings package in LaTeX has an option to collapse multiple empty lines into a single empty line when typesetting code listings. Today, there was a question on TeX.se about how to do something similar when using the minted package. Since the vim module uses the same principle as the minted package, I wondered how one could collapse multiple empty lines into a single line with it.

One of the features of the vim module is that you can source an arbitrary vimrc file before processing the code through the vim editor to generate syntax highlighted code. This feature makes it possible to delegate the task of collapsing multiple blank lines into a single blank line to vim, the editor. Since the vim module first writes the source code to a file with extension .tmp, the following vimrc snippet will collapse all multiple blank lines into a single blank line whenever a .tmp file is loaded:

au BufEnter *.tmp %s/\(^\s*\n\)\{2,\}/\r/ge | w

Use this inside the vim module as follows (example also available on github):

\usemodule[vim]

\startvimrc[name=collapse]
au BufEnter *.tmp %s/\(^\s*\n\)\{2,\}/\r/ge | w
\stopvimrc


\definevimtyping[CPPtyping][syntax=cpp, vimrc=collapse]

\starttext
\startCPPtyping
  i++;


  i++;






  i--;
\stopCPPtyping
\stoptext

Agreed, this is not as simple as the emptylines=1 option in the listings package. But it is not too complicated, considering that I had not thought about this feature at all when I wrote the vim module.
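To see what the substitution does without involving vim or ConTeXt, the same idea can be sketched as a small shell function. This is only an illustration, not something the vim module uses: the function name collapse_blank_lines is made up for this sketch, and it relies on awk rather than on the vim regular expression. It also differs in one detail: a lone whitespace-only line is emitted as an empty line, whereas the vim pattern only touches runs of two or more blank lines.

```shell
# Hypothetical helper: collapse every run of blank (empty or whitespace-only)
# lines on stdin into a single empty line and write the result to stdout.
collapse_blank_lines() {
  awk '
    # A line with no fields is blank; print one empty line per run of blanks.
    NF == 0 { if (!in_blank) print ""; in_blank = 1; next }
    # Any other line ends the current run and is printed unchanged.
            { in_blank = 0; print }
  '
}

# Same input as the ConTeXt example: the statements end up separated by
# single empty lines.
printf '  i++;\n\n\n  i++;\n\n\n\n\n\n\n  i--;\n' | collapse_blank_lines
```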

How I stopped worrying and started using Markdown like TeX

These days I type most of my simple documents (short articles, blog entries, course notes) in markdown. Markdown provides only the basic structured elements (sections, emphasis, urls, lists, footnotes, syntax highlighting, simple tables and figures), which makes it easy to transform the input into multiple output formats. Most of the time, I still want PDF output and for that, I use pandoc to convert markdown to ConTeXt. At the same time, I have the peace of mind that if I need HTML or DOC output, I’ll be able to get that easily.

For most of the last decade, I have almost exclusively used LaTeX/ConTeXt for writing all my documents. After moving to Markdown, I miss three features of TeX: separation of content and presentation; conditional inclusion of content; and including external documents. In this post, I’ll explain how to get these with Markdown.

Separation of content and presentation

TeX gives you a lot of control for creating new structural elements. Let’s take a simple example. Suppose I want to write a file name in a document. Normally, I want the filename to appear in typewriter font. In LaTeX, I could type it as

\texttt{src/hello.c}

but it is better to define a custom macro \filename and use

\filename{src/hello.c}

The advantage is two-fold. Firstly, while writing the file, I am thinking in terms of content (filename) rather than presentation (typewriter font). Secondly, in the future, if I want to change how a filename is displayed (perhaps as a hyper-link to the file), all I need to do is change the definition of the macro. Markdown, with its simplistic structure, lacks the ability to define custom macros.

Conditional compilation

TeX also makes it trivial to generate multiple versions of the document from the same source. Again, let’s take an example. Suppose I am writing notes for a class. Normally, I like to include a short bullet list on my lecture slides, but include a detailed description in the lecture handout. In ConTeXt I can use modes as follows (LaTeX has a similar feature using the comment package):

Feature of the solution
\startitemize[n] 
   \item Feature 1 

     \startmode[handout] 
       Explanation of the feature ... 
     \stopmode 

   \item Feature 2 

     \startmode[handout]
       Explanation of the feature ... 
     \stopmode
\stopitemize

To generate the slides version of my lecture notes, I compile them using

context --mode=slides --result=slides <filename>

This version just contains the bullet list. Since the handout mode is not set, the content between \startmode[handout] ... \stopmode is omitted.

To generate the handout version of my lecture notes, I compile them using

context --mode=handout --result=handout <filename>

Since the handout mode is set, the content between \startmode[handout] ... \stopmode is included.

Such conditional compilation is extremely useful for keeping the slides and handouts in sync. Again, markdown, with its simplistic feature set, lacks the ability to do conditional compilation, and Pandoc does not add this feature either.

Including external documents

TeX makes it easy to include external documents. This is really important when you want to include source code in your documents. I teach an introductory programming class, and want to make sure that the example code included in my notes is correct. I write the code in a separate file, write the corresponding test files to ensure that the code works correctly, and then include it in my notes using

\typeJAVAfile[src/FactoryExample.java]

which gives me syntax highlighted source code. Pandoc does generate syntax highlighted source code, but does not provide any means to include external source code. So, I have to copy paste the code from the actual source file to the markdown document, but that is an error-prone process.


If I only cared about PDF output (via LaTeX/ConTeXt backend), I could simply use the same TeX macros in the markdown document. Pandoc passes the TeX macros unchanged to the LaTeX/ConTeXt backend, so I would get a TeX document with all the bells and whistles. But, if I tried to generate HTML or DOC output, these TeX macros would be omitted, and I’d get a broken document. One of my reasons for switching to Markdown was the peace of mind that I can generate HTML or DOC output if needed. Using TeX macros in the source takes away that advantage.

So, I started looking for possible solutions and found gpp—the generic pre-processor. It is similar to the C-preprocessor (that handles the #include, #define, stuff in C/C++) but provides many configuration options. I use it with the -H option, which requires macros to be specified in an HTML-like mode:

<#include "file">
<#define MACRO|value>
Use <#MACRO>

Normally the <#...> does not appear in a document, so using gpp is safe.
See the gpp documentation for complete details. I’ll show how to get the three features that I miss from TeX:

  1. Separation of content and presentation

    With gpp I can define new macros that denote new structural elements, e.g.,
    <#define filename|`#1`>
    The source is included in <#filename src/hello.c>

    When I compile the document using gpp -H, I get

    The source is included in `src/hello.c`

    Sure, this requires more typing than simply using `...`, but that is the price that one has to pay for getting more structure. More importantly, I can define the #filename macro based on the output format:

    <#define filename|`#1`>
    <#ifdef HTML>
         <#define filename|<code class="filename">#1</code>> 
    <#endif>
    <#ifdef TEX> 
         <#define filename|\\filename{#1}> 
    <#endif> 
    The source is included in <#filename src/hello.c>

    Now, if I compile the document using gpp -H -DHTML=1, I get

    The source is included in <code class="filename">src/hello.c</code>

    and if I compile using gpp -H -DTEX=1, I get

    The source is included in \filename{src/hello.c}

    This ensures that the document structure is passed to the output as well.

    To make it easy to manage macros, create three files: macros.gpp containing all macros, html.gpp overriding some of the macros with HTML equivalents, and tex.gpp overriding some of the macros with TeX equivalents. End the macros.gpp file with

    ....
    <#ifdef HTML> 
        <#include "html.gpp"> 
    <#endif> 
    <#ifdef TEX> 
         <#include "tex.gpp"> 
    <#endif>

    and then preprocess the document using gpp -H -DTEX=1 --include macros.gpp <filename> (or -DHTML=1 for HTML output).

  2. Conditional compilation

    Actually, the previous example already shows how to get conditional compilation: use the -D command line switch and check the variable definition using #ifdef. Thus, the above example translates to:
    Feature of the solution
    
    1. Feature 1 
    
    <#ifdef HANDOUT> 
    Explanation of the feature ... 
    <#endif> 
    
    2. Feature 2 
    
    <#ifdef HANDOUT> 
    Explanation of the feature ... 
    <#endif>

    When I compile without -DHANDOUT=1, I get the slides version; when I compile with -DHANDOUT=1, I get the handout version.

  3. Including external documents

    External documents can be included with the #include directive. So, I can include an external file using
    ~~~ {.java} 
    <#include "src/Factory.java">
    ~~~

Putting it all together

All that is needed is to run the gpp preprocessor and then pass the output to pandoc.

gpp -H <options> <filename> | pandoc -f markdown -t <format> -o <outfile>

Hide this in a wrapper script or a shell function or a Makefile, and you have a markdown processor with the important features of TeX!
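One possible shape for such a wrapper is a shell function. This is a sketch under assumptions: the name mdbuild and its argument order are invented for illustration, and it assumes that gpp and pandoc are on the PATH.

```shell
# mdbuild FORMAT INPUT OUTPUT [gpp options...]
# Hypothetical helper: preprocess INPUT with gpp in HTML-like mode (-H), then
# let pandoc convert the result to FORMAT and write it to OUTPUT.
# Any extra arguments (for example -DHANDOUT=1) are passed on to gpp.
mdbuild() {
  if [ "$#" -lt 3 ]; then
    echo "usage: mdbuild FORMAT INPUT OUTPUT [gpp options...]" >&2
    return 2
  fi
  format=$1 input=$2 output=$3
  shift 3   # whatever remains in "$@" are the gpp options
  gpp -H "$@" "$input" | pandoc -f markdown -t "$format" -o "$output"
}
```

With this in place, mdbuild context notes.md notes.tex -DTEX=1 -DHANDOUT=1 would produce the ConTeXt source of the handout, and mdbuild html notes.md notes.html -DHTML=1 the HTML slides version; notes.md is a placeholder file name.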

A ConTeXt style file for formatting RSS feeds for Kindle

As I said in the last post, I bought an Amazon Kindle Touch some time back, and I find it very useful for reading on the bus/train while commuting to work. I use it to read novels and books, a few newspapers and magazines that I subscribe to, and RSS feeds of different blogs that I follow. Until now, I had been using ifttt to send RSS feeds to Instapaper; Instapaper then emails a daily digest as an ebook to my Kindle account at midnight; in the morning, I switch on my Kindle for a minute; the Kindle syncs new content over Wifi; and off I go.

However, Kindle typesets ebooks very poorly, so I decided to write a ConTeXt style file to typeset RSS feeds (check it out on github). To use this style:

\usemodule[rssfeed]

\starttext
\setvariables
    [rssfeed]
    [title={Title of the feed},
     description={Description of the feed},
     link={A link to the feed},
    ]

\starttitle[title={First feed entry}]
....
\stoptitle

\starttitle[title={Second feed entry}]
...
\stoptitle

\stoptext

It uses the eink module to set the page layout and fonts, and uses a light and clean style for formatting feed entries. Since the proof is in the pudding, look at the following PDFs to see the style for different types of blogs.

I use a simple Ruby script that parses RSS feeds and uses Pandoc to convert the contents of each entry to ConTeXt. The script is without bells and whistles, and there is no site-specific formatting of feeds. All feeds are handled the same way and, as a result, there are a few glitches. For example, IEEE uses some non-standard tags to denote math, which Pandoc doesn’t handle, and the images generated by WordPress blogs that use $latex=...$ to typeset math are not handled correctly by ConTeXt.

The script also uses Mutt to email the generated PDF to my Kindle account. This way, I can simply add a cron job that runs the script at an appropriate frequency (daily for usual blogs, weekly for low traffic blogs, and once a month for tables of contents of different journals).

A style file for eink readers

Recently I bought an Amazon Kindle Touch. It is more convenient than the IREX DR1000 for reading morning news and blogs (thanks to Instapaper’s automated delivery of “Read Later” articles, and ifttt for sending RSS feeds to Instapaper).

I have also started reading novels on the Kindle as opposed to the DR1000. Being small, the Kindle is easier to carry, and its hardware just works better than the DR1000: instant startup, huge battery life, and wifi, all areas where the DR1000 was lacking. Still, the DR1000 is the best device when it comes to reading and annotating academic papers, which is surprising given that it came out 3.5 years ago; perhaps “eink devices for reading and annotating academic papers” is too niche a market to have a successful product. The DR1000 was $800 and IREX is now bankrupt.

Anyways, since I am reading novels on Kindle, I have updated my old ConTeXt style file for DR1000 to also handle Kindle and am releasing that as a ConTeXt module. Actually, as two ConTeXt modules: t-eink-devices that stores the dimensions and desired font sizes for eink devices (currently, it has data only for DR1000 and Kindle as those are the only devices that I have) and t-eink that sets an easy to read style that includes:

  • Paper size that matches the screen dimensions
  • Tiny margins, no headers and footers
  • Bookmarks for titles and chapters (both DR1000 and Kindle can use PDF bookmarks as table of contents)
  • A reasonable default style for chapter and title headings
  • A \startinterlude ... \stopinterlude environment for title pages, dedication, etc.

I have only tested this with simple novels (mostly texts and pictures). That is why the module does not set any style for sections, subsections, etc, as I did not need them so far.

This is mostly for personal use, but I am announcing this module in case someone wants to give it a shot. To use the module, simply add

\usemodule[eink]
 [ 
 % alternative=kindle, % or DR1000
 % mainfont={Tex Gyre Schola}, 
 % sansfont={Tex Gyre Heros}, 
 % monofont={Latin Modern Mono}, 
 % mathfont={Xits}, 
 % size=, % By default, kindle uses 10pt and DR1000 uses 12pt font.
 % Use this setting if you want to set a font size.
 ]

This module passes the font loading to the simplefonts module. So, use any name for mainfont etc. that simplefonts will understand. If you don’t set any option, then the default values, indicated above, are used. So, to test out the module, you can just use (for kindle):

\usemodule[eink]

or (for DR1000)

\usemodule[eink][alternative=DR1000]

Below are the samples from Le Petit Prince. The text and images were taken from this website and converted to ConTeXt using Pandoc. The text is also available from Project Gutenberg, Australia.

If you have a Kindle or a DR1000, you can compare the quality of these PDFs (hyphenation, line-breaking, widows and orphans) with what you get from the eBook version. If I am to spend 5-10 hours reading a novel, I don’t mind spending 15 minutes extra (to create a PDF version of the book) to make that reading experience pleasant.

The output is not perfect, especially in terms of float placement in the Kindle version (page 5 has an underfull page because the figure was too big to fit in the page, the right float image on page 10 would have been better as a here figure, the right float figures on pages 13-14 are much lower than where they are referred to, etc.). But I find these more tolerable than a chapter title appearing at the bottom of the page and occasionally losing pagination when I highlight text (both of which happen with epub documents).

Reading remote files

Wouldn’t it be nice if TeX could pretty-print files hosted on github, e.g.,

\typeRUBYfile{https://raw.github.com/adityam/filter/master/Rakefile}

or include a remotely hosted markdown file in your document

\processmarkdownfile{https://raw.github.com/adityam/filter/master/README.md}

I wanted to add this feature to the filter and vim modules.
Although I knew that ConTeXt could read remote files directly, I thought that it would be hard to plug into this mechanism.

Boy, was I wrong. Look at the commit history of the change needed to add this feature.

All I needed to do was add \locfilename to get the local file name for a file. If the requested file is a remote file (i.e., starts with http:// or ftp://), ConTeXt downloads the file, stores it in the cache directory, and returns the name of the cached file. Pretty neat, eh?

With this change, the \process<filter>file macro of the filter module can read remote files. Since the vim module is built on top of the filter module, the \type<vim>file macro can also read remote files.

The above feature is currently only available in the dev branch. I’ll make a new release once I add hooks to force re-download of remote files. Meanwhile, if you have a ConTeXt macro that reads files, just add a \locfilename at the appropriate place, and your macro will be able to read remote files.

Update for the filter module: faster caching

Over the last year, the code base of the filter module has matured considerably. Now, the module has all the features that I wanted when I started with it about a year and a half back. The last remaining limitation (in my eyes, at least) was that caching of results required a call to an external program (mtxrun) to calculate md5 hashes; as such, caching was slow. That is no longer the case. Now (since early December), md5 sums are calculated at the lua end, so there is no time penalty for caching. As a result, in MkIV, recompiling is much faster for documents having lots of external filter environments with caching enabled (i.e., environments defined with the continue=yes option).
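The idea behind hash-based caching can be sketched in a few lines of shell. This is not how the filter module is implemented (it computes the sums in Lua inside ConTeXt); the function name cached_filter, the cache layout, and the use of md5sum from GNU coreutils are all assumptions made for this illustration.

```shell
# cached_filter CACHE_DIR COMMAND [ARGS...]
# Hypothetical helper: read input on stdin, name the result after the MD5 sum
# of that input, and run COMMAND only if no result for this input exists yet.
cached_filter() {
  cache_dir=$1
  shift
  mkdir -p "$cache_dir"
  input=$(mktemp) || return 1
  cat > "$input"                                # keep stdin so it can be read twice
  hash=$(md5sum < "$input" | cut -d' ' -f1)     # first field is the hex digest
  result="$cache_dir/$hash.out"
  if [ ! -f "$result" ]; then
    # Cache miss: run the external filter and store its output only on success.
    if ! "$@" < "$input" > "$result.tmp"; then
      rm -f "$input" "$result.tmp"
      return 1
    fi
    mv "$result.tmp" "$result"
  fi
  rm -f "$input"
  cat "$result"                                 # cache hit or freshly stored result
}
```

Unchanged input maps to the same file name, so a second run only has to compute the hash and read the stored result instead of calling the external program again.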


Some thoughts on lowering the learning curve for using TeX (part I)

TeX has a steep learning curve, often steeper than it needs to be. Take, for example, the special characters in TeX. Almost every introduction to plain TeX, eplain, LaTeX, or ConTeXt has a section on these special characters:

\ { } $ & # ^ _ % ~

A good introduction then goes on to explain why these special characters are important; sometimes dropping a hint about category codes. I feel that these details are useless and, at the user level, we should get rid of them.
