The filter module — replacing the R module

In a previous post, I described a module that I am writing to make it easy to run external filters on some text. I have since added more features to that module (download the module from GitHub to test it). Hopefully, this post also shows a more realistic use case of the module.

Last time, I defined a filter for running markdown as

\usemodule[filter]

\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

This syntax makes an implicit assumption: the output file can be specified as an option. What if pandoc did not provide such an option and always wrote its output to stdout? In that case, we can use

\defineexternalfilter
  [markdown]
  [filtercommand={pandoc -w context \externalfilterinputfile\space > \externalfilteroutputfile}]

Note the use of filtercommand instead of filter. By default,

filtercommand="value of filter"\space \externalfilterinputfile
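
With either definition, the filter is used as an environment. The following is a minimal sketch, assuming that defining a filter named markdown gives a \startmarkdown ... \stopmarkdown pair in the same way that the R filter gives \startR ... \stopR; the markdown text itself is only illustrative.

```tex
\usemodule[filter]

% Same definition as above: pandoc writes ConTeXt code to the output file
\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

\starttext

% The body is written to a temporary file, passed through pandoc,
% and the result is read back into the document
\startmarkdown
A paragraph with *emphasis*, followed by a list:

- first item
- second item
\stopmarkdown

\stoptext
```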

Now let's try something more ambitious: replacing the functionality of the R module. The R module provides two environments, \startR ... \stopR and \startRhidden ... \stopRhidden. Both environments write their contents to an external file and process that file through R. The \startR ... \stopR environment also types the R output verbatim. There is another feature that I will come to later. Let's do the above stepwise:

First, define an environment that writes its contents to an external file

\defineexternalfilter
  [R]
  [...]

Next, we need to set appropriate options.

  • To process with R, we use
    filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile}
    
  • To set the name of the output file, we use
    output=\externalfilterbasefile.out
    
  • To type the result, we use
    readcommand=\typefile
    

Thus, the complete command is (I will explain the continue key later)

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

The \startRhidden ... \stopRhidden environment is slightly different. Instead of typing the result, we want to ignore it. That can be done by setting

read=no

Thus, its complete specification is

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Finally, let's come to the continue key. If the external program is fast, like pandoc (which simply parses a file), processing a temporary file does not lead to a significant time penalty. However, an R program usually does some heavy calculations, so processing an external file can take significant time. Therefore, we want to cache the results and rerun a chunk only if something has changed. This is specified by

continue=yes

which processes a chunk only if its MD5 sum has changed. Thus, a complete replacement for the R module can be

\usemodule[filter]

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Yep, that's it. Only three declarations! This correctly runs on the test file that is part of the R module.
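
A small document exercising both environments might look like the sketch below. The R code is purely illustrative and is not the test file shipped with the R module. It relies on the --save and --restore options carrying the R workspace from one chunk to the next, so that x, defined in the hidden chunk, is still available in the typed one.

```tex
\usemodule[filter]

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

\starttext

% Runs through R, but nothing is typeset
\startRhidden
x <- c(2, 4, 6, 8)
\stopRhidden

% Runs through R and types the R output verbatim
\startR
mean(x)
\stopR

\stoptext
```

Because of continue=yes, a second run of the document should rerun a chunk only if its contents have changed.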

Next, I will try to add features so that an external program can be used for syntax highlighting.
