The filter module — replacing the R module

In a previous post, I described a module that I am writing to make it easy to run external filters on some text. I have since added some more features to that module (download the module from github to test it). Hopefully, this post also shows a more realistic use case of the module.

Last time, I defined a filter for running markdown as

\usemodule[filter]

\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

This syntax makes an implicit assumption: the output file can be specified as an option. What if pandoc did not provide such an option and always wrote its output to stdout? In that case, we can use

\defineexternalfilter
  [markdown]
  [filtercommand={pandoc -w context \externalfilterinputfile\space > \externalfilteroutputfile}]

Note the use of filtercommand instead of filter. By default,

filtercommand="value of filter"\space \externalfilterinputfile
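
Either definition is used the same way in a document: a filter named markdown provides a \startmarkdown ... \stopmarkdown environment. A minimal sketch, assuming that pandoc is installed and that ConTeXt is allowed to run external programs (the sample text is made up for illustration):

% Assumption: pandoc is on the PATH and external programs may be run
\usemodule[filter]

\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

\starttext

\startmarkdown
A short paragraph, followed by a list:

- first item
- second item
\stopmarkdown

\stoptext

The contents of the environment are written to a temporary file, passed through pandoc, and the resulting ConTeXt code is read back into the document.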

Now let's try something more ambitious: replacing the functionality of the R module. The R module provides two environments, \startR ... \stopR and \startRhidden ... \stopRhidden. Both environments write their contents to an external file and process the file through R. The \startR ... \stopR environment also types the R output (verbatim). There is another feature that I will come to later. Let's do the above stepwise:

First, define an environment that writes its contents to an external file

\defineexternalfilter
  [R]
  [...]

Next, we need to set appropriate options.

  • To process with R, we use
    filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile}
    
  • To set the name of the output file, we use
    output=\externalfilterbasefile.out
    
  • To type the result, we use
    readcommand=\typefile
    

Thus, the complete command is (I will explain the continue key later)

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

The \startRhidden ... \stopRhidden environment is slightly different. Instead of typing the result, we want to ignore it. That can be done by setting

read=no

Thus, its complete specification is

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Finally, let's come to the continue key. If the external program is fast, like pandoc (which simply parses a file), processing a temporary file does not incur a significant time penalty. However, an R program usually does some heavy calculations, so processing an external file can take significant time. Therefore, we want to cache the results and rerun a chunk only if something has changed. This is specified by

continue=yes

which processes a chunk only if its md5 sum has changed. Thus, a complete replacement for the R module can be

\usemodule[filter]

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Yep, that's it. Only three declarations! This correctly runs on the test file that is part of the R module.
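
For a smaller illustration than that test file, the two environments might be combined as follows. This is only a sketch: it assumes the definitions above are in the preamble, the R code is invented for this example, and it relies on the --save --restore flags carrying the R workspace from one chunk to the next.

\starttext

% Runs silently: defines x, but nothing is typed (read=no)
\startRhidden
x <- c(2, 4, 6, 8)
\stopRhidden

% Runs and types the R output file verbatim (readcommand=\typefile)
\startR
mean(x)
\stopR

\stoptext

One consequence of continue=yes is worth keeping in mind. Since a chunk is rerun only when its own md5 sum changes, editing the first chunk here would not, as far as I can tell from the behaviour described above, rerun the second one.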

Next, I will try to add features so that an external program can be used to do syntax highlighting.

4 thoughts on “The filter module — replacing the R module

  1. This is really great. Certainly not as fully featured as Sweave, but I really like the ability to mix markdown and R. Looking forward to the highlighting. If only I didn’t have to learn ConTeXt…

    • Actually, mixing Sweave and markdown is really easy, because Sweave leaves everything outside the code blocks untouched. I recently used Sweave on some markdown + R code to generate typeset output by first running Sweave on the file and then post-processing the result using pandoc.

      I had to go out of my way because I was using ConTeXt. I think that pandoc leaves contents of the code environment untouched, so a LaTeX output will be much easier.

      • That is a great example. It is certainly useful to me since I am comfortable with tool chains. I do like the possibility with your tool of “single” step compilation (that is no explicit pre/post processors). Especially for those not comfortable with tool chains.

        On another note, you might be interested in pgfSweave and tikzDevice since you are using math characters in your figures.

  2. Pingback: Word clouds in ConTeXt « Random Determinism
