Reading remote files

Wouldn’t it be nice if TeX could pretty-print files hosted on GitHub, e.g.,

\typeRUBYfile{https://raw.github.com/adityam/filter/master/Rakefile}

or include a remotely hosted markdown file in your document

\processmarkdownfile{https://raw.github.com/adityam/filter/master/README.md}

I wanted to add this feature to the filter and vim modules.
Although I knew that ConTeXt could read remote files directly, I thought that it would be hard to plug into this mechanism.

Boy, was I wrong. Look at the commit history of the change needed to add this feature.

All I needed to do was add \locfilename to get the local file name for a file. If the requested file is a remote file (i.e., it starts with http:// or ftp://), ConTeXt downloads the file, stores it in the cache directory, and returns the name of the cached file. Pretty neat, eh?

With this change, the \process<filter>file macro of the filter module can read remote files. Since the vim module is built on top of the filter module, the \type<vim>file macro can also read remote files.

The above feature is currently only available in the dev branch. I’ll make a new release once I add hooks to force re-download of remote files. Meanwhile, if you have a ConTeXt macro that reads files, just add a \locfilename at the appropriate place, and your macro will be able to read remote files.
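
As a rough sketch (not taken from the module's source), a file-reading macro could pick up the same behaviour along these lines. The names \TypeAnyFile and \LocalName are made up for illustration, and I am assuming that \locfilename takes the file name as a braced argument and expands to the local name inside \edef in MkIV.

% Hypothetical helper: resolve the name first, then typeset the local copy.
% Assumption: \locfilename{...} is expandable and returns the cached file
% name for remote files (and the name itself for local files).
\def\TypeAnyFile#1%
  {\edef\LocalName{\locfilename{#1}}%
   \typefile{\LocalName}}

\starttext
\TypeAnyFile{https://raw.github.com/adityam/filter/master/README.md}
\stoptext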

Update for the filter module: faster caching

Over the last year, the code base of the filter module has matured considerably. The module now has all the features that I wanted when I started on it about a year and a half ago. The last remaining limitation (in my eyes, at least) was that caching of results required a call to an external program (mtxrun) to calculate md5 hashes; as such, caching was slow. That is no longer the case. Since early December, md5 sums are calculated on the Lua side, so there is no time penalty for caching. As a result, in MkIV, recompiling is much faster for documents that have lots of external filter environments with caching enabled (i.e., environments defined with the continue=yes option).
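
For concreteness, here is a minimal sketch of an environment with caching enabled. It reuses the pandoc-based markdown filter from the older posts further down this page; the sample text is only a placeholder.

\usemodule[filter]

% continue=yes enables caching: the external program is rerun only
% when the md5 sum of the environment's contents has changed.
\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile},
   continue=yes]

\starttext
\startmarkdown
As long as this text stays *unchanged*, later runs reuse the cached result.
\stopmarkdown
\stoptext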

Using ConTeXt to convert markdown to PDF

I am using markdown to write the course notes for a lecture that I am teaching this semester. The notes have no math, just text and images, so markdown is the ideal input format. At the same time, I can easily convert a markdown file to a nicely typeset PDF using pandoc, with the ConTeXt filter module providing the plumbing.

However, it took me a while to come up with a workflow that I like.
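
The basic plumbing can be sketched as follows. This is not necessarily the workflow I settled on: it simply combines the markdown filter definition from an earlier post with the \process<filter>file macro, and notes.md is a hypothetical file name.

\usemodule[filter]

\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

\starttext
% notes.md is a placeholder for your own markdown file
\processmarkdownfile{notes.md}
\stoptext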

Out of sight, out of mind

The filter module clutters the current directory with temporary files. Normally I do not notice these files because I have set my shell (zsh) and my editor (vim) to ignore them. Still, I thought that it would be better if the module itself wrote the files to a different directory. That is now possible with the latest version of the module. To set the output directory, just use

\setupexternalfilters[directory=output/]

and all the files will be created in the output/ directory. The output directory can also be set on a per-filter basis:

\defineexternalfilter
  [...]
  [...
   directory=output/,
  ...]

The directory has to be a path relative to the directory from which ConTeXt is run (and not the directory where the files are stored). I thought about supporting absolute paths but, in the end, it was too much trouble to do that reliably (read and write permissions in texmf.cnf, strange defaults for the \ReadFile macro, … I won’t bore you with the details). I did go out of my way to tell the user that absolute paths are not supported. If you try to use an absolute path, you will get an error message:

t-filter        : Fatal Error: Cannot use absolute path /tmp/ as directory

So, if you use the filter module and are a cleanliness freak, set the directory option.

Sometimes less is more

This post marks the official release of the filter module. Download it from github.

The biggest change from the previous version is that I decided to drop the support for running a webservice (advertised in a previous post). That feature was a distraction from the main idea of the module—make running external programs inside ConTeXt dead simple. If you were using that feature, do not panic. It is still available in a separate branch and I will try to maintain it as much as possible. I might also, perhaps, write a separate module to support web filters in the future.

Unlike traditional ConTeXt modules, the documentation is not a PDF file; rather it is available as a plain text README file, which is more accessible and easier to maintain.

Apart from dropping a feature and adding documentation, the rest of the module is the same. I had originally planned to use the new automated namespace feature to clean up the internals of the module, but \definenamespace works only with MkIV. I saw no reason to break the support for MkII. So, the internals still use the verbose manual namespaces. But that should not deter you from using the module. Go ahead, download it, and enjoy!

Word clouds in ConTeXt

A picture is worth a thousand words, or so goes the cliché. The popularity of word clouds is a testament to this “Chinese proverb”. One such word cloud is at the bottom left of this blog. Now, what if you want to include a word cloud in a TeX document, as asked on tex.stackexchange? Here is the external filter module to the rescue, again.

First, we need a word cloud layout engine. I chose the IBM Word Cloud, the engine by Jonathan Feinberg that powers Wordle. To use this engine, install Java on your computer using your favorite method. Then, download the engine from IBM’s website (you need to fill out a silly registration form, sigh). Unzip the file in an appropriate directory (for simplicity, I removed the spaces from the directory name and moved everything to IBM-Word-Cloud). Edit the name of the font in the examples/configuration.txt file. Run the run-example.sh (or run-example.bat) file to make sure that everything works correctly.

Once this engine is working, getting word clouds in ConTeXt is easy. Download the externalfilter module. Then,

\usemodule[filter]

\defineexternalfilter
  [wordcloud]
  [filtercommand=/opt/java/jre/bin/java -jar $HOME/IBM-Word-Cloud/ibm-word-cloud.jar 
    -c $HOME/IBM-Word-Cloud/examples/configuration.txt 
    -w 800 -h 600
    -o \externalfilteroutputfile\space
    -i \externalfilterinputfile,
  output=\externalfilterbasefile.png,
  readcommand=\ExternalFigure,
  continue=yes,
  ]

\def\ExternalFigure#1{\externalfigure[#1]}

This creates an environment \startwordcloud...\stopwordcloud that stores its contents in an external file, runs that file through ibm-word-cloud.jar and includes the result.

Using this on the famous quote by Knuth gives

\starttext

\startwordcloud
Thus, I came to the conclusion that the designer of a new
system must not only be the implementer and first
large||scale user; the designer should also write the first
user manual.

The separation of any of these four components would have
hurt \TeX\ significantly. If I had not participated fully in
all these activities, literally hundreds of improvements
would never have been made, because I would never have
thought of them or perceived why they were important.

But a system cannot be successful if it is too strongly
influenced by a single person. Once the initial design is
complete and fairly robust, the real test begins as people
with many different viewpoints undertake their own
experiments. 
\stopwordcloud
\stoptext

[Image: word cloud generated from Knuth's quote]

The filter module — replacing the R module

In a previous post, I described a module that I am writing to make it easy to run external filters on some text. I have added some more features to that module (download the module from github to test it). Hopefully, this post also shows a more realistic use case of the module.

Last time, I defined a filter for running markdown as

\usemodule[filter]

\defineexternalfilter
  [markdown]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

This syntax makes an implicit assumption: the output file can be specified as an option. What if pandoc did not provide such an option and always wrote its output to stdout? In that case, we can use

\defineexternalfilter
  [markdown]
  [filtercommand={pandoc -w context \externalfilterinputfile\space > \externalfilteroutputfile}]

Note the use of filtercommand instead of filter. By default,

filtercommand="value of filter"\space \externalfilterinputfile
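
To make that default concrete, the two definitions below should behave the same, going by the rule just stated. The filter names markdownshort and markdownlong are made up so that the two variants can sit side by side.

% Short form: the input file name is appended automatically.
\defineexternalfilter
  [markdownshort]
  [filter={pandoc -w context -o \externalfilteroutputfile}]

% Long form: the same command line, with the input file spelled out.
\defineexternalfilter
  [markdownlong]
  [filtercommand={pandoc -w context -o \externalfilteroutputfile\space \externalfilterinputfile}]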

Now let's try something more ambitious: replacing the functionality of the R module. The R module provides two environments, \startR ... \stopR and \startRhidden ... \stopRhidden. Both environments write their contents to an external file and process the file through R. The \startR ... \stopR environment also typesets the R output verbatim. There is another feature that I will come to later. Let's do the above stepwise:

First, define an environment that writes its contents to an external file

\defineexternalfilter
  [R]
  [...]

Next, we need to set appropriate options.

  • To process with R, we use
    filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile}
    
  • To set the name of the output file, we use
    output=\externalfilterbasefile.out
    
  • To type the result, we use
    readcommand=\typefile
    

Thus, the complete command is (I will explain the continue key later)

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

The \startRhidden ... \stopRhidden environment is slightly different. Instead of typing the result, we want to ignore it. That can be done by setting

read=no

Thus, its complete specification is

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Finally, let's come to the continue key. If the external program is fast, like pandoc (which simply parses a file), processing a temporary file does not lead to a significant time penalty. However, an R program usually does some heavy calculations, and, as such, processing an external file can take significant time. Therefore, we want to cache the results and rerun a chunk only if something has changed. This is specified by

continue=yes

which processes a chunk only if its md5 sum has changed. Thus, a complete replacement for the R module can be

\usemodule[filter]

\defineexternalfilter
  [R]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   readcommand=\typefile,
   continue=yes]

\defineexternalfilter
  [Rhidden]
  [filtercommand={R CMD BATCH -q --save --restore \externalfilterinputfile\space \externalfilteroutputfile},
   output=\externalfilterbasefile.out,
   read=no,
   continue=yes]

Yep, that's it. Only three declarations! This correctly runs on the test file that is part of the R module.
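
A small usage sketch, assuming the two definitions above are in place and R is on the path. The R code is just an illustration. Because of the --save and --restore flags, the workspace should carry over from one chunk to the next, so the second chunk can use the variable defined in the first.

\starttext

% Runs silently: defines x, output is not typeset
\startRhidden
x <- c(1, 2, 3, 4)
\stopRhidden

% Runs and typesets the R output verbatim
\startR
summary(x)
\stopR

\stoptext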

Next, I will try to add features so that I can use an external program to do syntax highlighting.