Some thoughts on lowering the learning curve for using TeX (part I)

TeX has a steep learning curve. Often times, steeper than it needs to be. Take, for example, the special characters in TeX. Almost every introduction to plain TeX, eplain, LaTeX, or ConTeXt has a section on these special characters

\ { } $ & # ^ _ & ~

A good introduction then goes on to explain why these special characters are important; sometimes dropping a hint about category codes. I feel that these details are useless and, at the user level, we should get rid of them.

If you are skeptical, I don’t blame you. After all, category codes are the very soul of TeX. However, I strongly believe that they are useless at the user level. Lets go over each of these special characters one-by-one and see if we really need them.

Minimum category codes: \ { }

The only category codes that we need at the user level are \ { }. The character \ marks the start of a control sequence, and { and } group the arguments. The rest, can simply be replaced by control sequences.

Math mode category codes: $ _ ^

In TeX, $ is used to delimit math mode—Knuth used dollars as the math shift character because typesetting math was expensive, so goes an old joke. But do we really need to stick to $? After all, at the user level, both LaTeX and ConTeXt do not use $$ to move to display math mode. Both macro packages provide environments for display math. Can’t we do the same for in-line math? In fact, both LaTeX and ConTeXt also provide macros for in-line math: LaTeX uses \(...\) and ConTeXt uses \m{...} and \math{...}. The only trouble is that these macros are not widely used (and that the LaTeX macros are not robust, but that is easily correctable). The only real argument in favor of $...$ is that it shorted to type, but compared to \(...\) or \m{...}, not by much.

The same is true for _ and ^. Both LaTeX and ConTeXt (in fact, so does plain TeX!) provide macros for both of them: \sp for ^ and \sb for _. But don’t panic! I am not asking everyone to start using \sp and \sb. What I am asking is that _ and ^ have normal meaning in text mode. That is, if I type _, I should get _, not a funky error message. In fact, this is not too difficult to achieve. In LaTeX, use the underscore package (it is easy to extend that to take care of ^ as well), and in ConTeXt use \nonknuthmode somewhere in your preamble.

Of course, the next logical step is to make $ a normal letter: that is, if you type $ you get $. This is not possible in pdftex, because there is no other means of entering into math mode (other than making some other character the math shift character, but that has the same drawback as making $ the math shift character.) However, luatex provides primitive control sequences for entering and exiting in-line and display math. So, it is possible to make $ a normal letter.

Align character &

Horizontal alignment is one of the strengths of TeX. Most table and multi-line display math environments use horizontal alignment and & specifies the alignment point for horizontal alignment. Surely, getting rid of & will not work.

Unfortunately, that is true in LaTeX. The & character is so critical for horizontal alignment at the user-level that eliminating it will mean a lot of change. Perhaps, & can be handled in the same manner as _ and ^: it can be a regular letter in text mode and have special meaning inside horizontal alignment. But, it is not always clear to the users which macros use horizontal alignment internally. As such, changing the meaning of & inside some environments will bring more trouble than benefits.

However, the situation in ConTeXt is completely different. At the user-level, & is never used to indicate the alignment point. Both tables and multi-line math display use \NC ... \NC ... \NC \NR type of syntax to indicate new columns. In such a situation it is all the more awkward to explain to a user why & is a special character. It should just be made a normal letter. LuaTeX provides a \aligntab primitive which can be used instead in alignment macros.

Parameter indicator #

Macros is what makes TeX different from all other text markup languages. Automatic numbering, cross-references, headers and footers, and all possible due to macros. And #1 is used to indicate the first parameter for the macro, #2 the second, and so on. But, why do we need this special meaning at the user-level? Only the macro writer needs to care about it.

Most LaTeX macros are written in .sty files, that are loaded under a different catcode regime anyways. Most ConTeXt macros are written inside \unprotect ... \protect. So, it is easy to set the traditional catcode regime in both cases. If a user really needs to define macros in the middle of the document, there can be a “programming” environment. For example, ConTeXt provides \starttexcode...\stoptexcode, which sets the same catcodes as \unprotect...\protect. Implementing the same environment in LaTeX is trivial (think \makeatletter...\makeatother on steroids).

Unbreakable space ~

Knuth used ~ to indicate an unbreakable space, and that tradition has continued ever since. In this age of Unicode text, do we still need such crutches. It is easy enough to type Unicode 0x00AA (non-breakable space) in most editors. For example, in vim I just need to type CTRL+K+<space>+<space>. A smart syntax highlighting scheme will make the non-breakable space visible. So, there is no real reason to keep on using ~ as a non-breakable space. The same argument holds for the TeX macros for accents, typing in Unicode is easy to input and easy to read (but that will be the subject of another rant).

So, what’s the point of all this?

Now image that all these features have been implemented. Then, we may split the introduction to a TeX macro package into two parts: using the macro package and programming the macro package. Split the first part into two further parts: text mode and math mode. For the text mode, the only special characters are \ { } %. All other characters are normal, that means if you type them, you see them in the output (provided the font has the glyph; lets ignore complex languages like Arabic, CJK, and Indic scripts and setting appropriate font features for them at the moment). \ starts a control sequence, {...} groups an argument, and % is a line comment. For the math mode, explain how to enter math mode (\(...\) or \m{...} or the display-math environments) and explain that _ and ^ are used to indicate sub- and super-scripts. Postpone explaining the programming mode for later. I think that such a scheme will lower the cognitive load on the new user.

Will such a system work? Yes, it will. In fact, it already does. For about an year now, ConTeXt has a \asciimode macro that implements all these features, with a slight twist. % is also a normal letter and you need to type %% to get a line comment (and %{}% if you really need the output %%). This macro is not enabled by default. I think that making it default will simplify understanding TeX for the first time. As an added advantage, it will also make the job of sanitizing the input simpler for converters (such as pandoc) that convert some other markup language to TeX.

5 thoughts on “Some thoughts on lowering the learning curve for using TeX (part I)

  1. I’d agree with most of this: certainly aims for LaTeX3 too! I’ve got a couple of minor comments.

    First, not everyone wants to use UTF-8 sequences all of the time, and so there is a case for “~”, at least, as ‘special’. For the odd accent, using macro input is also easier in some cases. (In the LaTeX world, we also can’t assume UTF-8 in any case, so things are a bit different.)

    Secondly, I’m not clear on what you mean about “$”. You can set it as math-shift, then “\let” it to an implicit token, then change the catcode to ‘other’. The implicit token can then be used ‘wrapped up’ inside a macro (such as “\(” and “\)” in LaTeX), while “$” can be used directly.

    • Joseph, you are right regarding the math shift operator. I thought that such a trick will not work for display math, but it does. So, it is possible to get rid of $ in pdftex as well.

      I am not convinced about using macros for accents and ~ for non-breakable space. Or more precisely, I am not convinced that when talking about accents, a TeX introduction should start with accent macros. Rather it should talk about setting input encoding (or regimes as they are called in ConTeXt) or, in the case of UTF engines, state that the input encoding in utf-8. And only then should it say describe the macros for accents (and non-breakable spaces).

      • For beginners, I’d entirely agree: it’s very much the experience we (UK0-TUG) have on our ‘LaTeX for beginners course’.

    • You are being sarcastic, right? I use a single TeX source to generate multiple versions of the same document (print and screen version with different content and layout) that is hyper-linked, has syntax highlighted code snippets, a cross referenced bibliography and table of contents, and automatic scaling of images depending on the image dimensions and page size (if the image is smaller than a threshold, don’t scale it; it the image is between two thresholds, scale it to fit in the page width the the size remaining on the current page, if it is larger, scale it to fit in page width and page height.) Not to mention proper hyphenation and sub font protrusion). Last time I checked, Word could not even a fraction of this correctly.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s