Over the last few weeks I did a bit of tinkering on a hobby software project. The code was written by two other people during the last several years and I finally managed to find some time to fix a couple of bugs.
The project is writing a functional clone of WGML 4.0 aka Watcom SCRIPT/GML, a text processor that came out of mainframe SCRIPT tradition (which goes back to 1968 or so). WGML is used by the Open Watcom project to produce documentation in a large number of formats: Windows and OS/2 native help files, homegrown Watcom help format, HTML, CHM, and last but not least PostScript/PDF.
Several slightly different WGML executables exist, and run either under OS/2 or 32-bit extended DOS. Running those on modern systems is possible (via DOSBox and such), but quite painful, and creates undesirable dependencies. The source code to WGML was presumably lost decades ago.
“Why don’t you just convert the documentation to some other format” is what a lot of people said, and when asked to show how, the discussion was over. Because it’s actually not that easy. I’m sure it’s not impossible but it’s much harder than it seems. As it turns out, SCRIPT includes a fairly flexible language with macros, variables (symbols), built-in functions, conditionals, and loops, and the documentation (several thousand pages) makes use of all that.
So we’re looking at either a huge amount of extremely boring and mind-numbing work converting the documentation source files, or a huge amount of interesting and challenging work rewriting the WGML text processor. Given that no one is paid for it, only the latter is actually a realistic option.
A number of years later we’re at a point where the rewritten WGML can process nearly all of the Open Watcom documentation and produce output that in many cases perfectly matches the output of old WGML 4.0, although bugs still remain. In the process of fixing bugs, it’s necessary to often re-process existing documents to make sure nothing broke. And that got boring because the rewritten WGML wasn’t all that fast.
Now, fast or slow is always relative. The new WGML was, I believe, always faster than the old DOS version. Which means that processing a 1,000 page manual might take under 10 seconds, but there is about a dozen of such manuals. And they get built in a handful of different formats, which means the time really adds up.
So I thought I’d look into where the time was being spent. The first culprit turned out to be symbol lookup.
The symbols or variables in SCRIPT work somewhat like makefile macros. They might be used like this:
:set symbol="company" value="Open Watcom". :set symbol="watc" value="&company C/C++". A manual for &watc..
After the symbols get substituted, the result will be
A manual for Open Watcom C/C++.
An interesting catch is that if a symbol is not found in a dictionary, nothing gets replaced. Hence a code fragment such as
aptr = &array;
will be output without changes, assuming that a symbol named
array does not exist.
The newly implemented WGML was using simple linked lists for all symbol dictionaries. There are, roughly speaking, local symbol dictionaries for macros and included files, and there is a global dictionary. The local dictionaries tend to be quite small, typically with under five symbols in them, sometimes even none. They also usually don’t need to satisfy a lot of lookups.
The global dictionary is a different story. On a larger document (the C library reference), there were 406,027 symbol lookups against the global dictionary. There dictionary was eventually filled up with almost 200 symbols, and not surprisingly, repeatedly searching such a list is not fast. To satisfy those 406,027 lookups, the code needed to perform 43,697,703 string comparisons.
I set out to replace the global dictionary linked list with a hash table. However, I decided to leave the local dictionaries alone, as the overhead of extra memory and hashing didn’t seem worthwhile.
With a hash function borrowed from the C compiler, and using a hash table with 241 entries, the number of string compares went from down 43,697,703 to 681,408. To put it differently, the linked list implementation needed on average to go through a little over 100 items before it found (or didn’t find) the desired symbol. In contrast, the hash table implementation on average needs to go through under 1.7 items. That is a huge difference.
The effect unfortunately was not as tangible as I expected. The change shaved around 5% off the execution time.
But it made other bottlenecks easier to see: Macro lookup and SCRIPT keyword lookup. When script encounters an input line such as
it first checks whether there is a macro called
br. If there isn’t, SCRIPT checks whether
.br is a valid control word (which it is) and executes it.
The Open Watcom documentation is rather macro heavy. There is just one global macro dictionary, and when processing my test document, it needed to satisfy 2,321,499 macro lookups. This led to several hundred million string comparisons in the original linked list implementation.
At first I replaced the macro dictionary hashing with the same code as the symbol lookup. And that worked… mostly. Some documents didn’t build and at least one regression test failed.
It turns out that there’s an oddity related to macro name length. WGML only considers up to 8 characters when looking up macros. That required a change to the hashing function so that it not only stops at the end of a string, but also after 8 characters if the input string is longer.
With this adjustment, WGML worked again and to satisfy 2,321,499 macro lookups, only 1,873,022 string comparisons were needed, i.e. about 0.8 string comparisons per lookup.
That may sound strange, but there is a simple explanation: Many of the macro lookups search for overridden SCRIPT control words, but there is no override. The corresponding hash table entry is empty, and there is no string to compare.
The macro dictionary modification had a significant effect, and reduced the WGML run time by about 20%.
Control Word Lookup
The next item on the list was SCRIPT control word lookup. There are over 120 control words and the existing code linearly searched through an array. This, again, was not fast.
The SCRIPT control words, with a few exception, consist of two alphabetic characters. Instead of hashing, I decided to use a direct lookup table. A 26 × 26 table holding one-byte indices into the control word array does most of the word. The odd cases are control words
... (used for labels). These are handled specially in the lookup function.
Changing the SCRIPT control word lookup reduced the run time by another 5% or so. A small but worthwhile change.
Old vs New
This isn’t really a criticism of the original rewritten WGML code. With such a project, it is very difficult to guess at the outset where the performance bottlenecks are going to be. Trying to optimize rarely used code is not useful. It’s much easier to get the code working and then analyze what needs speeding up.
After my round of improvements, I thought it’d be interesting to compare the performance of rewritten WGML against the original. I used a 32-bit Windows 7 VM for that, since the old DOS WGML can run directly there.
I chose one of the largest documents, an online help file combining a number of separate manuals. The old WGML 4.0 needed about 12.3 seconds to process the manual. The optimized rewritten WGML took 2.9 seconds to process the same manual.
That’s quite a difference! The new WGML, at least in some cases, needs a little under 25% of time compared to the old one. The old WGML might incur some NTVDM overhead which could skew the comparison somewhat, then again that’s not really avoidable so it’s still a fair comparison.
The new WGML now feels noticeably faster… because it is.