After writing about the likely origins of IBM code page 852, I thought I should revisit the homegrown Czech alternative solution, the Kamenický brothers encoding and their keyboard driver. Its existence is well documented, and the so-called (somewhat misnamed) KEYBCS2 encoding even has its own Wikipedia article. The encoding itself lives on in various conversion tables, and utilities to convert text to or from the Kamenický encoding are easy enough to locate. Sometimes the encoding is also called MJK—the initials of its authors, Marian and Jiří Kamenický.
But finding the actual KEYBCS2 utility turned out to be ridiculously difficult. I scoured the Internet for it. I could not find it. At all. I found a fair amount of text talking about it, but not the actual utility.
In desperation, I started searching my NAS. I must have had the utility in the early 1990s, but after I switched to primarily using OS/2 in the mid-1990s, the DOS keyboard driver wasn’t all that useful, and OS/2 had its own reasonably well functioning support using CP852 (compatible with the built-in DOS support).
After much searching, I found an archive with KEYBCS2.EXE dated 07/27/90 on my NAS. Sadly, all my attempts to run it ended up in failure:
Obviously I was not trying to debug the program, but I was forced to do so.
Looking at KEYBCS2.EXE in a hex editor showed no obvious strings; in fact it all looked pretty random. But examining the memory of a VM showing the silly error message revealed a suspicious looking string:
Program LOCK Version 1.10 (C) 1989 J.Belonoznik.

Okay, some daft anti-debugging program wrapper, but why does it think I’m debugging it when I’m not?
After analyzing the “program lock”, I discovered that it inadvertently prevents “locked” programs from running on any halfway modern CPU. The last processor generation it worked on is the 486, but on a Pentium or newer it always fails with a “Debugging is not allowed” message.
The reason for that is (unsurprisingly) debugger detection. The lock code detects modifications to itself, which presumably catches any INT3 breakpoints. And it detects single-stepping by using self-modifying code. It looks approximately (not exactly) like this:
        xor     ax, ax
        lea     bx, mod_code
        mov     byte ptr [bx], 90h      ; NOP opcode
        jmp     short $+2
        mov     byte ptr [bx], 40h      ; INC AX opcode
    mod_code:
        nop                             ; Modified instruction
        cmp     ax, 1                   ; Is AX zero or one?
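This code relies on the existence of a software-visible prefetch queue on old x86 CPUs. Up to and including the Intel 486, the CPU executes a NOP at mod_code: the jump flushes the prefetch queue, the queue is refilled with the bytes that follow (including the NOP), and although the second MOV then overwrites the instruction in memory, the stale copy already sitting in the queue is what gets executed. When the code is single-stepped, the trap after every instruction empties the queue, so the modified INC AX is fetched instead and AX ends up as one.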
On the Pentium and later CPUs, the processor detects a write to code which is currently in the prefetch queue, flushes the queue, and executes the modified code. With effort, the detection could be fooled on a Pentium (not Pentium Pro), because it goes by linear rather than physical address, but it’s quite good enough to detect this instance of self-modifying code.
As a consequence, the “debugger detection” is guaranteed to fail on Pentium and later processors. Obviously the code was written in 1989, if not earlier, and its author merely failed to predict the future.
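The effect is easy to model. The following is only a toy simulation of the approximate snippet above, not of the real LOCK code; the function name `run_check` and its two flags are made up for this illustration. It tracks a single prefetched byte and shows which instruction ends up being executed in each scenario.

```python
NOP, INC_AX = 0x90, 0x40


def run_check(snoops_code_writes: bool, single_step: bool = False) -> int:
    """Model the self-modifying check and return the final AX.

    0 means the stale NOP ran (check passes), 1 means INC AX ran
    (the lock believes it is being debugged).
    """
    ax = 0                          # xor ax, ax
    memory = {"mod_code": NOP}      # mov byte ptr [bx], 90h

    # jmp short $+2: the queue is flushed and refilled with the bytes
    # after the jump, which includes the NOP currently at mod_code.
    prefetched = memory["mod_code"]

    memory["mod_code"] = INC_AX     # mov byte ptr [bx], 40h

    # The stale byte is thrown away if the CPU notices the write to
    # prefetched code (Pentium and later), or if a single-step trap
    # between instructions has emptied the queue.
    if snoops_code_writes or single_step:
        prefetched = memory["mod_code"]

    if prefetched == INC_AX:        # execute whatever was fetched
        ax += 1
    return ax                       # cmp ax, 1


if __name__ == "__main__":
    print("486, running freely:  AX =", run_check(False))
    print("486, single-stepped:  AX =", run_check(False, single_step=True))
    print("Pentium, no debugger: AX =", run_check(True))
```

In this model a Pentium running the code at full speed is indistinguishable from a 486 being single-stepped, which is exactly the false positive described above.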
The upshot is that the “anti-debugging” code serves as a rather effective timebomb which cannot be worked around by simply setting the clock back.
Unlocking Program LOCK
I considered various approaches to making the protected KEYBCS2.EXE run on systems with CPUs newer than a 486. The protection is good enough that patching the protected executable is difficult. It checks its own integrity and uses XOR chains to decrypt the executable, which means that patching a byte here and there is quite involved.
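To show why a chained XOR makes one-byte patches awkward, here is a minimal sketch. The chaining rule (each byte is XORed with the previous plaintext byte), the seed value, and the function names are assumptions chosen for illustration; I am not claiming this is the scheme Program LOCK actually uses.

```python
def chain_encrypt(plain: bytes, seed: int) -> bytes:
    """XOR every byte with the previous plaintext byte (seed for the first)."""
    out = bytearray()
    key = seed
    for p in plain:
        out.append(p ^ key)
        key = p
    return bytes(out)


def chain_decrypt(cipher: bytes, seed: int) -> bytes:
    """Inverse of chain_encrypt: the key is the byte just decrypted."""
    out = bytearray()
    key = seed
    for c in cipher:
        p = c ^ key
        out.append(p)
        key = p
    return bytes(out)


if __name__ == "__main__":
    seed = 0x5A                                   # hypothetical seed
    code = bytes([0xEB, 0xFE, 0x90, 0x90, 0xC3])  # jmp $ / nop / nop / ret
    locked = chain_encrypt(code, seed)

    # Naive patch: flip the stored byte so that FEh decrypts to 00h.
    naive = bytearray(locked)
    naive[1] ^= 0xFE
    print(chain_decrypt(bytes(naive), seed).hex(" "))  # eb 00 6e 6e 3d

    # A correct patch has to re-encrypt, which also changes the next byte.
    fixed = chain_encrypt(bytes([0xEB, 0x00, 0x90, 0x90, 0xC3]), seed)
    print([i for i in range(len(code)) if fixed[i] != locked[i]])  # [1, 2]
```

Even in this simplified form, a single changed byte garbles everything after it unless the neighbouring byte is adjusted as well, and an integrity check on top would still have to be satisfied.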
I considered writing some kind of a TSR that works around the protection by intercepting the INT 21h calls printing the error message, or a timer hook watching for the case where the code hangs itself. It didn’t seem worth the effort.
So… I considered the likelihood that the original protected program was a DOS .COM file (quite high). In that case, maybe after the “program lock” shell is done, it reconstructs the original .COM image in memory. So I ran KEYBCS2.EXE, saved the memory contents, chopped off what didn’t seem to belong, and re-saved what was left as KEYBCS2.COM. It worked.
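The chopping step can be sketched roughly as follows. The only hard fact used is that DOS loads a .COM image at offset 100h, right after the 256-byte PSP; the function name is made up, and `image_len` stands for a length that has to be worked out by inspecting the dump, since nothing records the size of the original file.

```python
PSP_SIZE = 0x100  # a .COM image starts right after the 256-byte PSP


def com_image_from_dump(segment_dump: bytes, image_len: int) -> bytes:
    """Cut a .COM image out of a dump of the program's segment.

    segment_dump is assumed to start at offset 0 of the segment (the PSP);
    image_len is the manually determined length of the image.
    """
    if image_len <= 0 or PSP_SIZE + image_len > len(segment_dump):
        raise ValueError("image does not fit inside the dump")
    return segment_dump[PSP_SIZE:PSP_SIZE + image_len]


if __name__ == "__main__":
    # Fake dump: a zeroed PSP, six bytes of "program", then leftovers.
    dump = bytes(PSP_SIZE) + b"\xEB\x10rest" + b"\x00" * 32
    print(com_image_from_dump(dump, 6))  # b'\xeb\x10rest'
```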
But first I manually got past the debugger checks. Ironically, using a debugger. All I needed to do was to let the program execute a JMP $ instruction to hang itself, patch it to JMP $+2 (by simply setting the second byte to zero), and continue. There were two instances of those jumps, each being hit several times; that is why the message was printed five times total.
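Why zeroing the second byte is enough follows from how a short jump is encoded: opcode EBh followed by a signed 8-bit displacement counted from the end of the two-byte instruction. A small sketch (the helper name is mine):

```python
def encode_jmp_short(target_from_dollar: int) -> bytes:
    """Encode JMP SHORT $+n, where n is relative to the jump's own address.

    The stored displacement is relative to the *next* instruction, hence
    the -2 for the length of the jump itself.
    """
    rel8 = target_from_dollar - 2
    if not -128 <= rel8 <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel8 & 0xFF])


if __name__ == "__main__":
    print(encode_jmp_short(0).hex(" "))  # eb fe -> JMP $   (spins forever)
    print(encode_jmp_short(2).hex(" "))  # eb 00 -> JMP $+2 (falls through)
```

So JMP $ is EB FE, and clearing the displacement byte turns it into EB 00, a jump to the very next instruction.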
As it turns out, KEYBCS2 is a fairly fancy keyboard driver, and it takes up 10KB of memory, not an insignificant amount. One reason for the relatively high memory usage is a nifty configuration menu that can be popped up with Ctrl+Shift+F1:
The KEYBCS2 utility can dynamically switch between several possible keyboard layouts. “Standard” is untranslated BIOS input, while “national” is Czech or Slovak keyboard layout. But there is also a “combined” layout, really two layouts at once with the Ctrl key shifting to the “other” layout. There’s even a dedicated keyboard layout for drawing “graphics”:
There really are lots of features, and that needs memory.
What happened to KEYBCS1?
The whole time I kept thinking, was there ever KEYBCS1? I found the answer in the online help for a KEYBDEF.EXE utility (found in the same archive as KEYBCS2.EXE) which allows users to define their own keyboard layouts for the KEYBCS2 driver:
Indeed there was a KEYBCS1 utility. It was the same as KEYBCS2 but without the pop-up configuration screen, and therefore presumably smaller. There were also corresponding KEYBSL1 and KEYBSL2 utilities which used the Slovak keyboard layout by default.
It is clear that calling the Kamenický encoding “KEYBCS2” is a bit silly because it is equally the KEYBCS1, KEYBSL1, and KEYBSL2 encoding. Note that the utility in fact refers to itself as “KEYBxxy” in its online help.
Also note that some sources incorrectly claim that KEYBCS2 was a newer version of a supposed KEYBCS utility; it was not, because KEYBCS1 and KEYBCS2 were different variants of the same utility.
The way the Kamenický encoding was designed was far from random. It was first put together circa 1986, in an era when VGA did not yet exist and EGA was a high-end adapter. Most PCs used MDA, CGA, or Hercules cards.
The problem with those adapters was that apart from the Hercules Plus (only released in 1986), they all had fixed fonts in ROM. While replacing the character generator EPROM was often possible with many of those cards, it wasn’t always an option for end users, and not everyone had the required equipment to begin with.
The Kamenický encoding was therefore chosen such that national characters were placed in locations where they more or less closely corresponded to visually similar glyphs in the standard IBM PC code page 437. That has two practical advantages: First, Czech or Slovak language text is legible (if ugly) even without customized fonts. Second, the Kamenický encoding preserves all of the CP437 line drawing characters, unlike CP852.
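The “legible if ugly” effect can be reproduced with a few lines of Python. The table below is only a small subset of the encoding, with code points as given in commonly published Kamenický tables (worth double-checking against one before reuse); the helper names are invented for this sketch. Python’s standard `cp437` codec stands in for the unmodified ROM font.

```python
# Partial Kamenický table: character -> byte value (subset only).
KAMENICKY_SUBSET = {
    "Č": 0x80, "é": 0x82, "č": 0x87, "ě": 0x88,
    "ů": 0x96, "ý": 0x98, "á": 0xA0, "í": 0xA1, "ň": 0xA4,
}


def to_kamenicky(text: str) -> bytes:
    """Encode text using ASCII plus the partial table above."""
    out = bytearray()
    for ch in text:
        if ch in KAMENICKY_SUBSET:
            out.append(KAMENICKY_SUBSET[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError(f"{ch!r} is not in the partial table")
    return bytes(out)


def seen_with_rom_font(data: bytes) -> str:
    """What the bytes look like on an adapter with a stock CP437 font."""
    return data.decode("cp437")


if __name__ == "__main__":
    print(seen_with_rom_font(to_kamenicky("Český kůň")))  # Çeskÿ kûñ
```

The accents are wrong, but every letter sits on a glyph with the right base shape, so the text remains readable.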
Compare this screenshot of the once hugely popular Norton Commander when run with the Kamenický encoding
with a screenshot of the same Norton Commander running with PC Latin 2 (CP852) encoding:
CP852 sacrificed some of the line-drawing characters in order to cover more languages. While that is a very reasonable compromise, most users only cared about a single language and preferred undisturbed line graphics instead.
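Which line-drawing characters CP852 gave up can be listed by comparing Python’s built-in `cp437` and `cp852` codecs over the box-drawing range. This is just a quick check against the codec tables, not anything taken from the drivers themselves; the function name is mine.

```python
import unicodedata


def box_chars_lost_in_cp852(start: int = 0xB0, end: int = 0xDF) -> dict:
    """Map byte value -> (CP437 box character, CP852 replacement)."""
    lost = {}
    for code in range(start, end + 1):
        raw = bytes([code])
        in437 = raw.decode("cp437")
        in852 = raw.decode("cp852")
        is_box = unicodedata.name(in437).startswith("BOX DRAWINGS")
        if is_box and in437 != in852:
            lost[code] = (in437, in852)
    return lost


if __name__ == "__main__":
    for code, (box, letter) in box_chars_lost_in_cp852().items():
        print(f"{code:02X}h: {box} -> {letter}")
```

The pure single-line and pure double-line characters survive in CP852; what disappears are the mixed single/double joins, which is why frames that combine both styles end up sprinkled with accented letters.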
The encoding design also made sorting and case changes somewhat more difficult, because there was no simple algorithmic ordering of the national characters. But that was a problem solvable in software, whereas getting the right characters displayed in the first place wasn’t necessarily just a question of software.
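The case-conversion problem is easy to demonstrate. In ASCII, upper and lower case differ by a constant 20h; in the Kamenický encoding each pair needs its own table entry. The five pairs below follow commonly published Kamenický tables and are only a subset; the names are made up for this sketch.

```python
# lowercase byte -> uppercase byte: č→Č, ě→Ě, ž→Ž, š→Š, ř→Ř (subset only)
LOWER_TO_UPPER = {0x87: 0x80, 0x88: 0x89, 0x91: 0x92, 0xA8: 0x9B, 0xA9: 0x9E}


def kamenicky_upper(data: bytes) -> bytes:
    """Uppercase Kamenický-encoded text using ASCII rules plus the table."""
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:              # ASCII a-z: fixed offset works
            out.append(b - 0x20)
        else:                              # national characters: lookup
            out.append(LOWER_TO_UPPER.get(b, b))
    return bytes(out)


# The distance between lower and upper case differs from pair to pair.
OFFSETS = sorted({lo - up for lo, up in LOWER_TO_UPPER.items()})

if __name__ == "__main__":
    print(kamenicky_upper(bytes([0x87, 0x61])).hex(" "))  # 80 41
    print(OFFSETS)                                        # [-1, 7, 11, 13]
```

Sorting needs the same kind of table-driven treatment, since the byte values of the national characters follow the look of the CP437 glyphs rather than alphabetical order.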
But Wait, There’s More!
An important point to note is that KEYBCS2 (indeed the entire KEYBxxy family) makes no attempt to do anything with screen fonts; it in fact does not necessarily require national screen fonts at all, as explained above.
Then again, users with EGA or VGA hardware would have obviously wanted to use proper fonts. Since KEYBCS2 offers no help, they must have used something else.
In an old document, I found a mention of an EGASET utility that was reportedly used together with KEYBCS2. I quickly discovered that EGASET is not a terribly unique name. Then I found the right one in a most unexpected place, in an archive called CZHPFNT.ZIP.
It is a mystery where the CZHPFNT archive originated, but it’s pretty clear when: April 1989. Imagine my surprise when I found a utility called KEYBCS3.COM included:
It is obviously an older version (the file timestamp is July 1987, the program itself indicates May 1987) of my KEYBCS2. I have no idea why it is called KEYBCS3, but it’s not just a renamed file; the utility clearly calls itself KEYBCS3.
I could not find any mention of KEYBCS3 anywhere. It is also unclear what the ‘3’ means; it presumably means something, but the newer online help from 1990 only mentions KEYBCS1 and KEYBCS2, and the KEYBCS3 utility itself offers no hints.
But that wasn’t the only surprise in the CZHPFNT archive. When I ran the actual EGASET utility that I had been searching for, it showed “Copyright IBM Roece Inc. 1986”. That is the same IBM ROECE (Regional Office for Europe Central and East) mentioned in a previous post:
Here’s a menu that the EGASET utility pops up in response to an Alt + Right Shift key combination:
It appears that EGASET is really a generic EGA tweaking utility which happens to allow loading fonts. IBM ROECE probably ended up using it because that was the first thing they found, not because it’s the most obvious method of overriding EGA fonts.
For something with an IBM name on it, the EGASET utility is rather mysterious. Then again, if it was only distributed by IBM ROECE, it probably ended up almost exclusively behind the Iron Curtain, where most of the pre-1991 computing history has almost completely vanished.
The supplied CSEGA14.FNT font is dated January 1988. The CZHPFNT archive very strongly suggests that IBM ROECE did offer some sort of national language support for at least some countries of the Eastern Bloc, although what exactly that support looked like is very unclear. It is not even clear if IBM ROECE had anything to do with the CZHPFNT archive beyond being involved with the EGASET utility. The included font only contains raw font data, and the archive’s README.TXT offers no clues either.
Because finding the KEYBCS2 utility was so insanely difficult, I made it available, together with several others, here. The original no-longer-working KEYBCS2.EXE is included, together with “unlocked” KEYBCS2.COM and KEYBDEF.EXE. Several alternative keyboard and/or display drivers are included; of those, CZECH.EXE is the newest and fanciest, with minimal conventional memory footprint and keyboard/display support for Kamenický, CP852, CP1250, and ISO 8859-2 encodings.
Although DOS 5.0 and later came with perfectly functional built-in keyboard and font support, it was limited to CP852; the third-party utilities tend to be significantly more capable and flexible while needing less memory.