KEYBCS2

Posted on February 18, 2022 by Michal Necasek

After writing about the likely origins of IBM code page 852, I thought I should revisit the homegrown Czech alternative solution, the Kamenický brothers encoding and their keyboard driver. Its existence is well documented, and the so-called (somewhat misnamed) KEYBCS2 encoding even has its own Wikipedia article. The encoding itself lives on in various conversion tables, and utilities to convert text to or from the Kamenický encoding are easy enough to locate. Sometimes the encoding is also called MJK—the initials of its authors, Marian and Jiří Kamenický.

But finding the actual KEYBCS2 utility turned out to be ridiculously difficult. I scoured the Internet for it. I could not find it. At all. I found a fair amount of text talking about it, but not the actual utility.

In desperation, I started searching my NAS. I must have had the utility in the early 1990s, but after I switched to primarily using OS/2 in the mid-1990s, the DOS keyboard driver wasn’t all that useful, and OS/2 had its own reasonably well functioning support using CP852 (compatible with the built-in DOS support).

After much searching, I found an archive with KEYBCS2.EXE dated 07/27/90 on my NAS. Sadly, all my attempts to run it ended up in failure:

Obviously I was not trying to debug the program, but I was forced to do so.

Looking at KEYBCS2.EXE in a hex editor showed no obvious strings; in fact it all looked pretty random. But examining the memory of a VM showing the silly error message revealed a suspicious looking string: Program LOCK Version 1.10 (C) 1989 J.Belonoznik. Okay, some daft anti-debugging program wrapper, but why does it think I’m debugging it when I’m not?

Unplanned Obsolescence

After analyzing the “program lock”, I discovered that it inadvertently prevents “locked” programs from running on any halfway modern CPU. The last processor generation it worked on is the 486, but on a Pentium or newer it always fails with a “Debugging is not allowed” message.

The reason for that is (unsurprisingly) debugger detection. The lock code detects modifications to itself, which presumably catches any INT3 breakpoints. And it detects single-stepping by using self-modifying code. It looks approximately (not exactly) like this:

        xor  ax, ax
        lea  bx, mod_code
        mov  byte ptr [bx], 90h ; NOP opcode
        jmp  short $+2
        mov  byte ptr [bx], 40h ; INC AX opcode
mod_code:
        nop           ; Modified instruction
        cmp  ax, 1    ; Is AX zero or one?

This code relies on the existence of a software-visible prefetch queue on old x86 CPUs. The jump flushes the prefetch queue, which is then refilled with the instructions that follow, including the NOP at mod_code. The second MOV overwrites that instruction in memory, but the copy already sitting in the prefetch queue is the one that gets executed. Up to and including the Intel 486, the code will therefore execute a NOP instruction and AX stays zero.

On the Pentium and later CPUs, the processor detects a write to code which is currently in the prefetch queue, flushes the queue, and executes the modified code. With effort, the detection could be fooled on a Pentium (not Pentium Pro), because it goes by linear rather than physical address, but it’s quite good enough to detect this instance of self-modifying code.

As a consequence, the “debugger detection” is guaranteed to fail on Pentium and later processors. Obviously the code was written in 1989, if not earlier, and its author merely failed to predict the future.
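The behavior described above can be modeled with a few lines of Python. This is only a toy sketch, not the lock's actual logic: the function name and parameters are made up for illustration, and it assumes (my reading, not confirmed by the disassembly) that a single-step trap causes the queue to be refilled and that the lock treats AX = 1 as "being debugged".

```python
NOP, INC_AX = 0x90, 0x40

def probe_ax(snoops_code_writes: bool, single_stepped: bool = False) -> int:
    """Return AX after the self-modifying probe on a toy CPU model.

    snoops_code_writes: True models a Pentium or later, which notices
    writes to code that has already been prefetched.
    single_stepped: True models a debugger trap between instructions
    (assumption: the trap causes the queue to be refilled).
    """
    ax = 0                           # xor ax, ax
    memory = {"mod_code": NOP}       # first MOV stores a NOP at mod_code
    # jmp short $+2 flushes the queue; the refill picks up mod_code as a NOP
    queued = memory["mod_code"]
    memory["mod_code"] = INC_AX      # second MOV changes memory only
    if snoops_code_writes or single_stepped:
        queued = memory["mod_code"]  # stale copy discarded and refetched
    if queued == INC_AX:             # the instruction that actually executes
        ax += 1
    return ax

print(probe_ax(snoops_code_writes=False))                       # 486, no debugger
print(probe_ax(snoops_code_writes=False, single_stepped=True))  # 486, traced
print(probe_ax(snoops_code_writes=True))                        # Pentium, no debugger
```

In this model the traced 486 and the untraced Pentium are indistinguishable, which is exactly why the check misfires on newer processors.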

The upshot is that the “anti-debugging” code serves as a rather effective timebomb which cannot be worked around by simply setting the clock back.

Unlocking Program LOCK

I considered various approaches to making the protected KEYBCS2.EXE run on systems with CPUs newer than a 486. The protection is good enough that patching the protected executable is difficult. It checks its own integrity and uses XOR chains to decrypt the executable, which means that patching a byte here and there is quite involved.
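To illustrate why chained XOR makes spot patches awkward, here is a minimal sketch. The scheme shown (each byte XORed with the previous plaintext byte, starting from an arbitrary seed) is a hypothetical stand-in, not the algorithm Program LOCK actually uses; the function names and the seed value are invented for the example.

```python
def xor_chain_encrypt(plain: bytes, seed: int = 0x5A) -> bytes:
    """Hypothetical chain: each byte is XORed with the previous plaintext byte."""
    out = bytearray()
    prev = seed
    for b in plain:
        out.append(b ^ prev)
        prev = b
    return bytes(out)

def xor_chain_decrypt(cipher: bytes, seed: int = 0x5A) -> bytes:
    """Inverse of xor_chain_encrypt; every byte depends on all earlier ones."""
    out = bytearray()
    prev = seed
    for c in cipher:
        p = c ^ prev
        out.append(p)
        prev = p
    return bytes(out)

# mov ah, 9 / mov dx, 0110h / int 21h / ret
code = bytes([0xB4, 0x09, 0xBA, 0x10, 0x01, 0xCD, 0x21, 0xC3])
locked = xor_chain_encrypt(code)

patched = bytearray(locked)
patched[2] ^= 0x01                      # naive one-bit patch of the stored file
damaged = xor_chain_decrypt(bytes(patched))

changed = [i for i, (a, b) in enumerate(zip(code, damaged)) if a != b]
print(changed)                          # every byte from the patch onward differs
```

With a chain like this, changing one stored byte corrupts everything decrypted after it, so a patch would have to re-encode the rest of the image and then also satisfy the integrity check.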

I considered writing some kind of a TSR that works around the protection by intercepting the INT 21h calls printing the error message, or a timer hook watching for the case where the code hangs itself. It didn’t seem worth the effort.

So… I considered the likelihood that the original protected program was a DOS .COM file (quite high). In that case, maybe after the “program lock” shell is done, it reconstructs the original .COM image in memory. So I ran KEYBCS2.EXE, saved the memory contents, chopped off what didn’t seem to belong, and re-saved what was left as KEYBCS2.COM. It worked.

Running KEYBCS2

But first I manually got past the debugger checks. Ironically, using a debugger. All I needed to do was to let the program execute a JMP $ instruction to hang itself, patch it to change to JMP $+2 (by simply setting the second byte to zero), and continue. There were two instances of those jumps, each being hit several times; that is why the message was printed five times total:
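To make the one-byte patch concrete: a short JMP is encoded as opcode EB followed by a signed 8-bit displacement counted from the end of the instruction. The sketch below (the helper name and the 0100h load offset are just for illustration) shows that EB FE jumps to itself, while EB 00 simply falls through to the next instruction.

```python
def short_jmp_target(ip: int, code: bytes) -> int:
    """Target of a two-byte short JMP (EB rel8) located at offset ip."""
    if code[0] != 0xEB:
        raise ValueError("not a short JMP")
    # rel8 is a signed displacement relative to the next instruction
    rel8 = code[1] - 0x100 if code[1] >= 0x80 else code[1]
    return (ip + 2 + rel8) & 0xFFFF

hang = bytes([0xEB, 0xFE])          # JMP $   : jumps back onto itself
patched = bytes([hang[0], 0x00])    # JMP $+2 : second byte set to zero

print(hex(short_jmp_target(0x0100, hang)))     # same address as the jump
print(hex(short_jmp_target(0x0100, patched)))  # address of the next instruction
```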

As it turns out, KEYBCS2 is a fairly fancy keyboard driver, and it takes up 10KB memory–not an insignificant amount. One reason for the relatively high memory usage is a nifty configuration menu that can be popped up with Ctrl+Shift+F1:

The KEYBCS2 utility can dynamically switch between several possible keyboard layouts. “Standard” is untranslated BIOS input, while “national” is Czech or Slovak keyboard layout. But there is also a “combined” layout, really two layouts at once with the Ctrl key shifting to the “other” layout. There’s even a dedicated keyboard layout for drawing “graphics”:

There really are lots of features, and that needs memory.

What happened to KEYBCS1?

The whole time I kept thinking, was there ever KEYBCS1? I found the answer in the online help for a KEYBDEF.EXE utility (found in the same archive as KEYBCS2.EXE) which allows users to define their own keyboard layouts for the KEYBCS2 driver:

Indeed there was a KEYBCS1 utility. It was the same as KEYBCS2 but without the pop-up configuration screen, and therefore presumably smaller. There were also corresponding KEYBSL1 and KEYBSL2 utilities which used the Slovak keyboard layout by default.

It is clear that calling the Kamenický encoding “KEYBCS2” is a bit silly because it is equally the KEYBCS1, KEYBSL1, and KEYBSL2 encoding. Note that the utility in fact refers to itself as “KEYBxxy” in its online help.

Also note that some sources incorrectly claim that KEYBCS2 was a newer version of a supposed KEYBCS utility; it was not, because KEYBCS1 and KEYBCS2 were different variants of the same utility.

Encoding Design

The way the Kamenický encoding was designed was far from random. It was first put together circa 1986, in an era when VGA did not yet exist and EGA was a high-end adapter. Most PCs used MDA, CGA, or Hercules cards.

The problem with those adapters was that apart from the Hercules Plus (only released in 1986), they all had fixed fonts in ROM. While replacing the character generator EPROM was often possible with many of those cards, it wasn’t always an option for end users, and not everyone had the required equipment to begin with.

The Kamenický encoding was therefore chosen such that national characters were placed in locations where they more or less closely corresponded to visually similar glyphs in the standard IBM PC code page 437. That has two practical advantages: First, Czech or Slovak language text is legible (if ugly) even without customized fonts. Second, the Kamenický encoding preserves all of the CP437 line drawing characters, unlike CP852.
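The legibility argument is easy to try out. The sketch below encodes a word with a handful of entries from the commonly published Kamenický table and then decodes the bytes as CP437, which is what a card with an unmodified ROM font would show. The table is deliberately partial and typed in from memory of the published table, so treat it as illustrative and check it against a reference before relying on it; the function names are invented for the example.

```python
# Partial, illustrative subset of the Kamenický table (not the full encoding).
KAMENICKY_SAMPLE = {
    "Č": 0x80, "é": 0x82, "č": 0x87, "ě": 0x88, "Ě": 0x89,
    "á": 0xA0, "í": 0xA1, "ň": 0xA4, "Ň": 0xA5,
}

def encode_kamenicky(text: str) -> bytes:
    """Encode text using ASCII plus the sample entries above."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out.append(KAMENICKY_SAMPLE[ch])  # KeyError for unlisted characters
    return bytes(out)

def seen_with_stock_font(data: bytes) -> str:
    """What the same bytes look like with an unmodified CP437 font."""
    return data.decode("cp437")

for word in ("Čeněk", "něco", "Ňáké"):
    print(word, "->", seen_with_stock_font(encode_kamenicky(word)))
```

The diacritics come out wrong, but each substitute glyph has the same base letter, so the text stays readable.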

Compare this screenshot of the once hugely popular Norton Commander when run with the Kamenický encoding

with a screenshot of the same Norton Commander running with PC Latin 2 (CP852) encoding:

Broken double/single box transitions in CP852

CP852 sacrificed some of the line-drawing characters in order to cover more languages. While that is a very reasonable compromise, most users only cared about a single language and preferred undisturbed line graphics instead.
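Which line-drawing characters CP852 gave up can be listed with Python's built-in `cp437` and `cp852` codecs. This is just a quick sketch for comparison; Python ships no Kamenický codec, so the Kamenický side is not computed here and rests on the statement above that it leaves the CP437 line-drawing range untouched.

```python
import unicodedata

def box_chars(codec: str) -> set:
    """Box-drawing characters present in the upper half of a code page."""
    found = set()
    for byte in range(0x80, 0x100):
        ch = bytes([byte]).decode(codec, "replace")
        if unicodedata.name(ch, "").startswith("BOX DRAWINGS"):
            found.add(ch)
    return found

lost_in_cp852 = sorted(box_chars("cp437") - box_chars("cp852"))
print(" ".join(lost_in_cp852))
```

The characters printed are the mixed single/double junctions, which is why frames that combine both line styles break under CP852.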

The encoding design also made sorting and case changes somewhat more difficult, because there was no simple algorithmic ordering of the national characters. But that was a problem solvable in software, whereas getting the right characters displayed in the first place wasn’t necessarily just a question of software.
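As a sketch of what "solvable in software" means for case changes: the distance between a lowercase national character and its uppercase partner is not constant, so a lookup table is needed instead of the fixed offset that works for ASCII. The pairs below are a small subset taken from the commonly published Kamenický table (illustrative and from memory, not complete), and `kam_upper` is a made-up name.

```python
# Lowercase -> uppercase byte pairs; a partial, illustrative subset.
LOWER_TO_UPPER = {
    0x87: 0x80,  # č -> Č
    0x88: 0x89,  # ě -> Ě
    0x91: 0x92,  # ž -> Ž
    0xA4: 0xA5,  # ň -> Ň
    0xA8: 0x9B,  # š -> Š
    0xA9: 0x9E,  # ř -> Ř
}

_TABLE = bytes.maketrans(bytes(LOWER_TO_UPPER.keys()),
                         bytes(LOWER_TO_UPPER.values()))

def kam_upper(data: bytes) -> bytes:
    """Uppercase ASCII letters arithmetically, national letters by table."""
    ascii_done = bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)
    return ascii_done.translate(_TABLE)

offsets = sorted({upper - lower for lower, upper in LOWER_TO_UPPER.items()})
print(offsets)                                  # several different distances
print(kam_upper(bytes([0x87, 0x65, 0xA8])).hex())
```

Sorting needs the same treatment: a table of collation weights per byte rather than a comparison of raw byte values.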

The same kind of approach of minimally modifying CP437 was also (independently?) used by the Hungarian CWI-1/CWI-2 encoding, and to a lesser extent by the Polish Mazovia encoding.

But Wait, There’s More!

An important point to note is that KEYBCS2 (indeed the entire KEYBxxy family) makes no attempt to do anything with screen fonts; it in fact does not necessarily require national screen fonts at all, as explained above.

Then again, users with EGA or VGA hardware would have obviously wanted to use proper fonts. Since KEYBCS2 offers no help, they must have used something else.

In an old document, I found a mention of an EGASET utility that was reportedly used together with KEYBCS2. I quickly discovered that EGASET is not a terribly unique name. Then I found the right one in a most unexpected place, in an archive called CZHPFNT.ZIP.

It is a mystery where the CZHPFNT archive originated, but it’s pretty clear when: April 1989. Imagine my surprise when I found a utility called KEYBCS3.COM included:

It is obviously an older version (the file timestamp is July 1987, the program itself indicates May 1987) of my KEYBCS2. I have no idea why it is called KEYBCS3, but it’s not just a renamed file; the utility clearly calls itself KEYBCS3.

I could not find any mention of KEYBCS3 anywhere. It is also unclear what the ‘3’ means; it presumably means something, but the newer online help from 1990 only mentions KEYBCS1 and KEYBCS2 and the KEYBCS3 utility itself offers no hints.

But that wasn’t the only surprise in the CZHPFNT archive. When I ran the actual EGASET utility that I had been searching for, it showed Copyright IBM Roece Inc. 1986 — that is the same IBM ROECE (Regional Office for Europe Central and East) mentioned in a previous post:

Here’s a menu that the EGASET utility pops up in response to an Alt + Right Shift key combination:

It appears that EGASET is really a generic EGA tweaking utility which happens to allow loading fonts. IBM ROECE probably ended up using it because that was the first thing they found, not because it’s the most obvious method of overriding EGA fonts.

For something with an IBM name on it, the EGASET utility is rather mysterious. Then again, if it was only distributed by IBM ROECE, it probably ended up almost exclusively behind the Iron Curtain, where most of the pre-1991 computing history has almost completely vanished.

The supplied CSEGA14.FNT font is dated January 1988. The CZHPFNT archive very strongly suggests that IBM ROECE did offer some sort of national language support for at least some countries of the Eastern Bloc, although what exactly that support looked like is very unclear. It is not even clear if IBM ROECE had anything to do with the CZHPFNT archive beyond being involved with the EGASET utility. The included font only contains raw font data, and the archive’s README.TXT offers no clues either.

Archival

Because finding the KEYBCS2 utility was so insanely difficult, I made it available, together with several others, here. The original no-longer-working KEYBCS2.EXE is included, together with “unlocked” KEYBCS2.COM and KEYBDEF.EXE. Several alternative keyboard and/or display drivers are included; of those, CZECH.EXE is the newest and fanciest, with minimal conventional memory footprint and keyboard/display support for Kamenický, CP852, CP1250, and ISO 8859-2 encodings.

Although DOS 5.0 and later came with perfectly functional built-in keyboard and font support, it was limited to CP852; the third-party utilities tend to be significantly more capable and flexible while needing less memory.


34 Responses to KEYBCS2

Vlad Gnatov says:

February 18, 2022 at 5:44 pm

The external protectors are usually easy to remove, well for exe files it’s a little harder, you need to fix entry point. In most cases there is no need for the debugger, special automated tools work just fine.
I successfully unpacked keybcs2 with cup386 [1] /7 keybcs2.exe
The result is practically identical to yours, but file a little shorter and a few uninitialised bytes set to 0.
The keybdef was a little harder, it hung with ‘Debugging is not allowed’ with any cup386 options. After tracing[2], I discovered that it prints that message with int21/09. There are only a few of these calls in the executable (and they are not even encrypted), so it was trivial to disable that check:
keybdef.exe
e4de 7F 90
e4df 03 90
After that cup386 /7 keybdef.exe worked just fine.

[1]: http://old-dos.ru/files/file_719.html
[2]: http://files.mpoli.fi/unpacked/software/dos/editors/trace27.zip/
Michal Necasek says:

February 18, 2022 at 6:34 pm

That was quick, thanks! I’m not familiar with this tool but it really does work. I was lazy so I ran it on a 486, that didn’t need any extra work.

I guess it is a testament to the “protector” that it still manages to defeat CUP386 with the /1 and /3 switches, and only /7 works.
MiaM says:

February 20, 2022 at 1:30 am

Side track / blog post idea:
It would be nice if you could write something about the other more or less known third party keyboard layout drivers for DOS.

I clearly remember that there was at least one such for Swedish keyboard but I never really used it myself. I have tried searching the net but can’t even reliably find what it was called. My memory says it was called something ANDAN something but that yields zero useful results on google. It does not help that “andan” also is Swedish for “the spirit” (as in the phrase “that’s the spirit”). As a bonus, “ändan” is Swedish for “the end” or “the rear” (both for like the end of a hose or so but also a person’s butt). Thus plenty of irrelevant search results.

Btw I find/found it a bit strange that it was rather hard to find any way for a user to redefine the keyboard mapping when software loaded keyboard maps were a new thing. A clear sign that computers were moving from being for “tinkerers” to “users”.
ender says:

February 20, 2022 at 3:47 am

I know that there was a ton of different programs for Yugoslavian letters, both for keyboard layout and graphic character set. The Hercules graphic card that was in my father’s 286 had a physical switch in the back to toggle between čČšŠžŽćĆđĐ and ~^{[′@}]|\, and there were several programs that switched to QWERTZ layout.

While I remember using codepage 852 when the 286 was upgraded with a Trident VGA card and DOS 5.0 in ’91 or ’92, the remnants of the old “437” codepage lingered in various data until the early 2010’s (I’ve seen web addresses in one newspaper printed with č instead of ~ very often).
Michal Necasek says:

February 20, 2022 at 12:45 pm

That’s not a bad idea, but I’d need a lot of help. The trouble is that if any mention of such a keyboard driver survived, it is almost by definition all going to be written in the language the driver was designed for. I can likely understand enough Polish and maybe Russian, and perhaps Slovenian or Serbo-Croatian, but very much doubt that I could understand enough Swedish or Norwegian. Even in the languages where I can decipher technical documentation (like Polish), I might not be able to find the right search terms.

Sometimes one might get lucky and find something like this written in English, but that’s rare.

I don’t think the keyboard drivers shipped with DOS had any flexibility whatsoever. I suppose they were defined by IBM for secretaries and professional typists, and the last thing those people wanted was to fight with keyboards that were unpredictably different from standard typewriter layouts. That is in contrast with KEYBCS2 utility; that was clearly intended for programmers who often had major trouble with national keyboards layouts that removed too many useful characters (or made them unreasonably difficult to type).
Vlad Gnatov says:

February 20, 2022 at 1:21 pm

Here is 100+ keyboard/font drivers that were used in xussr: from the first alpha to keyrus monster, with some interesting things like pu_drv between:
http://old-dos.ru/index.php?page=files&mode=files&do=list&cat=94&id=1
Michal Necasek says:

February 20, 2022 at 1:27 pm

That’s amazing. Were most of them so bad, or did every programmer in Russia feel like they had to write their own driver in order to be called a programmer? 😀
Vlad Gnatov says:

February 20, 2022 at 2:03 pm

Haha, I wrote one too. So I guess it’s mostly latter 🙂
I’m speculating again, but I think it’s consequence of ES[*]-culture, which is itself a result of planned economy: it was easier to spend a few hundred/thousand man-hours to write needed software, than to get it from another institution or government department.

[*] https://en.wikipedia.org/wiki/ES_EVM
ender says:

February 20, 2022 at 6:01 pm

Here are a few I collected over the years for Yugoslavian layouts: https://eternallybored.org/misc/YUSCII.7z
vbdasc says:

February 20, 2022 at 6:34 pm

In Bulgaria, during DOS times, there circulated many (probably dozens) utilities which basically did the same thing – let users enter text in one of the two “Bulgarian industrial standard” keyboard mappings – BDS (typewriter standard; may have been adopted by IBM too) and Phonetic (unofficial, but widely used) AND reprogram the EGA/VGA screen fonts for the unofficial MIK code page. And the latter part was essential because maybe >99 percent of all DOS software ever made in Bulgaria used MIK. I still have maybe four or five of these utilities and can provide them if anyone is interested.

Actually, in 2007 I myself made such an utility. Because the company where I worked was selling some DOS software for the Bulgarian market (don’t laugh!), and all available MIK utilities were copyrighted, and the company was not willing to pay to license any of them 🙂
Michal Necasek says:

February 20, 2022 at 7:55 pm

Was part of the problem that DOS supported Bulgaria too late and/or poorly? I see that MS-DOS 6.22 supported Bulgaria but maybe not in the default install (needed KEYBOARD2.SYS and EGA3.CPI?).

I’d be interested in those old utilities for sure. If nothing else, I’d want to check how they work, because not all of them are done in the same way (judging from what I’ve seen so far).
Michal Necasek says:

February 20, 2022 at 8:05 pm

Thanks!

Wow, YUSCII is “something else”. I can see that in many ways, CP852 was actually an improvement! I don’t think any of the other homegrown encodings I’ve come across so far were 7-bit.
ender says:

February 20, 2022 at 8:39 pm

AFAIK, Finnish used something similar, which is why the IRC protocol still treats | and \ (along with a few other characters) in nicknames as equal.

And yes, the DOS prompt on the first computer we had at home was C:Đ>.
Yuhong Bao says:

February 21, 2022 at 10:43 am

“Actually, in 2007 I myself made such an utility. Because the company where I worked was selling some DOS software for the Bulgarian market (don’t laugh!),..”
I wonder why you didn’t switch codepages.
vbdasc says:

February 21, 2022 at 7:40 pm

@Michal Necasek: The problem with the DOS support of Bulgarian was that it didn’t include the “industry standard” MIK codepage, supporting the IBM 855 codepage instead, which nobody used, and also it had no support for the Bulgarian Phonetic keyboard layout.

On the same note, Windows up to and including Windows XP had no support for MIK codepage and Phonetic keyboard layout either, which gave rise to a prolific software industry for tools for “bulgarizing” Windows too 🙂 In fact, the MIK code page was never supported by any version of Windows, but the sunset of DOS software at last dealt with the need of it. As for the Bulgarian Phonetic keyboard layout, it began to be supported by Windows Vista, and began to be supported properly by Windows 7, which killed the aforementioned software industry for good 🙂

I managed to find several of the utilities I promised. They’re at http://ekom2002.com/MIKTOOLS/MIKTOOLS.ZIP . I hope they’re all abandonware by now. They all work well in a full-screen NTVDM, screen fonts and all. Have fun with them 🙂
vbdasc says:

February 21, 2022 at 7:52 pm

@Yuhong Bao: “I wonder why you didn’t switch codepages.”

It’s not as easy, when everyone expects your exported/imported data to be encoded in a certain codepage, when half of the end user’s computers have already installed some software that might conflict with your choice of codepage, and anyway, a habit is a powerful thing.
Michal Necasek says:

February 21, 2022 at 8:15 pm

Thanks for the tools!

I guess the situation in Czechoslovakia (before and after the split) was much simpler. While CP852 was somewhat unpopular in DOS, it was acceptable and there was just one standard keyboard layout. In Windows, the CP437 compatibility wasn’t really an issue and official Microsoft support started turning up in 1992 (Windows CEE) / 1993 (localized Czech Windows 3.1). I think there were some tools to “Czechify” Windows 3.0/3.1 in the early 1990s, but that soon became pointless because all Windows applications of course expected the built-in Windows support with CP1250.
ender says:

February 21, 2022 at 11:35 pm

> I wonder why you didn’t switch codepages.

Yugoslavian codepage that mapped čČšŠžŽćĆđĐ to ~^{[′@}]|\ was obsolete with DOS 5.0, but data containing these characters was common at least until early 2010’s (and I wouldn’t be surprised if there are still databases lurking with old records using that codepage).
Vlad Gnatov says:

February 23, 2022 at 2:29 pm

> that MS-DOS 6.22 supported Bulgaria but maybe not in the default install (needed
> KEYBOARD2.SYS and EGA3.CPI?).
Out of curiosity, I’ve checked cpi files in MS-DOS 3.3 – 6.22:

MS-DOS 3.3, 4.0 have identical fonts for 437, 850, 860, 863, 865 codepages;
MS-DOS 5.0 adds 852 codepage, other fonts are identical to 3.3, 4.0;
MS-DOS 6.0 changes 852 fonts, other fonts are not changed;
MS-DOS 6.2 adds ega2.cpi with 850, 852, 857, 861, 869, 737 codepages,
850, 852 fonts are the same as in ega.cpi;
MS-DOS 6.22 changes 737 fonts in ega2.cpi and adds ega3.cpi with
437, 850, 852, 855, 866 codepages, 437, 850, 852 are the same as in ega.cpi.

So it seems that Bulgarian(855) and Russian(866) codepages only appeared
in 6.22 (june-1994) in non-localized MS-DOS. Also, you’re right, keybrd2.sys is needed for Bg keys layout.
Michal Necasek says:

February 23, 2022 at 6:28 pm

OK, that explains why the built-in DOS support in Bulgaria never got anywhere. It came way too late, and even then it needed manual effort to set up at all.

Interestingly Russia was one of the first non-Western countries to get localized version of DOS, back in 1989 with MS-DOS 4.01. But that didn’t do much to prevent homegrown solutions from being popular.
Vlad Gnatov says:

February 23, 2022 at 8:25 pm

Well, dos 4.x was quite unpopular OS version, also there were issues with distribution.

p.s. You may find this article interesting: http://rdos401.org/
Michal Necasek says:

February 24, 2022 at 10:22 am

Unpopular maybe, but it at least was there and it was something software developers could target. In many (most, really) countries even that didn’t exist at the time.

Yes, I’ve seen that page (or at least one that looked extremely similar) before. The code page discussions sound so familiar — IBM didn’t want a different code page for each country, but users wanted Norton Commander to look right.
Jason Stevens says:

March 8, 2022 at 7:46 am

Somehow all the hackiness of code pages, and drivers seems to culminate with the straight up ‘port’ of MS-DOS to Russian. http://www.rdos401.org/

I guess what is more interesting is why Russian, and not all the other nations / languages. Not to mention they must have had OAK access at least.. If not more, it seems odd that 1989 Russia would be the choice.
Michal Necasek says:

March 8, 2022 at 11:02 am

AFAIK that was Microsoft/BillG choosing Russia. Doing their bit for the Perestroika or something? I dunno. Also presumably because Russia was by far the biggest of the Eastern Bloc nations.

I’m pretty sure it was done with Microsoft’s help, so yeah they had all the tools they needed. Funny that Russian MS-DOS 4.01 is AFAIK the only DOS version that starts with a big screen telling users not to pirate the software.

I wouldn’t even say code pages were hacky… it was a reasonable solution for the time. The biggest problem really was that in the times before MIME and HTTP and all that, there was no standardized marker identifying the code page of the content.
Richard Wells says:

March 8, 2022 at 8:14 pm

Code pages had been in use with mainframes from several years before being brought over to PC-DOS. It may be a hacky solution but it was IBM’s proven solution. Translating between code pages was the responsibility of the software doing the file transfer. The local system user should know what code page is in use and the remote system will return a list of supported code pages. There wasn’t much point in embedding code page information inside a document since systems outside the IBM ecosystem had their own localized implicit code pages without an IBM number and even within the IBM ecosystem, there were a number of non-standard code pages.
OBattler says:

March 28, 2022 at 5:48 am

Here in Slovenia, we used the YUSCII encoding, and there was a booming industry of EGA and VGA drivers for it, as well as keyboard drives. I have quite a lot of them, from Slovenia and beyond – VGAYU, KY20, SL*.EXE, CHERRY, CROCHERY, FOGGS, etc. I just can’t find EGAYU.COM anywhere. Someone even made an EGA_999.CPI, COUNTRY.SYS, and KEYBOARD.SYS that had YUSCII as code page 999, and I have those files somewhere as well, I believe it was SAOP d.o.o. from Sežana that made that.

DOS programs using YUSCII here were predominant until the early 2010’s when finally everyone began migrating to the Windows versions.

There was also an Atari character set for Slovenian, and I have somewhere an old utility called KONVERZ.EXE that came with the WIN.INI magazine here (the biggest computer magazine here back in the 90’s), that could handle Atari, YUSCII, CP 852, and CP 1250 and convert to/from any of them. It was for DOS.

Also, from my research, I can tell you that character set support was a mess in a lot of other places:

– For Traditional Chinese, you either had to use localized IBM PC DOS which used IBM’s proprietary code page 938 that no one else used, or you had to use E-Ten that used the Big5 standard;

– For (South) Korean, there were three competing character sets – IBM code page 934, KS C 5601 a.k.a. Wansung that in MS-DOS 5.x and earlier, Microsoft had misleadingly given the code page number 934, but reassigned it to the now standard 949 in 6.x, and the later Johab character set (now known as code page 1361). I have numerous Korean display drivers for DOS (that they call Hangul BIOS), but none of them supports the actual code page 934, only Wansung and, later, Johab. Also, hardware Korean support was usually only Wansung (there were Korean Hercules, EGA, MCGA, and VGA cards). I’m also yet to get my hands on any copy of IBM PC DOS Korean earlier than 6.3, and by 6.3, IBM had also already switched to Wansung/Code page 934;

– Japanese was a mess of its own, with every vendor adding its own extension to code page 932 / Shift-JIS, and support varying from entirely incompatible architectures (PC-98, FM-Towns) to weird IBM compatibles with hardware Japanese support (PCjx, Toshiba J-3100, PS/55, PC AX standard, etc.) and in the VGA era, VGA display drivers for standard IBM compatibles (DOS/V but also a driver that came with later MS-DOS 5.00a for AX that emulated AX display on a (S)VGA, and some 3rd party drivers as well);

– For Russian, it ranged from built-in support in DOS to a lot of 3rd party drivers, the most famous of which being KEYRUS which dealt with both keyboard and display;

– For Vietnamese, there was VIETDOS that ended up having to sacrifice every single line drawing character and some beyond as well, in order to fit in all of the Vietnamese diacritics combinations;

– For Thai, the most famous was the Hercules (it was originally created to support Thai in DOS!), but later, there was a Thai edition of MS-DOS 6.22 that was English with Thai support, and it basically emulated text mode in graphics mode, which is the only way to do all the combining characters;

– For Farsi/Persian, there was the SEPAND driver that I have somewhere, that handled display, keyboard, and even printer, I believe it used the Iran standard character set that neither MS nor IBM ever officially supported, also Windows was a complete mess as well, with there being two Windows 3.1x versions – the official Windows 3.1 from Microsoft that was dongle-protected(!) and a 3rd party version of Windows 3.11 from SinaSoft that was not protected, and was instead based on the Arabic version with minor character set changes, and therefore not at all compatible with the official version;

– For Greek, there were apparently three different code pages – code page 851 that seems to have been very rarely used, and code pages 869 and 737 that were at some point officially added to DOS. When Microsoft localized MS-DOS into Greek with Windows 9x, they made versions for both code pages 869 and 737, and therefore, Windows 9x Greek versions usually had two copies of the OS on them. Also, according to README.TXT of Pan-European Windows 95, there were apparently two versions of Windows 3.1x as well, much like with Farsi/Persian, just with no dongle protection involved – official version with the Microsoft font layer, which I have, and an unofficial version with the Pouliadis font layer, which I’ve yet to get my hands on a copy of;

– For Arabic and Hebrew, Microsoft distributed a driver that essentially emulated text mode in graphics mode – a bit overkill for Hebrew, but I can see why it was needed for Arabic where each character can have up to 4 different forms depending on what is to the left and to the right of it. Also, there were so many different Arabic code pages that I’ve lost count of them; offhand, I can remember 708 (ASMO), 720 (Transparent ASMO), 864 (MS-DOS Arabic), and 868, but there are others as well;

– Central Europe was known for its variants of code page 437 – Kamenický in Czechoslovakia, Mazowia in polish, and there was a Hungarian code page as well – notably, Windows 3.1 for Central and Eastern Europe comes with a character set converter plugin for the File Manager that can handle all these character sets;

– Latvian, like the rest of the Baltic languages, ended up officially using code page 775 in Windows 9x’s DOS (and code page 1257 inside Windows), but that seems to have been a code page invented by Microsoft, as I have an old Latvian DOS driver somewhere that uses an entirely different character set that also supports Russian, and a keyboard layout that allows typing in both Latvian and Russian.
Michal Necasek says:

March 28, 2022 at 5:38 pm

That’s excellent research, thanks. It’s kind of distressing how many of those utilities, once widely used, seem to be completely gone.

The code page mess is kind of amazing. And it seems that the sooner a country started with national support, the bigger the mess, with various incompatible solutions. A while ago I read up on the Korean NLS story and it was quite something. Japan was if anything even worse, with all the incompatible hardware.

It’s interesting to observe the competing motivations; most of the code pages were put together the way they were put together for good reasons, but everyone’s priorities were different. And of course then there was the self-inflicted pain, like Microsoft using different official code pages in DOS and Windows. Part of the general problem seems to have been that the people in Redmond didn’t have much of a clue (and why should they?) while the various homegrown solutions were very well tailored to the local needs.

The Polish encoding was Mazovia, Hungarian was CWI (mostly CWI-2 I guess). FYI, there were 3rd party kits for Czech/Slovak NLS that could be installed in Windows 3.1 but also Windows 3.0. I assume Poland/Hungary/etc. had similar solutions before Microsoft got moving, perhaps even in the Windows 2.x days. At least one of the Czech packages was called ‘CRC Type 2.0’ which, as you can imagine, is a terrible search term. Mentioned here, page 22 in PDF. The article alludes to other solutions, but does not name any, and says that CRC Type 2.0 was probably the best of them.

And yes, the Hercules card was initially created to enable Thai support, beating IBM’s EGA to soft fonts by about two years.
Yuhong Bao says:

March 29, 2022 at 8:57 am

On codepage 864:
http://archives.miloush.net/michkap/archive/2006/04/22/580636.html
Pingback: WordSet: Stolen Without Compensation | OS/2 Museum
VøkØng says:

December 6, 2025 at 1:21 am

Dot matrix printing was always a hoot. Norwegian/Danish characters æøåÆØÅ were normally mapped into [|]{} in 7-bit devices (if I remember correctly). Same as for 7-bit text terminal character sets. So if you wanted sensible letters on paper you had to get new (e)proms for the device. So if you programmed on a terminal, C source code would always look a bit funky… Pretty cool if you got a mail between two PCs through an X.400 system too, where any intended [|]{} character ended up as an æøåÆØÅ character because of “helpful” conversions.
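To make the "funky C source" effect concrete, here is a minimal Python sketch. It assumes the ISO 646-NO (NS 4551-1) assignments, where the six code points that ASCII uses for brackets, backslash, braces and the vertical bar carry Æ Ø Å æ ø å instead; the comment above lists the characters from memory and does not name a specific standard. The function name and the sample C line are made up for illustration.

```python
# Assumption: ISO 646-NO / NS 4551-1 replaces these six ASCII code points
#   0x5B [ -> Æ   0x5C \ -> Ø   0x5D ] -> Å
#   0x7B { -> æ   0x7C | -> ø   0x7D } -> å
NORWEGIAN_7BIT = str.maketrans("[\\]{|}", "ÆØÅæøå")


def as_shown_on_7bit_device(text: str) -> str:
    """Return ASCII text as a Norwegian 7-bit terminal or printer would render it."""
    return text.translate(NORWEGIAN_7BIT)


# Hypothetical C fragment, only here to show the effect.
c_source = "int main(void) { char buf[8]; return buf[0] | 1; }"
print(as_shown_on_7bit_device(c_source))
```

The same table applied in the other direction is what a "helpful" mail gateway would effectively do to intended bracket and brace characters.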

In DOS the situation became better, only øØ was missing in the default set. One could get by ignoring the odd Cent and Yen signs on the screen, which are the two characters that would be replaced by ø and Ø in cp865 (again, from memory). And their replacement did not screw up any of the line drawing (TUI) characters.
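The cp437 versus cp865 claim is easy to check with Python's built-in codecs. The sketch below decodes every byte value under both code pages and lists the positions where they disagree; `codepage_differences` is a hypothetical helper name, and the result reflects Python's codec tables rather than any particular DOS font.

```python
def codepage_differences(first: str, second: str) -> dict:
    """Map each byte value that decodes differently to its (first, second) characters."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        a, b = raw.decode(first), raw.decode(second)
        if a != b:
            diffs[value] = (a, b)
    return diffs


diffs = codepage_differences("cp437", "cp865")
for value, (old, new) in sorted(diffs.items()):
    print(f"0x{value:02X}: {old!r} -> {new!r}")

# The line and box drawing block sits at 0xB0..0xDF in both code pages.
box_drawing_touched = any(0xB0 <= value <= 0xDF for value in diffs)
print("box drawing affected:", box_drawing_touched)
```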

There were resident programs available to fix some of the mess, but I do not remember names for them. Also, some utils existed for the Sami language, a minority indigenous language group in some northern parts of Europe (Norway/Sweden/Finland/Russia). I do not speak that language but at least one of the utils/systems to fix the situation is named SAMTAST (from memory, for any interest or reference). According to an official document I came across while checking online, that was still considered a problem at the highest levels of Norwegian government in 2002.

Correct alphabetical sorting of strings was a problem in a lot of imported software for many years. I expect that problem could have been even more extensive in other countries.
OBattler says:

February 16, 2026 at 8:02 am

In the meantime, more has been uploaded to Archive.org – the “Računalniško opismenjevanje” (“Computer literacy gaining”) CD has a utility that adds 3rd party Slovenian (YUSCII?) support to Windows 3.x. Also, the prevalence of non-CP1250 Windows versions even in the mid to late 90’s caused the WIN.INI magazine to use WONDER.FON for their menu, which was basically CP1250 MS Sans Serif with another name.

Also uploaded to Archive.org has been Greek for Windows 3.1 by Pouliadis, which predates Microsoft’s official Greek version of Windows 3.1 by an entire year. This is the very “Pouliadis font layer” mentioned in Pan-European Windows 95’s README.TXT. And it appears that the ISO 8859 encoding was also used in DOS alongside CP 737 and CP 869 (and the rarely used CP 851), as CP 928 (ELOT).

And, of course, a Lithuanian localization of OS/2 Warp 3.0; apparently, it was done by a 3rd party in Lithuania.

Also, it turns out 3rd party DOS support existed for a variety of languages, such as Georgian (KEYGRU – modification of KEYRUS), Armenian (including a 3rd party unofficial Armenian localization of Norton Commander 2.0, of which a YouTube video at least used to exist), Kazakh, Ukrainian (KEYUKR, etc.), and so on.

And it also turns out that there was a mess when it came to Simplified Chinese as well – there are, in fact, two different versions of code page 936 – the Microsoft / GBK version, and an IBM version that used a layout similar to code pages 932 (Japanese), 934 (IBM Korean), and 938 (IBM Traditional Chinese).

And IBM managed to create a further mess later on with their versions of CP 850, 852, etc. with added Euro sign support, and assigned IDs such as 858 and so on.
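For the CP 850 / CP 858 pair specifically, Python ships both codecs, so the Euro change can be inspected directly. This is a small sketch against Python's codec tables (not against IBM's own documentation); `euro_variants` is a name made up here.

```python
def euro_variants(base: str, with_euro: str) -> dict:
    """Return the byte positions where the Euro-enabled code page differs from its base."""
    changed = {}
    for value in range(256):
        raw = bytes([value])
        before, after = raw.decode(base), raw.decode(with_euro)
        if before != after:
            changed[value] = (before, after)
    return changed


changed = euro_variants("cp850", "cp858")
for value, (before, after) in changed.items():
    print(f"0x{value:02X}: {before!r} (cp850) -> {after!r} (cp858)")

# Text containing the Euro sign round-trips in cp858 but cannot be encoded in cp850.
print("€".encode("cp858"))
```

Python has no built-in codec for the other Euro-enabled variants, so only this pair can be checked this way without extra tables.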

And the YUSCII rabbit hole appears to go deeper – there were also TWO Cyrillic variants of it, one for Serbo-Croatian, and one for Macedonian. Both 7-bit. Granted, the USSR also had its own venture in 7-bit with KOI-7, though KOI-8 and later CP 866 prevailed.

The big question mark here is India – ISCII (Indian Script Code for Information Interchange) does exist, but I wonder if any DOS display and keyboard support ever existed for Devanagari, etc.

I also wonder if in North Korea, they ever had DOS, Windows 3.x or Windows 9x support for their own KPS encoding, that would be interesting to get our hands on.

And finally, we have the Baltic sea, where, it turns out, there were a LOT of home-grown code pages, and Microsoft ended up simply choosing 775 for Baltic support in Windows 9x DOS because it was apparently the most popular.

And Turkish is a bit of a mystery as for why CP 853 (Latin 3, which also had Esperanto characters) was abandoned in favor of CP 857 (Latin 7). And it also turns out Latin 4 is a mystery – even ISO 8859 appears to have had 8859-4 struck out and I have no idea what characters it *actually* used to contain.

And, for some reason, modifying the upper 128 characters of ASCII was done in East Asia as well – e.g. the backslash becomes the Yen sign in CP 932, the Yuan sign (identical to the Yen sign) in CP 936, 938, and 950, and the Won sign in CP 934, 949, and 1361. In addition, the visual representations of the control codes 01-1F were also different – CP 932, notably, at least on DOS/V, moved some of the line / box drawing characters there.
Michal Necasek says:

February 16, 2026 at 12:45 pm

Fascinating, thank you!

Do you know what was the technical reason for the separate Cyrillic YUSCII variants? Was it not possible to put all the required characters into a single codepage? Or was it just the usual problem where everyone wanted only “their” characters and keep the rest unmodified as much as possible?

In the last para, did you mean “lower 128 chars of ASCII”? Although technically, ASCII is only 128 chars to begin with, so “lower 128” is redundant and “upper 128” an oxymoron.
OBattler says:

February 17, 2026 at 3:30 am

That was because ЌќЃѓ (ḰḱǴǵ) and ЋћЂђ (ĆćĐđ) were deemed to be equivalent and for Macedonian names, it was deemed acceptable to replace the former with the latter in the other languages of SFRY, and to replace the latter with the former in Macedonian. Of course this is different now.
OBattler says:

February 17, 2026 at 3:30 am

And yes, I meant the lower 128 characters of ASCII, sorry.