Where Did CP852 Come From?

Posted on February 15, 2022 by Michal Necasek

In the 1990s, a lot of my documents were written in code page 852 (CP852), also known as PC Latin 2. This code page is sometimes called “Eastern European”, which is a bit misleading, given that it does not cover major Eastern European countries like Ukraine; sometimes it is also called “Slavic”, which is no less misleading because it covers languages like Hungarian or Albanian that aren’t remotely Slavic.

In those days, fighting with code pages was a constant source of annoyance and pain. DOS and OS/2 used CP852, Windows used CP1250, and Unix/Linux used ISO 8859-2. Of course these code pages were all incompatible with each other. The worst problem was the early web, where content was often offered in some 8-bit encoding but with no hint as to which encoding that might have been (let’s play a guessing game!). It is a real shame that UTF-8 didn’t come along a bit earlier.
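To make the incompatibility concrete, here is a small sketch that encodes the same Czech phrase in all three code pages and then reads it back with the wrong one. It relies only on Python's standard codecs (`cp852`, `cp1250`, `iso8859_2`). The helper names `encodings_of` and `misread` are made up for this illustration.

```python
# Encode the same Czech text in the three 8-bit Latin 2 code pages of the
# 1990s and show what happens when it is read back with the wrong one.
# The codec names are Python's standard-library names for these code pages.
CODECS = {
    "DOS/OS2 (CP852)": "cp852",
    "Windows (CP1250)": "cp1250",
    "Unix (ISO 8859-2)": "iso8859_2",
}

SAMPLE = "Příliš žluťoučký kůň"


def encodings_of(text: str) -> dict[str, bytes]:
    """Return the byte representation of text in each code page."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode with one code page and decode with another (mojibake)."""
    # errors="replace" because some bytes (e.g. unassigned CP1250 slots)
    # have no meaning at all in the target code page.
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for label, raw in encodings_of(SAMPLE).items():
        print(f"{label:18} {raw.hex(' ')}")
    print(misread(SAMPLE, "cp852", "cp1250"))
    print(misread(SAMPLE, "iso8859_2", "cp1250"))
```

Even the two "Latin 2" encodings that look most alike disagree: ISO 8859-2 stores ‘š’ as 0xB9, and CP1250 reads 0xB9 as ‘ą’.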

In the early to mid-1990s the situation was further complicated by several non-standard encodings, like the Kamenický brothers encoding in Czechoslovakia or the Mazovia encoding in Poland. Those encodings originated in the mid-1980s and tended to preserve most of the CP437 semi-graphic characters. Code page 852 did not, but on the other hand it covered quite a few languages. Users initially preferred the non-standard national encodings because those worked better for them, but built-in operating system support eventually pushed them out.

And now I started wondering: When did CP852 become available to users, and where did it actually come from? The first question can be answered reasonably accurately, while the second remains unclear.

The first major product that offered CP852 support was IBM DOS/MS-DOS 5.0 in 1991. This was followed by OS/2 2.0 in 1992. There is no indication that any earlier DOS version supported CP852, although in 1992, Microsoft sold a shrink-wrapped product called AlphabetPlus which allowed installing CP852 and the associated national support on top of MS-DOS 3.3 or 4.0.

As it turns out, thanks to the surviving betas of MS-DOS 5.0 it’s possible to define the timeline of CP852 support in DOS quite accurately. In the MS-DOS 5.0 beta from December 1990 there is no sign of CP852 or any associated country support.

The next beta from January 1991 is most interesting. It offers several previously unsupported countries in its setup:

New country support in Jan ’91 MS-DOS 5.0 beta

Czechoslovakia, Hungary, Poland, and Yugoslavia all needed CP852 and weren’t supported before. The setup offers all those countries but… it does not work! Perhaps the support was not yet complete, or there was a packaging error, but the EGA.CPI file (screen fonts) shipped with the January 1991 MS-DOS beta is actually the same as in the previous MS-DOS 5.0 beta and does not include CP852. So the language support can be configured, but enabling it fails. This clearly indicates that January 1991 was when Microsoft was in the middle of adding the required support.
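You can check whether a given EGA.CPI actually contains CP852 by listing the code pages it declares. The sketch below assumes the classic MS-DOS/PC DOS "FONT" CPI layout as described in community file-format references, with absolute offsets. The DR-DOS "DRFONT" and Windows NT "FONT.NT" variants are not handled. `list_cpi_codepages` is a hypothetical helper, not an existing tool.

```python
import struct


def list_cpi_codepages(data: bytes) -> list[int]:
    """Return the code page numbers declared in a DOS .CPI font file.

    Assumes the classic "FONT" header layout; other CPI variants differ.
    """
    # FontFileHeader: 0xFF, "FONT   ", 8 reserved bytes, pointer count (u16),
    # pointer type (u8), offset of the FontInfoHeader (u32): 23 bytes total.
    if data[0] != 0xFF or data[1:8] != b"FONT   ":
        raise ValueError("not a FONT-format CPI file")
    (fih_offset,) = struct.unpack_from("<I", data, 19)
    # FontInfoHeader: number of code page entries (u16).
    (count,) = struct.unpack_from("<H", data, fih_offset)
    pages = []
    entry = fih_offset + 2
    for _ in range(count):
        # CodePageEntryHeader: size (u16), next entry offset (u32),
        # device type (u16), device name (8 bytes), code page (u16), ...
        _size, next_entry, _dtype, _name, codepage = struct.unpack_from(
            "<HIH8sH", data, entry)
        pages.append(codepage)
        entry = next_entry
    return pages
```

Usage: `with open("EGA.CPI", "rb") as f: print(852 in list_cpi_codepages(f.read()))`. Under the layout assumed above, the January 1991 beta's file would report no 852.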

The next MS-DOS beta from March 1991 includes the expanded EGA.CPI file and supports CP852 properly:

Functioning national language support in March 1991 DOS 5.0 beta

However, the oldest available OS build with functioning CP852 support known to date is… OS/2 2.0 pre-release build 6.123 from February 1991. This build contains the updated character definitions and CP852 support actually works. Unfortunately no older betas of OS/2 2.0 are available for comparison, and it is thus unclear if 6.123 was the first build with CP852 support or not. Given that MS-DOS had only just started getting that support in January 1991, there is a reasonable chance that OS/2 6.123 was indeed the first such build.

This leaves an open question: Did any earlier IBM products include CP852 support? Perhaps DB2? Or some communication product? It is not impossible.

Backporting CP852

An obvious question occurred to me: Can the extended national language support (NLS) in DOS 5.0 be grafted on top of an earlier DOS version? It appears that it can:

All it takes is copying over COUNTRY.SYS, KEYBOARD.SYS, and EGA.CPI. That allows DOS 4.0 to support the new countries, keyboards, and code pages added in DOS 5.0, while using the original DISPLAY.SYS, NLSFUNC.EXE, KEYB.COM, and MODE.COM shipped with DOS 4.0.

Note that DOS 4.0 uses the same NLS infrastructure that IBM introduced in DOS 3.3. Correctly configuring DOS NLS is somewhat mind-bending, but that’s a topic for another time.

CP852 Origins

So… where did code page 852 come from, exactly? The obvious place to look would be IBM. According to IBM’s globalization database, CP852 was registered in… 1993. That is not exactly credible, given that DOS 5.0 provably supported CP852 in 1991. Even more confusingly, RFC 1345 says that their source for CP852 is IBM’s NLS reference manual from March 1990. Who do you trust, IBM or people citing IBM?

My attempts to locate an old copy of IBM’s “Green Book” have proven unsuccessful. It is thus very unclear when IBM first published CP852.

An interesting wrinkle is IBM’s definition of character set (not code page!) 982, Latin 2 PC. That character set, which is supposedly primarily contained in CP852, was defined in 1986! Did IBM really sit on it for several years before putting together a code page with that character set? That sounds quite unlikely.

Note that 1986 would be roughly in line with ISO 8859-2 or ECMA-94; the second edition of ECMA-94 defined “Latin Alphabet No. 2” (the equivalent of ISO 8859-2) in June 1986. It’s quite likely that IBM was involved in the standardization process.

A well-informed source tells me that the CP852 character bitmaps (which end up in EGA.CPI) as well as Czech, Hungarian, Polish, Slovak, and Yugoslav keyboard layouts for DOS were defined by IBM in 1987. Yes, 1987: the year DOS 3.3 was released, and four years before those keyboard layouts and CP852 would be supported in DOS 5.0.

In the late 1980s, the countries that would typically use CP852 were all behind the Iron Curtain and many of them embargoed. IBM PCs were not commonly available there. So why would IBM care?

There turns out to be some interesting and obscure history. IBM has a lab in Budapest which opened in 1936 and has never closed since. IBM also had offices in Prague, Sofia, and Warsaw. Vienna was the headquarters of IBM ROECE, variously given as “Regional Office Europe Central and East” or “Regional Office for Eastern and Central Europe”. IBM ROECE’s territory was essentially the Eastern Bloc minus the Soviet Union: Bulgaria, Czechoslovakia, East Germany, Hungary, Poland, Romania, Yugoslavia.

According to an article published in the German ComputerWoche in February 1976, IBM ROECE was indeed selling systems to Eastern Bloc countries. Note that the article mentions “over 300” systems installed, but those were mid-range and mainframe systems, not PCs (obviously). IBM also reportedly had trouble selling larger System/370 machines due to embargoes. Yugoslavia was the biggest buyer because it was not subject to the embargo imposed on the COMECON countries.

This was never talked about much by either side. Communist governments didn’t want to talk about buying from evil capitalist Americans because their own technology was inferior, and IBM perhaps didn’t want to boast too much about selling to evil communist Eastern Bloc countries either.

But it does explain why IBM worked on national language support for those countries well before the Berlin Wall fell and the borders opened.

P.S: What’s a Caron, Anyway?

A minor related mystery is that of the caron. Several languages (mostly Slavic) use a diacritical mark which likely originated in Czech and is called “háček” (it’s that funny little thing above the ‘c’), which means “hook”. In current international standards, the diacritical mark is called “caron”. The problem is that no one understands why, and the whole thing might be a misunderstanding. There are some guesses, but the origin of the term “caron” is completely unclear; there is evidence that it was supposed to designate an inverted caret (an upside-down ^) and not a diacritical mark. It is fascinating to see how mistakes can spread.

This entry was posted in DOS, I18N, IBM, Microsoft, OS/2, PC history.

47 Responses to Where Did CP852 Come From?

Richard Wells says:

February 15, 2022 at 10:06 pm

Unicode: Just say no. In 1990, the Comecon nations were largely stuck with 64 kbit memory chips, resulting in PC compatible systems with only 128K, of which 32K was given to video. Even the super 8-bit machines with a monstrous 256K needed that extra memory to create RAMDISKs in order to load CP/M from cassette, since computers, modestly produced as they were, still outnumbered the disk drives being manufactured roughly ten to one. Cutting the number of characters that could be stored was the last thing needed.

Caron: For the GPO to include this in the style guide, a new typesetting machine supporting new typographical symbols would have had to be installed first. That means RFPs and internal memos detailing the need for the caron. Those probably started sometime after 1956, when accurately reproducing text from Czechoslovakia would have become relevant. Finding out when the GPO decided on caron as a term would require little more than FOIA requests covering that decade. Tedious and expensive but not difficult.
MiaM says:

February 16, 2022 at 7:42 am

Richard: Were there even any disk drives produced in the Eastern Bloc?

For example it seems like the GDR used imported TEAC drives.
Yuhong Bao says:

February 16, 2022 at 9:07 am

I wonder: what about Unicode?
Yuhong Bao says:

February 16, 2022 at 9:09 am

In particular, the first work on Unicode was being done in 1989-1990.
Michal Necasek says:

February 16, 2022 at 10:54 am

Yes, there were. Bulgaria produced hard disks (IZOT), I believe USSR also produced hard disks and floppy drives. But all in small quantities and relatively outdated. At the same time hardware was imported, especially from Taiwan.
Richard Wells says:

February 16, 2022 at 11:12 am

@Miam: Robotron had a floppy drive manufacturing plant in the DDR. BRG in Hungary made a really snazzy 3″ hard shell floppy. MOM was another Hungarian floppy manufacturer though doing the standard 8″ and 5.25″ drives. ISOT in Bulgaria made 8″ and 5.25″ floppy drives.

The Soviet Union had built a huge floppy disk plant but apparently no disks were ever made there.

https://www.robotrontechnik.de/index.htm?/html/komponenten/fs.htm will provide pictures of the Robotron designed floppy drives, the Robotron manufactured drives licensed from TEAC, and the drives manufactured in other Comecon nations for use by Robotron.

One funny bit of Communist marketing was the number of products given English names to create the illusion that those were bootleg copies of American products. Examples include “One Track System” and “Cassette Aided Operating System.”
Michal Necasek says:

February 16, 2022 at 12:11 pm

Yes, 16-bit Unicode encoding was highly unattractive, even though it wasn’t really around at the time anyway. UTF-8 didn’t exist at all. No one had Unicode fonts, and no one wanted giant fonts with thousands of characters; there was nowhere to put them. 8-bit encodings made perfect economic sense because the 8th bit was already there, free for the taking.
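The "nowhere to put them" argument is easy to quantify. The sketch below assumes 1-bit-per-pixel glyphs 8 pixels wide, so each glyph row is one byte. It also assumes the three cell heights (8x16, 8x14, 8x8) typically stored per code page in EGA.CPI. `font_bytes` is a hypothetical helper.

```python
# Back-of-the-envelope font storage: one bit per pixel, 8 pixels wide, so
# each glyph row is exactly one byte. The cell heights are an assumption
# matching the 8x16, 8x14 and 8x8 fonts typically found in EGA.CPI.
CELL_HEIGHTS = (16, 14, 8)


def font_bytes(glyph_count: int, heights=CELL_HEIGHTS) -> int:
    """Bytes needed to store glyph_count glyphs at each cell height."""
    return sum(glyph_count * h for h in heights)


if __name__ == "__main__":
    print("256-glyph code page:", font_bytes(256), "bytes")
    print("65,536-glyph 16-bit set:", font_bytes(65536), "bytes")
```

One 8-bit code page needs 9,728 bytes for all three sizes. A full 16-bit repertoire at the same sizes would need about 2.4 MB, far more than the whole 640K conventional memory of a DOS machine.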

Interestingly, WordPerfect for DOS (at least in the 5.x times) used internal document coding with multi-byte characters and a “universal” code page, essentially the practical equivalent of Unicode for storing documents. That was in contrast with e.g. Microsoft Word which used typical 8-bit code pages.

The problem with the caron is that the 1967 GPO manual makes no suggestion that “caron” should be a diacritical mark at all, or that it is in any way related to Czech.
Vlad Gnatov says:

February 16, 2022 at 12:12 pm

> The Soviet Union had built a huge floppy disk plant but apparently no disks were ever made there.
AFAIR, there were a few big floppy plants in the USSR. For example, Elektronmash near Kyiv, which produced tons of (very low quality) 5.25″ floppies:
http://foto.a-le.ru/wp-content/uploads/2015/02/KIMG_5396a.jpg
and around 100k HDDs: МС5401 (likely a clone of the Seagate ST-506) and МС5405 (a clone of the ST-225?)
Michal Necasek says:

February 16, 2022 at 3:43 pm

According to this article, Zbrojovka Brno (an arms manufacturer but briefly also a computer maker) was just starting hard disk production in 1989-1990 but it was shut down because it no longer made economic sense — those were 10 MB ST506 hard disks, apparently clones of some Western 5.25″ drive smuggled into Czechoslovakia.

More photos of Soviet/Russian floppies and drives here: https://red-innovations.su/index/photos_c/fdd.html
Michal Necasek says:

February 16, 2022 at 4:10 pm

What about Unicode? Nothing, of course. Unicode was completely irrelevant in the 1980s and early 1990s.
Vlad Gnatov says:

February 16, 2022 at 5:20 pm

>More photos of Soviet/Russian floppies and drives here
Ah, the infamous GMD-90. They were so bad that in a pack of 10 almost every one had bad sectors, and every 4th or 5th was completely unusable because the bad sectors were on track 0.
I’m speculating, but I think the terrible diskette quality made a whole class of software popular in the (ex-)USSR: floppy disk rescuers/reanimators.

P.S. I found an article in my bookmarks with nice photos of Soviet HDDs:
http://oldpc.su/articles/hdd/hdd.html
Richard Wells says:

February 16, 2022 at 9:08 pm

The story of the Soviet floppy disk factory that never produced a disk was told at https://adambager.com/tag/floppy-disk-factory/ Unfortunately, that left floppy disk manufacture with the existing plants doing little more than small test batches.
Richard Wells says:

February 16, 2022 at 10:33 pm

The caron only refers to a diacritic element when used in typography, so I would not expect the GPO to have offered a different meaning. There have to be memos detailing why the GPO chose to implement carons and why the term caron was used. Bureaucracies mean memos, and memos provide proof.

I suspect 1956 as a crucial year since I have seen claims that some analysis errors in that year were caused by mistakes in transliteration. Being able to provide text using the exact same symbols as originally used will prevent transliteration and the attendant problems.
Michal Necasek says:

February 17, 2022 at 12:17 am

Well, look at the GPO manual — it’s not hard to find. As far as I can tell, it simply shows a ‘v’ shape next to the text “caron”, and that’s it, there is no other mention of it. The context in which it is given is ambiguous. The section is titled “Signs and Symbols”, and the only explanatory text says “This list contains the signs and symbols frequently used in printing by this Office.” The table contains all kinds of symbols, including diacritical marks (acute, grave, macron, etc.). Now, the curious thing is that the list contains both “circumflex” (diacritical mark) and “caret” (proofreading symbol), which visually look the same in the table. If one thought “caron” was a diacritical mark, surely one would expect to find it near the circumflex. But no, it is listed after the caret.

In other words, the symbol list in the GPO manual groups all diacritical marks together, and “caron” is not among that group. At the same time I can easily see how someone could look at the table and go “oh, let’s call this diacritical mark ‘caron’, it’s right there in the GPO manual”.

Seeing the memos would certainly be great.
Michal Necasek says:

February 17, 2022 at 12:33 am

There is supporting evidence that the caron was never meant to be a diacritical mark in a newer GPO manual. The 1981 edition has a very similar symbol table, but it lists accents as a separate group. There is no caron among them. I can’t find the caret anywhere in the entire table of symbols either.
Yuhong Bao says:

February 17, 2022 at 3:14 am

Unicode may not be common yet but the first work on it was already being done.
Michal Necasek says:

February 17, 2022 at 9:42 am

Sure, but how is that relevant? Even in 1990, most people had never even heard of Unicode. More importantly, they already had a solution (code pages). Not a great one but a working one. Especially in DOS, the last thing anyone wanted was giant translation tables eating precious memory.
Michal Necasek says:

February 17, 2022 at 10:04 am

I’ve done a bit more digging. The 1959 edition of the GPO style manual has a table (page 178) that’s very similar to the 1967 edition, but there’s only caret, no caron. The 1973 edition reorganized the table of symbols (page 172), “accents” are listed separately, and include neither caret nor caron.

Here’s the thing. The “caron” appears to only exist in the 1967 edition of the GPO style manual, not in older or newer editions. That is mighty strange.
Richard Wells says:

February 17, 2022 at 10:15 pm

In 1990, East Germany had embraced the 7-bit ASCII of Basicode in order to provide a single common programming manual across the various systems being built. Companies in the US were preparing to support all the special characters used in other countries at the same time that those countries were preparing to drop the special characters just to have useful computers. History is fun.
Fernando says:

February 17, 2022 at 11:09 pm

In my country at the time there was the same problem with Code Page 850 and ISO 8859-1; I had to change code page a few times a day depending on the text encoding of the program or document.
Searching the web and using machine translation took me to this blog, where the author had the same question as you. Apparently the oldest document he could find with the term “Caron” was the 1967 edition of the United States Government Printing Office Style Manual (well, you found it independently):
https://babelstone.co.uk/Blog/2009/08/antedating-caron.html
Another search in Google Books found that the IBM Advanced Interactive Executive for the Personal System/2, AIX PS/2 – Technical Reference Version 1.1, Volume 2 from 1989 supports code pages. It only shows snippets, and I can’t find this document or earlier manuals to check whether they support Code Page 852, but it probably shows that before 1989 IBM was already working on international support for its operating systems.
Michal Necasek says:

February 17, 2022 at 11:32 pm

Yes, that “Antedating caron” post is what I read and linked to. It’s really a mystery.

Now I’m wondering when CP850 first popped up. It’s definitely in PC DOS 3.3 (March 1987) but I don’t know if it was more or less brand new at that point or already around for some years. The DOS 3.3 EGA.CPI includes fonts for CP437, CP850, CP860, CP863, and CP865.
Richard Wells says:

February 18, 2022 at 3:58 am

The PC DOS 3.3 announcement seems to indicate that the code pages were new. The OS/2 1.0 announcement also indicates they were new. Prototypes must have been an open secret, since revised printers with code page support were also announced on April 2, 1987.
John Elliott says:

February 18, 2022 at 1:43 pm

According to Larry Osterman, NLS support was ported to PC-DOS by IBM from their mainframes. (Certainly meseems that the CPI file format was derived from a file format that already existed, rather than being created from scratch for PC-DOS).

As for where the code pages 850, 860 etc came from, I’d guess that they were created at the time of the port. The alternative would be that they already existed in the mainframe world, but they’re all based on codepage 437, which is intimately tied to PC hardware. So would they be any use on a mainframe except for data exchange with PCDOS?
Michal Necasek says:

February 18, 2022 at 2:24 pm

According to the IBM globalization registry, the code pages in DOS 3.3 were all registered/copyrighted in 1986, i.e. right before DOS 3.3 came out. Same for the character sets. So yes, definitely created when NLS was being added to DOS 3.3. CP437 shows a 1984 registration date, slightly older.

Going by the same globalization registry, IBM started with some kind of internationalization work in the EBCDIC world in 1977; moreover, with the whole EBCDIC vs. ASCII thing, IBM was certainly no stranger to translating incoming/outgoing text. The ASCII-based code pages were only being standardized in the mid-1980s in any case.

It’s entirely possible or even likely that the actual NLS code was written for DOS from scratch, but the CPI format was adopted from elsewhere because IBM already had some font editors for that format. I remember seeing something about a “MULTIFON” tool somewhere. The actual code pages seem to have been new, but the code page concept almost certainly existed on some other IBM systems already.
Rich Shealer says:

February 19, 2022 at 12:39 am

Do the assigned numbers of code pages have a meaning, or are they applied sequentially or randomly?
Michal Necasek says:

February 19, 2022 at 12:53 am

I don’t think they do. Related code pages may be grouped together (e.g. 85x covers mostly various Latin alphabets, similarly 125x is the Windows Latin codepage range) but that’s about it. The numbering is not sequential. There’s not much of a system that can be discerned, e.g. CP850 (DOS) corresponds to CP1252 (Windows), while CP852 corresponds to CP1250 (Latin 2), and CP855 corresponds to CP1251 (Cyrillic).
Yuhong Bao says:

February 19, 2022 at 9:49 am

That is of course because DOS 3.3 was released after Windows 1.0.
Michal Necasek says:

February 19, 2022 at 12:17 pm

Yes, but Windows is not particularly relevant for CP852 discussion. Even Windows 3.1 (April 1992) did not have Latin 2 support equivalent to DOS 5.0 (June 1991). As far as I know, the equivalent CP1250 only showed up in Windows 3.1 for Central and Eastern Europe in late 1992. Microsoft’s approach for Windows in the late 1980s clearly was “let the OEMs deal with this nonsense”.
John Elliott says:

February 20, 2022 at 2:48 am

The numbering is also confused by pages being allocated at various times by IBM and Microsoft, who didn’t always give the same page the same number. ISO 8859-1 is 819 to IBM but 28591 to Microsoft, for instance.
Yuhong Bao says:

February 20, 2022 at 3:59 am

(and by then NT 3.1 was already planned to use Unicode)
Richard Wells says:

February 20, 2022 at 5:16 am

Some of the EBCDIC code pages seem to have been followed by similar DOS code pages, like 300 (EBCDIC Kanji DBCS) and 301 (DOS Kanji DBCS). Most of the code page numbers seem to have been allocated in blocks for a given purpose, with skipped numbers getting filled later. It does seem a fortuitous coincidence that the PS/2 (models 85xx) was paired with code pages 85x.

MS did provide the functions AnsiToOem and OemToAnsi to help solve the problems of changing between the code pages though MS’s focus seemed to be more on making sure the filename was correctly displayed under DOS than in handling all data entry conversions. Curiously, most of the references I can find suggest that the Ansi functions were new to Windows 3 but they are listed on pages 151 – 152 of my Windows 2 Programmer’s Reference.
Michal Necasek says:

February 20, 2022 at 11:53 am

The horribly misnamed “ANSI” code page was definitely not new in Windows 3. My Windows 1.03 SDK reference from 1986 refers to “ANSI” and “OEM” character sets. The “ANSI” set is listed in the reference, and the OEM set is only described as “system-dependent”. The AnsiToOem and OemToAnsi functions were in Windows 1.03 as well. And yes, the whole “OEM” code page naming was clearly Microsoft-speak for “not our problem”.

The other day I installed Windows 3.1 for Central and Eastern Europe (late 1992). The first thing it wants is to set up DOS to use CP852 so that national characters can be used in file names. I see that and think “oh please don’t” because I can immediately see the hell that will break loose when people try to copy such files to floppies or move them around a network.
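The OEM/ANSI split discussed above boils down to translating bytes between paired code pages: CP850↔CP1252, CP852↔CP1250, CP855↔CP1251. Here is a rough sketch of that idea in Python. It is not how Windows implemented `OemToAnsi`/`AnsiToOem`, which used their own translation tables, possibly with approximations for unmappable characters. This version simply goes through Unicode and substitutes ‘?’ for anything without an equivalent. The function names are hypothetical.

```python
# Rough emulation of OEM <-> ANSI translation for the code page pairs
# mentioned in the comments. NOT the Windows implementation: it round-trips
# through Unicode and writes '?' for characters missing in the target.
OEM_TO_ANSI = {850: 1252, 852: 1250, 855: 1251}


def oem_to_ansi(data: bytes, oem_cp: int = 852) -> bytes:
    """Translate DOS (OEM) bytes to the matching Windows (ANSI) code page."""
    ansi_cp = OEM_TO_ANSI[oem_cp]
    text = data.decode(f"cp{oem_cp}")
    # Box-drawing characters and the like have no ANSI equivalent.
    return text.encode(f"cp{ansi_cp}", errors="replace")


def ansi_to_oem(data: bytes, oem_cp: int = 852) -> bytes:
    """Translate Windows (ANSI) bytes back to the DOS (OEM) code page."""
    ansi_cp = OEM_TO_ANSI[oem_cp]
    text = data.decode(f"cp{ansi_cp}", errors="replace")
    return text.encode(f"cp{oem_cp}", errors="replace")
```

This also shows why national characters in file names were risky. A name created under DOS in CP852 only looks right to a Windows program if something performs this translation, and only on machines configured with the same pair of code pages.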
ender says:

February 20, 2022 at 11:54 pm

Well, pre-VFAT, filenames were simply 8-bit, so national characters in filenames would simply look wrong on machines that used different codepages.

The real fun started with Win95 and VFAT, since that uses the UCS-2 character set. I remember helping a friend whose father got some floppies from Russia where all filenames were in Cyrillic. They showed up as ____________.ext in Slovenian Windows 95, and you couldn’t access any of them. Luckily, running ScanDisk on the floppies mangled the filenames into an unreadable mess, but the files could then be opened.
Michal Necasek says:

February 21, 2022 at 9:49 am

OS/2 reportedly had potential problems with national characters in HPFS. The reason was that directory entries were sorted, and changing the system codepage could change the sort order of files.

Even in DOS, the files won’t be inaccessible but typing their file names might be difficult.

Actually with YUSCII, you mentioned the C:Đ> prompt — since the backslash was replaced by Đ, I’m guessing that ‘Đ’ could not be used in file names?
Richard Wells says:

February 21, 2022 at 11:22 pm

The filename should be using keys on the localized keyboard and converted as necessary. No problems in country since the key is available under DOS as well. Now, passing the file to a system using a different language’s keyboard might be a problem but that is best resolved through corporate standards. Anyone who chooses to create filenames with the use of Alt combos deserves the extra work created for themselves.

Sorting will always be a challenge but different languages have different sorting procedures, some of which seldom seem to be used. For example, French seems to have a strange sorting algorithm. I haven’t seen many programs that do the sort left to right with uppercase, then left to right with lower case without accents, and then right to left for the accents.

John Parry in the 1991 Translating and the Computer conference provided a paper describing the problems with sorting across languages. Not much of those conferences is of general interest but character sets* do get occasional mentions and the papers point to fairly obscure references as to the development. Convenient since all the conference material is online.

* including the then new code pages 850 and 852 and a 1984 paper extolling the virtues of 16-bit character sets
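The French rule mentioned above (accents compared from the end of the word) can be sketched as a three-level sort key: base letters first, then accents, then case. This is a simplified illustration, not a full collation algorithm such as the Unicode Collation Algorithm. Different accents are ordered here by code point, which is an arbitrary choice, and ligatures, punctuation and contractions are ignored. `french_key` is a hypothetical name.

```python
import unicodedata


def _letters_and_accents(word: str):
    """Split a word into base letters and the accent marks on each letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch  # attach the mark to the preceding letter
        else:
            bases.append(ch)
            accents.append("")
    return bases, accents


def french_key(word: str, backward_accents: bool = True):
    """Three-level sort key: letters, then accents, then case."""
    bases, accents = _letters_and_accents(word)
    # Level 1: letters only, ignoring accents and case.
    primary = "".join(bases).lower()
    # Level 2: accents, scanned from the end of the word if requested.
    secondary = accents[::-1] if backward_accents else accents
    # Level 3: case, with lowercase sorting before uppercase.
    tertiary = [b.isupper() for b in bases]
    return primary, secondary, tertiary
```

With backward accents, `sorted(["côté", "cote", "coté", "côte"], key=french_key)` gives cote, côte, coté, côté. Scanning accents left to right instead gives cote, coté, côte, côté, the order most other languages would expect.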
Michal Necasek says:

February 22, 2022 at 9:37 am

Ah yes, Sweden and Finland. I believe their not-quite-ASCII was the reason for trigraphs in C. For a long time I couldn’t understand what useful function trigraphs could possibly serve, because it never occurred to me that someone might want to program in C without having even the basic ASCII character set.
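For reference, ANSI C (C89) defines nine trigraphs, which the compiler replaces before any other processing, even inside string literals. Here is a small Python sketch of that replacement step. `replace_trigraphs` is a hypothetical helper. Trigraphs were later removed from C++17 and C23.

```python
import re

# The nine trigraphs defined by ANSI C (C89) and the characters they stand for.
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}

_TRIGRAPH_RE = re.compile(r"\?\?([=(/)'<!>-])")


def replace_trigraphs(source: str) -> str:
    """Replace every trigraph in one pass, without re-scanning the output."""
    return _TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], source)
```

`replace_trigraphs('??=include <stdio.h>')` yields `#include <stdio.h>`. The classic trap is that a string literal such as `"what??!"` silently becomes `"what|"`.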
MiaM says:

February 26, 2022 at 4:34 am

Not only Sweden and Finland. Denmark and Norway use the same 7-bit codes for their ÅÆØ as Sweden and Finland use for ÅÄÖ, and they are also pronounced the same way. Germany had Ä and Ö on the same 7-bit codes as Sweden and Finland, but instead of Å they have Ü.

All this was in the old ISO 646 character standard, which AFAIK was never used on a PC except possibly in some printer drivers. On a PC it would be really awkward, as the 7-bit code for \ was used for one of the national characters in ISO 646.

So there certainly were at least six and a half countries that would have had some use for a substitute for the {}[] characters.
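The effect of those national variants is easy to show. The sketch below covers only the six positions discussed here: [ \ ] { | }. The full Swedish and Danish/Norwegian ISO 646 tables redefine some further positions, which are deliberately left out. `ISO646` and `render_iso646` are hypothetical names.

```python
# Partial ISO 646 national variant tables: only the six positions that
# replaced ASCII brackets, braces, backslash and vertical bar.
ISO646 = {
    "SE": dict(zip("[\\]{|}", "ÄÖÅäöå")),  # Swedish/Finnish
    "NO": dict(zip("[\\]{|}", "ÆØÅæøå")),  # Danish/Norwegian
}


def render_iso646(ascii_text: str, variant: str) -> str:
    """Show how 7-bit ASCII text appears on a device using the variant."""
    table = str.maketrans(ISO646[variant])
    return ascii_text.translate(table)
```

A line of C such as `if (a[i]) { x = y; }` shows up on a Swedish terminal as `if (aÄiÅ) ä x = y; å`, and a DOS-style `C:\>` prompt becomes `C:Ö>`. That is the same kind of substitution that YUSCII users saw as `C:Đ>`.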

But more importantly, C was a niche thing almost only used in the rather small Unix world. In addition to the “home computer” languages BASIC and assembler, the common languages were Pascal, Fortran and, in the financial world, COBOL. I remember a Swedish computer magazine running an article around 1983 about various languages, and there were loads of languages that gained very little traction over the years but are still somewhat known: Logo, Modula, Lisp, Fortran and whatnot.

Side track re file names that can’t be accessed: If you use an SMB client that lets you set any file name, and create a file whose name ends with a space and/or a dot on an NT4 computer, then at least one of the two can’t be deleted using any of the supplied user interfaces (delete, deltree, deleting in Explorer, deleting in winfile). IIRC one of them can be deleted by deleting the directory the file is in. I learned this the hard way by using an SMB client on my Amiga and copying files whose names had first been truncated by the 30-character name length limit on the Amiga, then copied to the PC without fixing the names.
Michal Necasek says:

February 26, 2022 at 10:12 am

I mostly know the “IRV” (International Reference Version) of ISO 646/ECMA-5, which is almost identical to ASCII, but I see that the “basic set” sacrificed {}[]\|#@ (and more) to national characters. I have never seen any mention of this being used on PCs, but then PCs of course always used 8-bit coding. The early PCs all had the character bitmaps in ROM, and I’d expect that if you had to make major changes to support national characters, the 7-bit codes just made no sense.

Do you know what systems would have been the ones requiring trigraphs, i.e. 7-bit coding with national characters and supporting C development?

You’re right that C was not a thing when the PC came out. IBM supplied tools for BASIC, assembler, Pascal, FORTRAN, COBOL, but not C. The first C compilers appeared pretty quickly but I’d say it wasn’t until about 1985 that C became important on PCs (and at the same time the importance of UNIX grew significantly). Both Windows and OS/2 were significantly C-oriented.

Oh, yeah, dots in file names are a bit of a culture clash. As you probably know, CP/M and DOS have a “virtual” dot which is not in fact stored anywhere (on the FAT file system). For that reason, “FOO” and “FOO.” are equivalent from the user’s point of view. There was likely some logic in NT that strips a trailing dot to preserve the existing behavior. But if you manage to create a file which actually has the dot there… yeah that could be a problem.
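The "virtual dot" is easiest to see in the directory entry itself. On FAT, an 8.3 name is stored as 11 space-padded characters, 8 for the base and 3 for the extension, with no dot. The sketch below builds that form. `fat_dir_name` is a hypothetical helper, and it simplifies real DOS behavior in two ways. It uses Python's Unicode-aware `upper()`, whereas DOS uppercases non-ASCII bytes using a country-dependent table. It also rejects over-long parts, whereas DOS typically truncates them silently.

```python
def fat_dir_name(name: str) -> str:
    """Return the 11-character FAT directory-entry form of an 8.3 name.

    The dot is not stored: the base name is padded to 8 characters and the
    extension to 3, so "FOO" and "FOO." map to the same entry.
    """
    base, _, ext = name.upper().partition(".")
    if not base or len(base) > 8 or len(ext) > 3 or "." in ext:
        # Simplification: reject instead of truncating like DOS.
        raise ValueError(f"not a valid 8.3 name: {name!r}")
    return base.ljust(8) + ext.ljust(3)
```

`fat_dir_name("foo")` and `fat_dir_name("FOO.")` both produce `"FOO" + 8 spaces`. A name that really ends in a dot therefore has no representation that the DOS-compatible code paths can produce or match.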
zeurkous says:

February 26, 2022 at 1:37 pm

WinRAR at least used to perfectly create files with names that ended in
a dot, when extracting from a tape archive containing files with such
names.

Those files then couldn’t be accessed by Windoze, although bypassing to
the NT level worked. The most obvious workaround, merecalls, was to
rename (or remove) the file from within the WinRAR environment. Messy.

It’s apparent that, in the modern world, there has to be some kind of
escape convention for fs-special characters. UNIX is fairly limitless
when it comes to naming, but even here one can’t have file names
containing the string “OS/2”, for example.

In 1967, such a tradeoff would’ve been not only acceptable, but obvious.
In 1985 it indeed made a lot less sense already.

That mess-dos et al. treat a whole slew of characters as special, as
opposed to one (or at most a couple), mecan only consider foolish.
Michal Necasek says:

February 26, 2022 at 3:10 pm

In DOS, there is a huge difference between what the command processor treats as special characters and what the actual file system does. The file system treats ‘:’, ‘/’, ‘\’, ‘.’, ‘*’, ‘?’ as special, but that’s really about it. UNIX is AFAIK much much less picky and really anything but the path separator is OK.

DOS also has other fun behaviors, like ‘/’ working as a path separator on the API level but definitely not in COMMAND.COM. I remember being quite surprised when I found out many years ago (code that worked fine on DOS failed in some minimal DOS-like environment because it used ‘/’ as a path separator).

One of the crazier things that survived into Windows is how aux.*, con., lptN.*, etc. virtually exist in every directory. As a harmless example, ‘copy foo.txt con.txt’ is illustrative.
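The "virtual in every directory" rule means that only the part of the last path component before the first dot matters. Here is a sketch of that check. The name set is an assumption. CON, PRN, AUX and NUL are the classic ones, and the COM1 to COM9 and LPT1 to LPT9 ranges follow current Windows naming rules. DOS itself only knew the ports its drivers provided, and device drivers could add further names such as CLOCK$. `is_device_name` is a hypothetical helper.

```python
import ntpath

# Reserved device names (an assumption: classic DOS names plus the
# Windows COM1-COM9 / LPT1-LPT9 ranges; drivers could add more).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def is_device_name(path: str) -> bool:
    """True if the last path component would open a device, not a file."""
    name = ntpath.basename(path)  # handles both '\' and '/' separators
    stem = name.split(".", 1)[0].upper()  # the extension is ignored
    return stem in RESERVED
```

Under these assumptions, `con.txt`, `C:\TEMP\aux` and `LPT1.foo` all refer to devices, while `console.txt` is an ordinary file.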
zeurkous says:

February 26, 2022 at 3:42 pm

On UNIX, NUL is also not allowed. But yeah, that’s about it.

(Interesting how a character intended as harmless ended up being treated
as an absolute stop value…)
Richard Wells says:

February 26, 2022 at 11:20 pm

The characters dropped from the US character set for the VT-220 for use in models sold out of the US match well with the trigraphs added to C. See https://en.wikipedia.org/wiki/National_Replacement_Character_Set
ender says:

February 27, 2022 at 12:27 am

> Actually with YUSCII, you mentioned the C:Đ> prompt — since the backslash was replaced by Đ, I’m guessing that ‘Đ’ could not be used in file names?

Yup, Đ and đ could not be used in filenames (because they were really \ and |), and while the other letters could be used, case-insensitivity did not work on them, so you could have č.txt and Č.txt in the same directory.
Michal Necasek says:

February 28, 2022 at 6:51 pm

I’m not sure how much of that was technology driving culture or the other way around, but that sounds highly likely. Basically those ASCII characters were reserved for replacement by national characters, so whoever decided to modify ASCII used exactly those characters.

Replacing those characters was probably okay for “standard” text but must have been really painful for any kind of programming work.
Winfried says:

March 10, 2022 at 9:29 pm

> AFAIR, there were a few big floppy plants in ussr. For example Elektronmash under Kyiv, which produced tons of (very low quality) 5.25″ floppies

I happen to be a veteran of the Elektronmash floppy plant. Around 1990, I joined the German Boeder AG (or rather its sibling Boeder Consult GmbH), which back then equipped the Kiev floppy plant. They made both 5.25 and 3.5 inch floppies there. All the equipment came from Western manufacturers, and the whole plant was more or less a copy of the Boeder floppy plant in Berlin.

We tried our best to set the conditions for quality floppy manufacturing, but we could not be everywhere… There is a machine called “burnisher” which polishes the surface of the punched round pieces of magnetic tape (called “cookies”). Polishing is done with a polishing tape, and of course this tape has to be fed forward during polishing. One of my colleagues found that the people there had set the feed to zero. They were always short of supplies, and they tried to save some polishing tape, not caring about surface quality of the floppies…
Michal Necasek says:

March 10, 2022 at 11:25 pm

Cool story. Thanks for writing that!

Out of curiosity, if the cookies weren’t properly polished, what would have been the most common symptoms that end-users would have seen?
Winfried says:

March 11, 2022 at 3:05 pm

I am not that much an expert of floppy manufacturing and application (I worked only 2 years in this business) but I think the most common errors would be drop-outs, i.e. loss of single bits.

This “polishing” was mainly done to remove minor surface defects such as “warping” (which is caused by unrolling the rolls of magnetic tape).

BTW the floppy drive magnetic head also has a small abrasive effect. Since all floppies were checked for drop-outs (and also for “drop-ins”, randomly occurring “extra” bits, and for some other parameters) using modified floppy drives, they usually got better the more often they were tested. But running them through the test systems again and again would also have abrasive effects on the magnetic read/write heads, so this is not really a solution…