Why Does Windows Really Use Backslash as Path Separator?

More or less anyone using modern PCs has to wonder: Why does Windows use backslash as a path separator when the rest of the world uses forward slash? The clear intermediate answer is “because DOS and OS/2 used backslash”. Both Windows 9x and NT were directly or indirectly derived from DOS and OS/2, and certainly inherited much of the DOS cultural landscape.

That, of course, is not much of an answer. The obvious next question is, why did DOS use backslash as a path separator? When DOS 2.0 added support for hierarchical directory structure, it was more than a little influenced by UNIX (or perhaps more specifically XENIX), and using the forward slash as a path separator would have been the logical choice. That’s what everyone can agree on. Beyond that, things get a bit muddled.

The only thing that is clear is that Microsoft and IBM were responsible for using the backslash as path separator in DOS 2.0. Microsoft reportedly wanted to use the forward slash as path separator, but IBM nixed the idea because it would have created an incompatibility with DOS 1.x, which already used the forward slash as a switch character, separating command options.

Microsoft old-timers all agree that IBM was strongly against changing the forward slash as a switch character. They are less clear on where that particular slash usage had come from.

There are silly theories about it, like “the slash came from CP/M”. Well, it didn’t. There is no real evidence that CP/M used the forward slash anywhere except the name of the product. Most CP/M commands had no options at all. Third party CP/M tools (such as Microsoft’s) may well have used slashes, but not the OS itself.

There is obvious nonsense like “CP/M got the slash from VMS”, which is simply not possible because CP/M is older than VMS, and CP/M did not use the forward slash anyway.

SCP’s 86-DOS likewise did not use the forward slash. Which means DOS did not inherit the forward slash from its direct or indirect predecessors (86-DOS and CP/M). It must have come from somewhere else. Can we find out where from?

In fact even PC DOS 1.0 (1981) used the forward slash very, very little. The FORMAT command had the /S option to copy system files, and LINK had a /P option to pause before writing the resulting executable, so that floppies could be swapped. That was it.

Notably LINK and FORMAT were both written by Microsoft. An important point is that PC DOS 1.0 did use forward slashes as option separators upon release.

PC DOS 1.1 (1982) significantly increased the use of the forward slash. COPY had the /A, /B, and /V switches, DIR had /P and /W switches, while DISKCOMP, DISKCOPY, and FORMAT now had a /1 switch (for one-sided floppy operation). The linker likewise had new options such as /DSALLOCATION, /HIGH, /LINE, /MAP, /PAUSE, and /STACK.

That kind of forward slash usage was no doubt why IBM insisted on keeping it in DOS 2.0. Changing the slash semantics had a clear potential for destroying data, especially when running batch files written for DOS 1.1. Something like ‘COPY FOO + BAR /A’ has rather different semantics when /A is a switch vs. when /A is a file or directory in the disk’s root directory.
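To make the ambiguity concrete, here is a minimal Python sketch. It is not how COMMAND.COM actually parsed anything; both function names are hypothetical, and the whitespace-only tokenizing is a simplification (real DOS also accepted switches glued to the command, as in DIR/W). It only shows that the same command line yields a different set of operands depending on which meaning the slash has.

```python
def parse_slash_as_switch(command_line):
    """DOS 1.x-style reading: any argument starting with '/' is a switch."""
    name, *args = command_line.split()
    switches = [a for a in args if a.startswith("/")]
    operands = [a for a in args if not a.startswith("/")]
    return name, operands, switches


def parse_slash_as_path(command_line):
    """Hypothetical reading if '/' were the path separator: '/A' is a path."""
    name, *args = command_line.split()
    return name, args, []


line = "COPY FOO + BAR /A"
print(parse_slash_as_switch(line))  # /A ends up among the switches
print(parse_slash_as_path(line))    # /A ends up among the operands
```

Under the second reading, an old batch file would suddenly be handing COPY an extra file or directory name, which is exactly the kind of silent change in behavior IBM wanted to avoid.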

In retrospect it seems silly to change the path separator in order to preserve backward compatibility with a short-lived early PC operating system, but hindsight is 20/20 and in 1982, the future popularity of UNIX and DOS (and its derivatives) was incredibly non-obvious.

At any rate, it’s clear that DOS used the forward slash since version 1.0, even though its immediate predecessors, CP/M and 86-DOS, did not. So where did it come from? Manuals for old Microsoft language products provide a very good hint. Microsoft’s tools for the 8080 (F80 FORTRAN compiler, M80 macro assembler, LINK 8080 linker) all used the forward slash as a switch character separating command line options, and did so at least as far back as 1977. In other words, Microsoft used slash as a switch character before the 8086 even existed, and continued using it for 8086-based tools.

Did Microsoft invent the slash usage? Of course not. A good hint was provided on Larry Osterman’s blog by Hans Spiller, an old time Microsoft compiler author. Even that hint looks wrong, because the claimed history was DEC TOPS-10 to CP/M to DOS, even though CP/M itself did not use the slash. But TOPS-10 sure did use the slash. And a lot more.

In his Computer Connections memoir, Gary Kildall claimed (page 26) that Bill Gates and Paul Allen used a DEC PDP-10 timesharing system, almost certainly running TOPS-10, after stealing other users’ passwords (because the OS recycled memory pages used by other users without scrubbing them, leaving information including plaintext passwords behind).

Microsoft’s own MS-DOS Encyclopedia (1988) mentions that Marc McDonald, an early Microsoft employee, developed an 8-bit multitasking OS called M-DOS, which was “modeled after the DEC TOPS-10 operating system”. Further on, the same book explicitly says that “version 1.x of MS-DOS, borrowing from the tradition of DEC operating systems, already used the forward slash for switches in the command line”.

There we have Microsoft’s own word that the forward slash came not from CP/M, not from IBM, but from DEC, and there’s an explicit mention of TOPS-10. Can we find supporting evidence?

It’s TOPS-10!

The DEC TOPS-10 goes back to the late 1960s and is certainly old enough that CP/M and DOS could have been influenced by it. UNIX, on the other hand, is not quite old enough to have had much influence on CP/M at all, because both were developed more or less at the same time, in the early 1970s. While UNIX was a strong influence on DOS 2.0, there is little evidence that it influenced DOS 1.x in any way at all (while CP/M was a very major influence, obviously).

The DEC PDP-10 mainframe appeared in 1966, and since 1970 it was marketed as DECsystem-10, running the TOPS-10 operating system. The PDP-10 with TOPS-10 was perhaps the first widely used time-sharing system, and it was used by many universities in the 1970s. As such, many early Microsofties, likely including Gates and Allen, would have been familiar with TOPS-10.

Nowadays it is easy to find a TOPS-10 manual, and browsing through it is quite a revelation. Here’s a short example (page 2-220):

PROG        FOR     1 <055>    10-SEP-79    DSKC: [27,5434] 
PROG        REL     1 <055>    10-SEP-79
PROG        EXE    68 <055>    10-SEP-79 

The DIR command lists directory contents. File names have a fixed length and a three-character extension. Executables have an .EXE extension. Disk names have a colon at the end. That looks a lot like DOS and nothing like UNIX.

Note that the odd numbers in square brackets are the TOPS-10 way of specifying directories. Good thing DOS didn’t adopt that one. Also note that DOS replicates the non-obvious behavior where ‘DIR PROG’ is the equivalent of ‘DIR PROG.*’ and will list files with any extension. That’s very unlikely to be a coincidence.
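The ‘DIR PROG’ behavior can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not a reconstruction of either system's code; the helper name is hypothetical, and NOTES.TXT is a made-up file added so that something fails to match.

```python
from fnmatch import fnmatchcase


def expand_dir_argument(arg):
    """Assumed rule: a name given without an extension means 'any extension'."""
    return arg if "." in arg else arg + ".*"


# The first three names come from the listing above; NOTES.TXT is made up.
files = ["PROG.FOR", "PROG.REL", "PROG.EXE", "NOTES.TXT"]

pattern = expand_dir_argument("PROG")
matches = [f for f in files if fnmatchcase(f, pattern)]
print(pattern, matches)
```

A UNIX shell, by contrast, would treat a bare PROG as one literal file name and leave any wildcard expansion to the user.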

The list of TOPS-10 standard file extensions (appendix D in the manual) brings a lot of DOS memories. The .EXE extension for executables is just the first; there is .OBJ for object files, .SYM for symbol files, .CRF for cross-reference files, .SYS for system files, .BAS for BASIC source files, or .TXT for ASCII text files. Again, these file extensions are very much like DOS and not at all like UNIX.

The TOPS-10 macro assembler also used the IRP macro construct which is known from MASM, but not used by other assemblers (notably Intel’s ASM86 has quite different macro syntax). It is extremely likely Microsoft found the inspiration for MASM in DEC’s assembler.

Case Closed?

Only Microsoft old-timers could probably answer these questions, with the huge caveat that human memories tend to be less than reliable after 40 years. However, there is historical evidence that Microsoft used TOPS-10, very strong circumstantial evidence that several aspects of DOS were influenced by TOPS-10 (including the use of a forward slash as an option separator), and a clear statement in the MS-DOS Encyclopedia saying that the forward slash usage came from DEC operating systems, including mentions of TOPS-10.

It is probably fair to say that Windows today uses backslash as a path separator because 50 years ago, TOPS-10 used forward slash as an option separator.

Update: After posting this article, I realized that although it gives a good explanation of why DOS did not use the forward slash as path separator, it is vague on why backslash was used instead. The answer is the IBM Model F keyboard. The visual similarity of the forward slash and backslash likely played some role, but the deciding factor was ergonomics.

The path separator needed to be a key that does not require Shift to type and that did not already have established use (such as the dot, comma, or semicolon keys, many of which were used by Microsoft language tools). The backtick (`) may have been the only other key available, and given the choice a backslash makes a lot more sense.

The ergonomics of non-US keyboards were naturally not considered, which caused the path separator to be ridiculously difficult to type on some national keyboards (e.g. German), perhaps as a perverse reminder that ergonomics do matter.

42 Responses to Why Does Windows Really Use Backslash as Path Separator?

  1. Richard Wells says:

    Beyond MS, a number of other CP/M programs adopted the same concept* and DRI codified the slash with CP/M 3 (aka Plus) though they also offered $ as an alternate command line separator. So it was pretty much a standard for the programs that IBM hoped would be quickly ported to PC-DOS.

    *John Elliott has a page on the history of the introduction of the slash to non-MS CP/M programs.

    IIRC, Unix wound up with its use of symbols based on the unusual terminals AT&T purchased in the early 70s. Those terminals lacked the full range of keys given to more conventional keyboards, so the early bindings were for the keys available, and as keyboards with more keys were purchased, additional bindings were added. Copying Unix always seemed a bad idea because Unix was always working around mistakes of the phone company.

    Backslash was probably the best choice for a path since it was one of the few keys remaining on the IBM PC that didn’t require holding the shift key down. At least one of the DEC OSes used the bang (!) as a path separator which is inconvenient and would have been a lot worse had case sensitive file names been a more frequent default.

  2. Michal Necasek says:

    Yes. Just one minor note — CP/M Plus came out in 1983, about the same time as DOS 2.0.

  3. Nathan Anderson says:

    “However, there is historical evidence that Microsoft used TOPS-10, very strong circumstantial evidence that several aspects of DOS were influenced by TOPS-10 (including the use of a forward slash as a path separator)”

    …FYI I think you meant “…use of a forward slash as aN OPTION separator”

    (Command switches/options appear to be discussed on pages 1-10 and 1-11 of the TOPS-10 manual scan that you linked to. Also, somehow the A tag for the link to the manual only enveloped “TOPS-10 manua”, leaving the final ‘l’ of “manual” out in the cold.)

  4. MiaM says:

    Another reason for the back slash is that it at least somewhat resembles the forward slash. It was also a character that was most likely not used by most if not all existing software at the time. For 7-bit key codes it was, together with bar, square brackets and curly brackets, one of the characters that were repurposed for national non-US characters, like for example the Swedish åäöÅÄÖ. Anyone using existing printers or who did interchange data with other older systems would most likely not want to use those characters if they lived in a non-English-speaking part of the world.

    Btw the back slash was and still is quite awkward on some non-US keymaps. For example on a Swedish “extended” keyboard you have to press AltGr and press the key to the right of the 0 (zero).

    In the old days before “extended keyboards” you instead had to hold ALT down and press the key that corresponded to the US layout. Or press CTRL+ALT+F1/F2 to switch between US and national layout.

    There was a time period where the keyboard driver in MS-DOS stopped supporting using ALT to access the US layout when using a national layout, so if you didn’t have an “extended keyboard” you were stuck with having to press and hold alt, type 092 on the numerical keyboard, and release alt, to be able to produce a backslash. Or you could try to find a third party keyboard driver, which actually did exist. The fact that third party keyboard drivers did exist says a bunch about how sucky DOS really was 🙁

    Side track: at some point in time DOS started to require the annoying mode con code prepare junk that didn’t do any good for the user and mostly just ate up precious memory. (Well, in theory it might have done something good, like perhaps changing the sort order for non-US-ASCII characters to follow the national sorting rules or similar, but at least for the countries where the PC was sold in the mid-80’s there was no real use for the code page junk).

    Richard Wells: What terminals did AT&T use at the time? There wasn’t that many to choose from…

  5. André says:

    Requiring mode con code prepare? Well… the setup program entered that in my config.sys/autoexec.bat, but I used to comment that out and it just worked, so requiring?

  6. Michal Necasek says:

    Thanks, corrected.

  7. Richard Wells says:

    @MiaM: The “Graphics Terminal II” PDP-7 used for the initial Unix versions used what looks to be a Teletype ASR-33 with a very limited character set. The character set in the Decwriter a few years later is larger and includes the vertical line used to denote piping.

    I can’t find a link to the articles complaining about the terminals in use at Bell Labs in the mid-70s. Sorry.

  8. Michal Necasek says:

    The ASR-33 is, from what I know, the grandfather of Unix-y terminals. The DEC VT100 (and follow-up models) was of course quite influential, but not early on, as it was only introduced in 1978.

    The name of Teletype Corporation lives on in ‘/dev/tty’.

  9. Rugxulo says:

    MiaM, you bring up an interesting point. (I’m American, but I’ve briefly dabbled in DOS i18n stuff.) It’s certainly a kludge, but it’s also not as dire as it sounds. Certainly, Michal knows all about NLSFUNC + COUNTRY + DISPLAY + KEYB + MODE + *.CP[XI] and the myriad of settings related to that. It’s quite a minefield, but it would make an interesting article! Well, for some of us!

    As far as typing certain problematic characters, you clearly need a better text editor (VIM?). They often have macros, abbreviations, digraphs or whatnot to help. (Heck, you could probably use sed in a pinch!) Even programming languages often have workarounds for that (C89 trigraphs “??/” or C94 digraphs; Pascal’s “(.” and “.)” and ISO Modula-2’s “(!” and “!)”, etc. etc. etc. Yes, I know those aren’t all backslash, but you know what I mean, replacements did exist for some chars.). Again, I’m sure Michal knows more than enough about that as well.

    Of course, typing at the actual prompt can be annoying, too, certainly not helped by the default shell (or DOSKEY). I do realize that, and I recognize that writing a .BAT just to avoid that tedium is itself tedious. Still, even with better (compatible) replacements nowadays, we can’t go back in time. So it’s probably moot. But grievance noted! 🙂

  10. John Elliott says:

    DRI codified the slash with CP/M 3 (aka Plus) though they also offered $ as an alternate command line separator.

    What? CP/M Plus uses square brackets for options.

  11. Michal Necasek says:

    I have actually mostly avoided DOS NLS support. For programming, the US keyboard layout simply can’t be beat, so I just use that. When writing letters or reports, it’s different, but I did pretty much none of that in DOS (Windows 3.1 CE aka Central European was available fairly early on). I know just enough about NLSFUNC et al to understand that it’s complicated and eats precious memory.

    Also, when I started with DOS, there was very little or no official support. Homemade solutions were common for Czech, Polish, Russian, and I assume other languages (there were TSRs for keyboard and display support, with completely non-standard codepages). I would say that by the time IBM/Microsoft added proper support, DOS was not that relevant. Larger-scale introduction of PCs in the former Eastern Bloc more or less perfectly coincided with the introduction of Windows 3.0. And the installed base of old PCs and XTs was negligible, most machines were at least 286 with VGA.

    From what I remember, in Czechoslovakia the popular DOS-based word processor was T602 (homegrown), and that had its own NLS. Lack of NLS severely limited the use of WordPerfect, WordStar, MS Word, etc. In late 1992 (IIRC) there was already Windows 3.1 CE with official font/codepage support so that people could use WinWord, Ami Pro, or whatever. Between that and T602, no one much cared for the built-in DOS NLS.

    I don’t know what the situation was like in Poland or Russia. I do know that Russian MS-DOS 4.01 showed up in 1990, though I have no idea how much of an impact it had.

  12. John Elliott says:

    Larry Osterman says the DOS 3.x NLS support was ported by IBM from their mainframe systems, which might explain why it feels overengineered (such as the ‘number of pointers’ and ‘type of pointer’ fields in the CPI header, which are only ever set to 1).

  13. Stu says:

    On DOS NLS; I always found it annoying how UK DOS/Win9x installations were set up to use CP850 which replaced many of the box-drawing characters commonly used in DOS programs with a bunch of extra accented characters that aren’t used in English.

    Since the only non-ASCII character of significant use in the UK (the GBP symbol) is already in CP437, we didn’t need all our programs looking “corrupted” (and the associated conventional memory drain) at all. Stupid decision.


  14. Richard Wells says:

    John Elliott: Take a look at page 35 [Sec 5.1] of the CP/M Plus Users Guide Prelim Nov82 which lists a lot of special characters. Square brackets are for global and local options with parentheses to modify square bracketed options. Slash and dollar sign are listed as “option delimiters in a command line.”

  15. Andreas Kohl says:

    Bob Bemer’s “How ASCII Got Its Backslash” should be a good starting point.

  16. Vlad Gnatov says:

    AFAIK, that COUNTRY + DISPLAY + KEYB… kludge was never popular in Soviet Union/ex-SU. I can’t say about Russia or other ex-SU republics, but in Ukraine first was alfa from Moscow’s Academy of Sciences (1985 I think): http://old-dos.ru/index.php?page=files&mode=files&do=show&id=5413
    But it didn’t last long and was replaced with keyrus in a few years:

  17. Julien Oster says:

    In Germany on the other hand, as far as I could tell DOS NLS support was *very* popular, and used by pretty much anyone I remember. Non-IBM compatible PCs like the Siemens PC-D had that layout (or a similar one) by default in their BIOS/DOS variant.

    For quite a while during the 80s and 90s, I “sort of” knew the standard US layout, for those situations where I had to type something into a boot prompt, or on DOS boot disks where “keyb gr” (which ironically requires pressing one of the keys that got transposed: yz) was not available.

    Only much later, toward the end of the 90s, did I decide that the US keyboard was indeed much more convenient for programming, so I actively switched to that (without much trouble at all, surprisingly to me), and never looked back.

  18. Michal Necasek says:

    That matches my experience. German PCs running DOS were almost guaranteed to be configured with NLS, fully “Germanized”. I am familiar with the y/z transposition problem 🙂

    Because I prefer the US layout, it never really occurred to me how confusing it must be to e.g. boot DOS with F5 and have the keyboard behave differently. And for DOS it is bad, because colon, backslash, pipe, and other important keys are in different positions.

  19. Michal Necasek says:

    I wouldn’t call it a kludge, it was the official NLS implementation, warts and all.

    The timeline kind of explains it… in 1985, there were no IBM/MS DOS sales in Russia (or anywhere in the Eastern Bloc). IBM or Microsoft had no incentive and probably no good channels to develop NLS. But the need was there, and it was filled. After 1990, official support showed up, but at that point the homemade solutions were pretty entrenched.

    I don’t know about Russia but in Central Europe one issue was that the official support (code page 852) covered Polish, Czech, Slovak, Hungarian, and I think other languages in a single code page. That made total sense from a global perspective, but wasn’t compatible with the existing country-specific NLS solutions.

    Windows started with a clean slate and early support.

  20. Yuhong Bao says:

    In fact, Windows 1.0 predates DOS NLS support (in DOS 3.3)

  21. Richard Wells says:

    Windows uses ECMA-94 as the basis for the character set. ECMA-94 was drafted in 84 with ratification in 85. One code page for Western Europe, the Americas, New Zealand and Australia is simpler than the proliferation of code pages necessary with DOS. Windows added a set of code pages about the time that Windows 3.1 arrived to expand the character set for additional nations. Fortunately, Unicode got supported quickly enough to save me from having to implement code page craziness similar to what Word Perfect for DOS did.

  22. Michal Necasek says:

    I am not aware of proliferation of code pages with DOS. CP 850 (the DOS equivalent of ISO 8859-1) covered pretty much all of the Western languages. The problem (in my experience at least) is that it clashed with the “original PC” codepage 437, because some applications required the CP 437 graphical characters that CP 850 had to replace.

    NT definitely went for the more generic and ultimately much simpler approach with Unicode.

  23. Yuhong Bao says:

    I wonder what this “Canadian French” codepage was created for:

  24. Michal Necasek says:

    It looks like a slightly modified CP 437, with just enough changes to support Canadian French. Keeping the graphical symbols, but dropping e.g. German characters.

  25. Richard Wells says:

    The 86x code pages (except for Greek) all look to be business line variants keeping the line drawing characters. dBase developers cheer.

    https://www.aivosto.com/articles/charsets-codepages-dos.html seems to be a fairly complete list of the major code pages for DOS and if one goes elsewhere in the site, one can see a lengthy list of IBM code pages that explain why 437 is the initial DOS code page. I spent a little too much time after seeing it examining if 898 and 899 were ASCII matches to the EBCDIC code pages used by DisplayWrite.

    There were also code pages generated without an official IBM code page number, IBM code pages that shipped with applications, and specially modified code pages. WordPerfect did the last which caused problems with a number of other code page and font tools.

  26. Rugxulo says:

    CP850 isn’t quite ISO 8859-1, though. The resident i18n expert in FreeDOS is (or was? haven’t seen him lately) Henrique Peron (.br), so he’d be a great person to ask. He knows tons of stuff.

    As an Esperantist (lapsed??), I was interested (circa 2008) in ISO 8859-3, aka Latin-3. (Six special chars, or twelve if you separately count uppercase. Though there are also 7-bit workarounds, which are fairly common.) So, from limited experience, I know a little. I’m not entirely sure since I know nothing of OS/2, but some useful code pages were (only?) from there. E.g. CP819 (“true” ISO 8859-1), 912, 913 (Latin-3), 914, etc.

    DR-DOS 7.03 had (undocumented) CP853, but it was missing the lowercase “j^” char. I don’t think its KEYB supported E-o, though. And FreeDOS only officially supports CP853 (equivalent but still not 1:1 Latin-3 compatible, but it has boxed chars which some apps obviously need to use) in EGA.CPX, and its KEYB supports it, too. There are third-party workarounds (ISOLATIN.CPI from Kosta Kostis’ website; some wimpy TSR on one .nl site) if you just had to use Latin-3. Or obviously use a supporting editor (Blocek, GNU Emacs, Mined). Or similar tool (iconv? recode? sed???). There is currently no COUNTRY.SYS support for E-o, so I don’t think NLSFUNC works (UCASE table or whatever).

    Though I never did much writing (moreso only reading lots of E-o print periodicals), so I only dabbled with this out of curiosity (since DOS is my favorite). Well, or used “x metodo” (or official, but less machine-friendly, “h metodo”).

    I’m sure you Euros 🙂 can understand or explain better than I can! Seriously, though, it would make a great article or two (but complex as heck, opening a huge can of worms). “Kia mondo!” (What a world!)

  27. Michal Necasek says:

    I thought everyone knew what CP 437 was 🙂 That’s what the IBM PC/XT/AT hardware implements, i.e. the only choice there was before codepage switching.

    It’s also interesting that because CP 437 does cover several languages, DOS had national keyboard support since version 3.0 (1984), long before switchable codepages.

  28. Richard Wells says:

    CP 437 may have been what was baked into the character ROM but changing the character set was done. IBM’s APL for the PC had to replace many characters with the APL specific characters. Not a formal code page implementation design but a similar technique.

    Some other IBM products had unique code pages assigned but I don’t know if the PC implementations thereof modified the character set to match. Not about to buy a 3278/79 hardware adapter just to find out if it supports the full 3270 EBCDIC code pages. I suspect DisplayWrite did since it was able to use and display all the characters from both the EBCDIC character and symbol code pages including some not available with CP 437.

    Code pages and related concepts were a lot more complex than any documentation dared to explain.

  29. Michal Necasek says:

    Except changing the character set was not done on MDA/CGA, it was only EGA and later that had loadable fonts.

    I would actually say that the code page concepts were simple, but the reality was insanely complicated because of the multitude of code pages and software packages each with its own ideas.

    I remember mid to late 1990s fighting with badly designed web pages that used some code page (typically whatever someone’s default in Windows was) and it was up to you to guess which one it was, because it never occurred to the designer that someone might read the web site with a different OS/browser/configuration, and their software didn’t bother marking the used encoding in the headers either. These days it’s so much better.

  30. Rugxulo says:

    GRAFTABL was used for CGA, no?

    Also, regarding “insanely complicated”, it can’t be that simple because no one’s ever happy. Surely you’re aware of ANSI C89’s locale.h (and the almost non-existent support, especially for DOS compilers) as well as the i18n changes in C94 (before C99). It’s as complicated as you make it (but most people don’t care).

    Oh, I was going to say that I think that some countries’ governments mandated official OS language support for themselves. So it’s not just a hobbyist wish or a convenience but rather a necessity. (Sorry if that sounds obvious or naive to you, but I never knew.)

  31. Michal Necasek says:

    Yes, though GRAFTABL was for graphics modes only. Not sure how useful it ever was.

    Natural languages are complicated (not that computer languages are any better), and that means i18n is complicated. For the most part, being able to type and display/print the characters one needs is what users really want. Sorting, currency characters or decimal separators, date formats, all that is useful but rarely essential. Luckily Unicode solved the biggest problems and UTF-8 solved most of the problems that UTF-16 created.

    Governments certainly mandated specific code pages and such. Which is really logical if you think about it, because without that, things would have gotten either a lot more complicated (with back and forth conversions required) or they just plain wouldn’t work (mangled characters, inconsistent sorting, and so on).

  32. Richard Wells says:

    Some corrections regarding my previous comments. The IBM APL for the PC ran in CGA graphics mode and installed a graphical font. The text mode special font was the non-IBM APL*Plus which apparently was supplied with a special replacement ROM. 3270 support at least in one of IBM’s PC implementations was done with a ROM that had multiple character sets. Memories from 30+ years ago do blur together.

    Olivetti used GRAFTABL to handle the character sets needed through Europe. The Olivetti MSDOS documentation provides an explanation in English of how it works and the necessary hot keys to switch back into US code pages plus all the special character entry chords. IBM products using methods similar to GRAFTABL but before GRAFTABL included the previously mentioned IBM APL and a utility authored by IBM Israel in 1983.

    I have followed some of the language and character set efforts. It often seemed that half the development was spent on obscure parts of the language that only the academic proposing it would ever use. I am happy that I started programming when people were willing to adjust the needs of the language to the limitations of the technology. No lower case, no diacritics, no problem. I managed to ignore most of the corporate proprietary proposed internationalization schemes and only had to deal with it when Unicode had solidified.

  33. Vlad Gnatov says:

    All that GRAFTABL does is load the last 128 characters of an 8×8 font into memory and point the int 1F vector to it. This allows int 10 functions 8,9,A (slow and therefore rarely used) to read/write these characters in the CGA graphic modes. Unfortunately, not many DOS programs worked in the graphic mode, and those which did (e.g. chiwrite, lexicon) used their own fonts and wrote directly to video memory; it was still slower than in the text mode (at least on PC/XT), but somewhat acceptable.
    Anyway, standard method to get cyrillic on MDA/CGA was to replace the video ROM chip.

  34. Michal Necasek says:

    Yes, I remember all-caps no-diacritics printouts, made by giant noisy printers that printed a whole line at a time. The funny thing is that no diacritics are much better than broken diacritics. And “broken” was definitely a period between “let’s ignore NLS” and “functioning NLS”.

  35. techfury90 says:

    Japanese DOS/Windows uses ¥ for the path separator instead of \. Sort of. It’s the same hex byte, but the JIS X 0201 character set used for the 1 byte portion of the SJIS charset/encoding has the ¥ glyph at the byte position for \. Keyboards also have a ¥ key instead of \ in both the old NEC PC-98 layout and the modern IBM OADG layout.

  36. Michal Necasek says:

    It’s as if they wanted to demonstrate the difference between a glyph and a code point.

  37. Rugxulo says:

    Just one obvious remark: “decimal separators [are] rarely essential”. Au contraire, mon ami! Big money has been lost when international transactions haven’t correctly noticed nor agreed upon where the comma goes (and some mistaking it for a period, which I think is common in Europe). Hey, would you loan me 5,000 dollars? It is my birthday, after all. 😉

  38. Rugxulo says:

    Actually, July 1 (07-01 to U.S.) is my birthday, but you silly futurists and your silly timezones …. DOS was always local time (unlike *nix preferring UTC) and two-second granularity in file time, bad for makefiles, but it’s not a huge problem on slow machines.

    Yeah, ambiguity is annoying, especially when trying to be technically correct. Even U.S. versus British (or other) English has many differences. It’s unavoidable, and I’m not complaining about complexity. But it shouldn’t be ignored or trivialized. (I know you agree, just saying.)
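
    The two-second granularity mentioned above follows from the FAT directory entry storing the time in a 16-bit word: 5 bits for hours, 6 for minutes, and only 5 for seconds, so seconds are stored divided by two. A rough Python sketch (the function names are made up for illustration):

```python
from datetime import datetime

def pack_dos_time(dt):
    """Pack hh:mm:ss into a 16-bit FAT time word (5/6/5 bits)."""
    return (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)

def unpack_dos_time(word):
    """Recover (hour, minute, second); odd seconds are lost."""
    return (word >> 11) & 0x1F, (word >> 5) & 0x3F, (word & 0x1F) * 2

a = datetime(1990, 7, 1, 12, 30, 44)
b = datetime(1990, 7, 1, 12, 30, 45)

# Two files written one second apart can end up with the same stored time,
# which is what confuses make's "is the target newer?" comparison.
print(pack_dos_time(a) == pack_dos_time(b))
print(unpack_dos_time(pack_dos_time(b)))
```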

  39. Michal Necasek says:

    Sure, I understand that — I have to pay attention to that sort of thing all the time, commas and periods mean something else in my US bank account and in my German bank account.

    It’s just that the l10n support in the OS does not necessarily do all that much, especially when it helpfully offers to put the wrong character in.

  40. Michal Necasek says:

    There’s one interesting thing about the timestamps. When you have (say) a DOS floppy, the timestamps will be in local time from wherever the files were written. When you copy the files with a modern OS, the timestamps will get converted to UTC. How? Well, the modern OS literally just makes something up. This can lead to interesting issues when multiple copies of the same file end up with different timestamps depending on which route they took.

    I suspect that if the designers thought about it at all, they decided that there’s no point in asking the user what TZ the files were created in, because the user probably has no real clue either. And the number of people who even notice that there’s a problem is minimal.
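
    To illustrate how the same file can end up with different timestamps, here is a small Python sketch. It assumes the copying OS simply applies its own current UTC offset to the zone-less DOS timestamp; the offsets and the `assume_zone` helper are hypothetical, and real systems may also apply DST rules.

```python
from datetime import datetime, timedelta, timezone

# A DOS timestamp carries no zone information at all.
stamp = datetime(1995, 3, 14, 9, 0, 0)

def assume_zone(naive, utc_offset_hours):
    """Pretend the naive time was local to the given offset; return UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)

copied_at_utc_plus_1 = assume_zone(stamp, 1)    # machine set to UTC+1
copied_at_utc_minus_8 = assume_zone(stamp, -8)  # machine set to UTC-8

print(copied_at_utc_plus_1)
print(copied_at_utc_minus_8)
print(copied_at_utc_minus_8 - copied_at_utc_plus_1)
```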

  41. MiaM says:

    Codepage 850 certainly isn’t Latin-1, although it might contain all or at least most of the chars in Latin-1, but placed where it makes it more compatible to Codepage 437.

    Re time stamps: Isn’t the time stamp on “dos media”, written by for example modern Windows, using whatever timezone the computer is set to be in? (Or perhaps whatever timezone the current user has set the computer to)?

    All this stuff re separators, date formats etc. is in theory a good thing, but in practice it tends to cause problems because someone somewhere didn’t account for their existence while someone else did, and you end up with software trying to interpret a time stamp in one format as if it were in another. Then the user ends up having to set the regional settings to US even though they are far away from the US.

    Maybe the worst thing about this is that it usually ends up like this when a large non-IT company has a contract with some large IT company to provide some custom software, and any changes, including fixing obvious bugs/blunders, will be charged to the customer. (This is also the kind of crappy business ethics that makes the internal IT support within a large company not escalate the fact that “move” really means “copy, and then delete but only if it was in the inbox” in the sorting rules in Microsoft Outlook 97).

    Which countries/governments were at all successful in mandating certain character encodings and similar stuff? At the time it could reasonably have been brought up to that level, we were kind of stuck with all the code pages for DOS on a PC, the unique Macintosh 8-bit char set, and also the ISO Latin versions (where -1 was the default on some computers even though you might not have been able to change the char set, for example Amiga and IIRC the “DEC Multilingual character set” on the later VT terminals).

  42. Miëtek Bak says:

    There was a certain amount of proliferation of DOS code pages for the Polish language, including multiple variants of the so-called Mazovia encoding.

