Warning: Long post!
After having good luck with rebuilding core PC DOS 1.1 from source code, I thought I’d do the same with the DOS 2.11 source code released by the CHM. What follows is largely a collection of notes that I wrote down while banging the released source code into shape. That turned out to be a lot harder than building DOS 1.1 for two reasons.
One is that the released DOS 2.11 source code is a lot more extensive and includes source code for numerous utilities (CHKDSK, DEBUG, EDLIN, SYS, etc.). The other, bigger reason is that the CHM unfortunately created a bit of a mess when releasing the code and sorting out the pieces was not trivial.
The CHM placed all DOS 2.x related files in just two directories, ‘v20object’ and ‘v20source’. It is now clear that the files came from at least three distinct sources:
- MS-DOS 2.00 OEM distribution disks
- MS-DOS 2.11 source code of unknown provenance
- Miscellaneous debris such as WordStar 3.20 overlay files
Fortunately for me, Jeff Parsons has done a lot of legwork reconstructing the DOS 2.0 OEM distribution disks. These disks were clearly an early version of what Microsoft later called OAK (OEM Adaptation Kit). The disks contain generic DOS 2.0 binaries, with the notable exception of IO.SYS which had to be supplied by the OEM. There is example “skeletal” IO.SYS source, together with source for PRINT.COM that OEMs might modify, and an example OEM source module for FORMAT.COM, which OEMs had to write.
There are also development tools (MASM and CREF) on the disks, together with LINK which is part of the DOS 2.0 distribution binaries and was meant to be shipped to end users. And last but not least, there’s fairly extensive documentation that was meant to aid OEMs in adapting DOS 2.0 to their hardware.
As far as can be ascertained, the CHM released complete and unmodified (except for timestamps perhaps) contents of the DOS 2.00 OEM distribution disks, with most but not all of the files stored in the v20object directory. So far so good.
The easy part is the debris, the files WSBAUD.BAS, WSMSGS.OVR, and WSOVLY1.OVR. Those just don’t belong and are part of WordStar. There’s also an odd collection of .txt files in the v20source directory; those are exact copies of .DOC files in the v20object directory. The likely intention was to not hurt Windows’ feelings by presenting it with the difficult task of recognizing that files with the .DOC extension are plain ASCII files. At any rate, the .txt files are completely redundant.
The hard part is the actual MS-DOS 2.11 source files. First of all, there are several “duplicates”, for example DOSMAC.ASM and DOSMAC_v211.ASM. CHM never explained what that’s about, but it’s apparent that DOSMAC.ASM is an older version from the MS-DOS 2.00 OEM distribution disks, while DOSMAC_v211.ASM is a newer DOS 2.11 file which belongs with the rest of the source code.
Everything is lumped into a single directory, with no hint how to build anything, except for two files named DOSLINK and COMLINK. Those are clearly LINK input files which show how to link IBMBIO.COM/MSDOS.SYS and COMMAND.COM. Those are extremely useful because they show exactly what goes into the two largest DOS 2.11 components and in which order the object files are linked (which naturally matters).
Where Did This Come From?
The ever unreliable Wikipedia claims that “Microsoft made the code to […] a mixture of Altos MS-DOS 2.11 and TeleVideo PC DOS 2.11 available to the public”, which is at best half right, and probably not even that. The part about TeleVideo Personal Computer DOS 2.11 is not wrong but it may be misleading (more on that below), and the Altos bit is a major misunderstanding. The DOS 2.0 OEM disks came with a file called SKELIO.ASM, which is titled “IO.SYS for the ALTOS ACS-86C.” — in other words, Microsoft provided source code to IO.SYS for Altos ACS-86C machines as an example of IO.SYS implementation, something that OEMs needed to adapt to their own hardware. This was DOS 2.0, not 2.11, and the OEM distribution disks were obviously not at all OEM specific.
The bulk of the ‘v20source’ files is indeed the source code to DOS 2.11 which had something to do with TeleVideo. It is apparent that in late 1983, the IBM PC was important enough that TeleVideo wanted their COMMAND.COM built with IBMVER set true and MSVER set false—two macros controlling conditional compilation.
The catch is that a COMMAND.COM built with ‘IBMVER’ set to true would by default print an IBM copyright message starting with “The IBM Personal Computer DOS”. TeleVideo clearly couldn’t use that and the source files (TDATA.ASM and UINIT.ASM) were modified to say TeleVideo instead of IBM.
Sadly, this was done in the era before the PC/AT, when most PCs had no real-time clock. And so we have most of the COMMAND.COM source files dated 08/18/1983, but the ones obviously modified for TeleVideo are dated 01/01/1980 because of course the programmer responsible did not bother setting the date when prompted to do so at system start-up.
Now, it would be tempting to assume that all files dated 01/01/1980 must have been modified for TeleVideo. Sadly, a single look at the MS-DOS 2.00 OEM distribution disks suffices to convince us that Microsoft was sloppy and some (only some!) of the source files distributed by Microsoft were likewise dated 01/01/1980.
The upshot is that there’s no easy way to tell which files might have been modified for TeleVideo based on the timestamp alone. It is obvious some source files were modified but not how many. That’s assuming the files with 1983 timestamps are unmodified, which is of course not a given.
It’s also not clear who modified the files. The only files with obvious TeleVideo modifications are part of COMMAND.COM, which normally wouldn’t need to be modified by OEMs. Maybe TeleVideo had the COMMAND.COM source code, but it’s also possible that Microsoft simply modified a couple of strings and built a custom COMMAND.COM for TeleVideo without ever giving anyone the source code.
The latter is in fact quite likely for one simple reason: The source code released by the CHM contains no TeleVideo hardware specific code. There’s no IO.SYS source code, no OEM module for FORMAT, no SYS adaptations, nothing. There’s a lot of source code for utilities that OEMs could not normally modify (CHKDSK, EDLIN, DEBUG) and none of the source code that OEMs would need to write. It is therefore likely that the code came from some Microsoft archive and the TeleVideo connection is a bit of a red herring.
In summary, the files published by the CHM are a mixture of MS-DOS 2.00 OEM distribution disks and generic DOS 2.11 source code with COMMAND.COM modified to say TeleVideo instead of IBM.
Although we know very little about the precise origin and lineage of the released DOS 2.11 source code, we can guess a thing or two if we compare it against extant OEM DOS 2.x distributions. The obvious target is a disk with TeleVideo MS-DOS 2.11 that’s floating around, although someone unfortunately managed to lose its IO.SYS and MSDOS.SYS.
Naturally before comparing anything, the source code needs to be built. Which is not totally trivial, but can be done. More on that below.
CHM vs. TeleVideo
Let’s see how the rebuilt DOS 2.11 source code provided by the CHM compares with the lone TeleVideo DOS 2.11 disk.
First of all, note that .EXE files tend to exhibit a 4-byte difference in the header at offsets hex 12-13 and 1C-1D. The word at offset 12h is a checksum, but there’s no agreement on what the word (or dword) at offset 1Ch actually is. Some claim it is “overlay information”, others indicate that it has varying uses in practice. Microsoft used to say that it is “offset of symbol table file”. At any rate, this has no bearing on the functionality of DOS 2.0 EXE files.
After some experimentation, I was able to confirm that even the same version of LINK working with the same input files on the same machine produces different executable files depending on where in memory it is loaded etc. Most likely the word at offset 1Ch is essentially uninitialized data, but it is also reflected in the checksum at offset 12h. At any rate, this four-byte difference can be considered a linker artifact and can be safely ignored when comparing .EXE files.
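A comparison that ignores this linker artifact can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the comparisons here; the function names are made up for the example, and the only facts taken from the text are the four header offsets.

```python
# Header bytes described above as linker noise:
# 12h-13h (checksum) and 1Ch-1Dh (apparently uninitialized data).
NOISE_OFFSETS = (0x12, 0x13, 0x1C, 0x1D)


def mask_linker_noise(image: bytes) -> bytes:
    """Return a copy of an EXE image with the noisy header bytes zeroed."""
    masked = bytearray(image)
    for offset in NOISE_OFFSETS:
        if offset < len(masked):
            masked[offset] = 0
    return bytes(masked)


def same_exe(a: bytes, b: bytes) -> bool:
    """True if two EXE images differ at most in the noisy header bytes."""
    return mask_linker_noise(a) == mask_linker_noise(b)
```

Reading both files with `open(path, "rb").read()` and passing the results to `same_exe` is enough; .COM files have no header and should be compared byte for byte instead.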
Back to the actual comparisons:
- CHKDSK.COM, COMMAND.COM, DEBUG.COM, EXE2BIN.EXE, FC.EXE, FIND.EXE, MORE.COM, and RECOVER.COM all match TeleVideo binaries
- EDLIN.ASM has ‘roprot equ true’ even though it’s false in EDLPROC.ASM; setting ‘roprot’ to false in both files produces EDLIN.COM matching TeleVideo’s
- DISKCOPY.COM is different, TeleVideo clearly wrote their own
- PRINT.COM and SORT.EXE do not match but are somewhat similar
- SYS.COM/FORMAT.COM are very different, TeleVideo clearly wrote their own
Interestingly, PRINT.COM built from the CHM source code is almost identical to the one shipped with Compaq DOS 2.11. There is a one byte difference — the DOS version check in the CHM source code (DOSVER_HIGH) is for version 2.11, while Compaq checks for 2.10.
RECOVER.COM on the same Compaq DOS 2.11 disk has the string ‘Vers 1.51’ in it, while the CHM-provided RECOVER.COM has ‘Vers 1.50’, suggesting that Compaq used slightly newer DOS as a basis. The timestamps also suggest as much, since the newest source file provided by the CHM is dated 10/20/1983, while Compaq DOS 2.11 has files dated 5/30/1984, about 7 months newer.
The SORT.EXE utility built from source exhibits additional differences in the EXE header when compared with the TeleVideo version, even when the actual code/data within the file matches exactly. Apart from the above mentioned differences at offsets hex 12-13 and 1C-1D which are irrelevant, there are also differences in the minimum and maximum paragraph allocation fields (hex 0A-0B and 0C-0D). The MS-DOS 3.3 OAK reveals that after it’s built, SORT.EXE is processed with the EXEFIX utility included in the OAK. This sets the minimum and maximum paragraph allocation in the EXE header to 1.
Clearly the SORT.EXE utility that at least some OEMs shipped with MS-DOS 2.11 was run through the same or equivalent EXEFIX utility. This is completely missing from the CHM source code release and since there are no build recipes, there isn’t even any hint that such a utility existed. However, OEMs must have used it because such an EXE file header cannot be produced by LINK alone.
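The effect described above (not the real EXEFIX, whose exact behavior and command line are not documented here) can be imitated with a short Python sketch. The function name is hypothetical, and the assumption is simply that the two little-endian words at offsets 0Ah and 0Ch are overwritten and nothing else in the file is touched, including the checksum.

```python
import struct

MINALLOC_OFFSET = 0x0A  # minimum paragraph allocation (word)
MAXALLOC_OFFSET = 0x0C  # maximum paragraph allocation (word)


def set_paragraph_allocation(image: bytes, minimum: int = 1, maximum: int = 1) -> bytes:
    """Return a copy of an MZ image with patched min/max allocation fields."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    patched = bytearray(image)
    # Both fields are 16-bit little-endian values in the EXE header.
    struct.pack_into("<H", patched, MINALLOC_OFFSET, minimum)
    struct.pack_into("<H", patched, MAXALLOC_OFFSET, maximum)
    return bytes(patched)
```

The defaults of 1 mirror what the DOS 3.3 OAK build does for SORT.EXE according to the text; any other values are outside what the post describes.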
In the interest of historical accuracy and a perfect match, I always try to find tools old enough that they could plausibly have been used at the time. While that was not too hard when reconstructing PC DOS 1.1, it turned out to be quite tricky with DOS 2.11.
Reading the documentation provided with the MS-DOS 2.00 OEM distribution reveals interesting clues. README.DOC provided on the OEM disks contains the following slightly cryptic note: “COMMAND.ASM is currently too large to assemble on a micro.” There is another clue in the DOS 2.11 CHKMES.ASM file: “The DOST: prefix is a DEC TOPS/20 directory prefix. Remove it for assembly in MS-DOS assembly environments using MASM.” Except there is no instance of ‘DOST’ in that file except in the comment. But in two other files, GENFOR.ASM and PRINT.ASM (older files from the DOS 2.0 OEM disks) there is an odd looking ‘INCLUDE DOST:DOSSYM.ASM’ directive.
The upshot is that at least in the days of DOS 2.0 (1982 or early 1983), Microsoft built DOS on a DEC TOPS/20 system and not on a PC. We can guess that things changed during DOS 2.1 or 2.11 times, sometime in 1983. For example, EDLIN.ASM contains the following comment dated 7/23/83: “Split EDLIN into two seperate [sic] modules to allow assembly of sources on an IBM PC.”
As an aside, building DOS 2.0 on top of PC DOS 1.x would have been exceedingly painful. The source code for just the DOS kernel alone is about 400 KB in size, well beyond the capacity of the 320K floppies supported by DOS 1.1, or even the 360K floppies supported by DOS 2.0 for that matter. The time required to assemble all that code on an 8088 CPU was also quite significant. DOS 2.0 at least solved the storage problem thanks to hard disk support.
What is also abundantly clear is that some of the DOS source files push old MASM versions beyond their limits. Ancient MASM either runs out of memory or hangs/crashes.
The MS-DOS 2.00 OEM disks come with MASM 1.10, which is a useful starting point but not more. OEMs may have used MASM 1.10 to adapt MS-DOS 2.0 for their machines, but they would not build the bulk of DOS 2.0 with it. They’d build IO.SYS, FORMAT.COM, and perhaps whatever other utilities they wanted to provide, but not those much larger projects like MSDOS.SYS or COMMAND.COM.
The linker provided on the MS-DOS 2.00 OEM disks (LINK.EXE version 2.00) on the other hand doesn’t cause any trouble (apart from the annoying EXE header differences). It’s only MASM that is so finicky. Note that LINK likes to complain that “there was 1 error detected”, namely no STACK segment. This is not a problem for .COM files and can be safely ignored.
It is worth mentioning that OEMs clearly built DOS 2.11 with LINK version 2.00 or 2.01. Both older (1.xx) and newer (2.30 and above) versions of Microsoft’s LINK produce more mismatches. It is also notable that LINK 2.00 and 2.01 produce different executables, but the only difference is again in the four EXE header bytes previously mentioned.
Initially I was able to build most of the DOS 2.11 source code with IBM MASM 2.00 (MASM.EXE dated 7-18-84, file size 76,544 bytes). This MASM version is slightly too new, though only slightly. While this version in most instances produces identical code to whatever Microsoft used, it’s just different enough that Microsoft must have used something a bit older.
For example, when building COMMAND.COM from the source provided by the CHM, the result is the same size as COMMAND.COM on the TeleVideo DOS 2.11 disk, but not identical. The actual difference is not big but there is one.
MASM Version Zoo
In the TCODE5.ASM module, on line 512 there is the instruction ‘CMP [SINGLECOM], 0FFF0H’. IBM MASM 2.00 translates that (at offset 28CC in the resulting binary) as 83 3E BC 09 F0 (‘CMP W,[009BC],0F0’), but in the actual TeleVideo COMMAND.COM file, sadly with a destroyed timestamp, the same instruction was translated as 81 3E BC 09 F0 FF (‘CMP W,[009BC],0FFF0’). That is in fact what for example MASM 1.10 produces.
In other words, the newer IBM MASM 2.00 is a little cleverer. It knows that the F0h constant can be sign extended to produce FFF0h, thus saving one byte in the encoded instruction. This is a useful marker which provides a clue as to what MASM version Microsoft may have used.
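The two encodings can be reproduced with a small Python sketch. It only covers this one instruction shape (CMP of a word at a direct 16-bit address against an immediate), and the function names are invented for the example; the opcodes 81h and 83h with ModRM byte 3Eh are the standard 8086 encodings.

```python
def fits_sign_extended_byte(imm16: int) -> bool:
    """True if a 16-bit immediate equals some 8-bit value sign-extended."""
    imm16 &= 0xFFFF
    return imm16 <= 0x007F or imm16 >= 0xFF80


def encode_cmp_word_mem_imm(address: int, imm16: int, use_short_form: bool) -> bytes:
    """Encode CMP WORD PTR [address], imm16 the 'old' or the 'new' way."""
    modrm = 0x3E  # mod=00, reg=/7 (CMP), r/m=110 (direct 16-bit address)
    addr = address.to_bytes(2, "little")
    imm16 &= 0xFFFF
    if use_short_form and fits_sign_extended_byte(imm16):
        # Opcode 83h: 8-bit immediate, sign-extended by the CPU.
        return bytes([0x83, modrm]) + addr + bytes([imm16 & 0xFF])
    # Opcode 81h: full 16-bit immediate.
    return bytes([0x81, modrm]) + addr + imm16.to_bytes(2, "little")
```

With `address=0x09BC` and `imm16=0xFFF0`, the short form yields the five bytes seen with IBM MASM 2.00 and the long form yields the six bytes found in the TeleVideo COMMAND.COM.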
IBM MASM 2.00, Microsoft MASM 3.00, 4.00, and later versions all produce the shorter instruction encoding. MASM versions 1.x produce the longer encoding up to and including MASM 1.25 (1983), but MASM 1.27 (1984) generates the shorter encoding. (In general, MASM 1.27 seems to be significantly more different from 1.25 than what the version number difference might suggest.)
MASM 2.04 (1982) runs out of memory when assembling TCODE5.ASM, but otherwise generates the older, longer encoding.
The real takeaway is that MASM version numbers prior to 3.00 or so are completely meaningless. Microsoft MASM up to 1.25 and 2.04 exhibits the old behavior when dealing with the FFF0h immediate, while MS MASM 1.27 and IBM MASM 2.0 show the new behavior.
It is probably not a coincidence that MASM 1.27 and IBM MASM 2.0 display 1984 copyright dates, while the other versions show 1983 or older.
The probable chronological order of MASM versions before 1985 is 1.00, IBM 1.0, 2.04, 1.10, 1.12, 1.25, IBM 2.0, 3.00, 1.27. No, it does not make any sense.
MSDOS Segment Order
I encountered an odd problem with the file MSDOS.ASM. My first attempt was again to build it with IBM MASM 2.00. The assembly succeeded with no errors or warnings, but the resulting MSDOS.SYS was completely nonfunctional and immediately hung. It turned out that IBM MASM 2.00 generated the segments in the wrong order. Nasty.
MASM 1.10 (from the DOS 2.00 OEM disks) just hung when assembling the file.
On the other hand, MASM 1.12, 1.25, 1.27, 3.00 and later all assembled MSDOS.ASM without problems and produced the correct segment order.
In the end I determined that the entire DOS 2.11 source code can be successfully built with MASM 1.25 from 1983, about the same vintage as the source code. Whether that was actually the version used is anyone’s guess at this point, but it easily could have been.
Another stumbling block was the file MISC.ASM in the DOS kernel. This file fails to cleanly assemble with any version of MASM, but the errors it produces vary wildly across MASM versions.
The troublemaker is an instruction on line 432 of MISC.ASM: ‘TEST BYTE PTR [SI+SDEVATT], ISSPEC’. The ISSPEC symbol is nowhere to be found, which tends to cause an impressive cascade of phase errors in older MASM versions.
Checking the OAK for DOS 3.21/3.3 reveals the cause of the problem. Newer OAKs have the following in DEVSYM.INC:
ISSPEC EQU 0010H ;Bit 4 - This device is special
The DEVSYM.ASM from the CHM instead has:
ISIBM EQU 0010H ;Bit 4 - This device is special
The cause of the problem is clear: Somehow the CHM only provided the DEVSYM.ASM from the DOS 2.00 OEM kit, not the DOS 2.11 version of DEVSYM.ASM matching the rest of the source code. Microsoft must have renamed the ISIBM equate to ISSPEC between DOS 2.0 and 2.11. Editing DEVSYM.ASM and changing ISIBM to ISSPEC solves the problem.
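For anyone who prefers to script that one-word change rather than open the file in an editor, here is a hedged Python sketch. It treats the file as raw bytes so that line endings and any trailing bytes are left exactly as they were; the function name is made up, and note that the replacement makes the file one byte longer per occurrence.

```python
import re


def rename_isibm(data: bytes) -> bytes:
    """Rename the ISIBM symbol to ISSPEC without touching any other byte."""
    # \b keeps identifiers that merely contain "ISIBM" from being changed.
    return re.sub(rb"\bISIBM\b", b"ISSPEC", data)
```

Reading and writing the file in binary mode (`"rb"` and `"wb"`) avoids any newline translation by Python itself.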
One thing that the CHM did right was providing the DOS source files as a ZIP archive. Naive people might think that checking the files into a git repo is all it takes, but that would be a terrible mistake, for two reasons. One is that the timestamps would be lost, and the other is that the source files only look like text files at first glance, but they really are binary files.
For some reason, many of the source files are padded to a size that is a multiple of 256. The files are padded with null characters… mostly. Some of the files (e.g. TCODE3.ASM) have an ASCII CR thrown in with the nulls, not followed by LF as it is elsewhere in the file.
Some files, such as SYSINIT.ASM, have junk null characters at the end, but their size is not a multiple of 256 (or even a multiple of 8). The null characters in SYSINIT.ASM are in fact followed by a sequence of CR, LF, CR, LF, ESC. Most likely the file originally had trailing nulls but was later edited and ended up with a newline (and escape) at the end.
Then there are files that are just plain bizarre. TDATA.ASM contains four instances where CR is not followed by LF but rather 8Ah, which is LF (0Ah) with the high bit set. MASM appears to strip the high bit and does not mind at all. I do not know what significance this has, if any, but it certainly does not look like random corruption.
At least some of the source files may have come from a DEC TOPS-20 system or perhaps some other midrange computer. They may have been copied indirectly, using some kind of remote link. Whatever it was, the source files do not look like text files created on a PC, where one would expect exact file sizes and likely ESC at the end. However, some of the source files were almost certainly edited on a machine running DOS.
While MASM is extremely forgiving, many DOS-based text editors are not and may modify the source files in undesirable ways when editing.
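The oddities described above are easy to check for mechanically, for instance before and after touching a file with an editor. The following Python sketch is one way to do it; the report fields and the function name are arbitrary choices for this example.

```python
def describe_source_bytes(data: bytes) -> dict:
    """Summarize byte-level quirks of a DOS source file."""
    stripped = data.rstrip(b"\x00")
    # A CR counts as "bare" when the next byte is not a plain LF (0Ah);
    # this includes CR followed by 8Ah and a CR at the very end.
    bare_cr = sum(
        1
        for i, byte in enumerate(data)
        if byte == 0x0D and data[i + 1:i + 2] != b"\x0A"
    )
    return {
        "size": len(data),
        "multiple_of_256": len(data) % 256 == 0,
        "trailing_nuls": len(data) - len(stripped),
        "bare_cr": bare_cr,
        "high_bit_bytes": sum(1 for byte in data if byte >= 0x80),
    }
```

Comparing the dictionaries for the original and the edited copy shows at a glance whether an editor stripped padding or rewrote line endings.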
The CHM lumped all the source files into a single directory. It is unclear whether that was how the files were originally built or not. In later DOS OAKs (3.21, 3.30) there’s a sensible hierarchical structure, but that’s hard to achieve with an old MASM for one trivial reason: Before MASM 4.00 (1985), there was no way to specify an include path.
Now, this problem can be easily worked around using the APPEND utility… but that did not exist in the DOS 2.x days. The APPEND utility only started shipping with DOS 3.3 (1987), although it originally appeared as part of the IBM PC Network Program (1985).
Maybe all the files were really shoved into one gigantic directory, or maybe I’m missing something.
The source code provided by the CHM does not allow building a functional FORMAT.COM; as mentioned above, there is no OEM specific code, and FORMAT requires an OEM-provided module that’s typically called OEMFOR.ASM. Said module has to provide several routines: INIT, DISKFORMAT, BADSECTOR, DONE, and WRTFAT, plus miscellaneous variables.
I decided to take PC DOS 2.1 FORMAT.COM as a basis and reconstruct OEMFOR.ASM based on that. It turned out that when IBMVER is defined in the CHM-provided source code and OEMFOR.ASM is reconstructed, the result is an almost perfect match for PC DOS 2.1. There is one byte difference at offset 16h in the file. For reasons that are not obvious, the source code defines DOSVER_HIGH as 020Bh (2.11) while PC DOS 2.1 defines it as 0200h (2.0). The upshot is that normally FORMAT.COM would require DOS version 2.11 or higher, but IBM’s version requires DOS 2.0 or higher. The FORMAT.COM binary shipped with PC DOS 2.1 could have been built from different/modified source or it could have been patched after building.
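The version words quoted above decode as major version in the high byte and minor version in the low byte, which is an inference from the values 020Bh (2.11) and 0200h (2.0) rather than something read out of the source. A tiny Python sketch with made-up function names:

```python
def decode_dos_version(word: int) -> tuple:
    """Split a 16-bit version word into (major, minor)."""
    return (word >> 8) & 0xFF, word & 0xFF


def format_dos_version(word: int) -> str:
    """Render a version word the way DOS versions are usually written."""
    major, minor = decode_dos_version(word)
    return f"{major}.{minor:02d}"
```

For example, `format_dos_version(0x020B)` gives `"2.11"` and `format_dos_version(0x0200)` gives `"2.00"`.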
While reconstructing OEMFOR.ASM, I learned that IBM’s FORMAT.COM uses an unpublished interface to the IBMBIO.COM module. IBM’s FORMAT looks at the very first word of loaded IBMBIO (at 70:0) and takes that to be an offset to a hard disk table internal to IBMBIO. There are BPBs of up to two hard disks which FORMAT uses to obtain hard disk geometry.
I also learned that IBM’s FORMAT.COM is a bit lazy and when it finds any problem (a track that won’t format or verify without error), it reports the entire track as bad and does not attempt to report individual bad sectors (which the generic format code can deal with). Back in the day, that was motivation for cleverer third-party utilities.
The documented FORMAT /B switch is interesting in that it creates a floppy with 8 sectors per track (either single- or double-sided) which is not bootable but can be made bootable under either DOS 1.x or 2.x using the SYS command. FORMAT creates a disk with “bogus” IBMBIO.COM (1,920 bytes) and IBMDOS.COM (6,400 bytes) big enough for PC DOS 1.1. It’s not big enough for both IBMBIO.COM and IBMDOS.COM in PC DOS 2.x, but it’s more than enough for IBMBIO.COM, and in DOS 2.x IBMBIO.COM can load a non-contiguous IBMDOS.COM.
There’s also an interesting generic /O switch which was not documented by IBM but was documented by Microsoft: “The /O switch causes FORMAT to produce an IBM Personal Computer DOS version 1.X compatible disk. The /O switch causes FORMAT to reconfigure the directory with an 0E5 hex byte at the start of each entry so that the disk may be used with 1.X versions of IBM PC DOS, as well as MS-DOS 1.25/2.00 and IBM PC DOS 2.00. This switch should only be given when needed because it takes a fair amount of time for FORMAT to perform the conversion, and it noticably[sic] decreases 1.25 and 2.00 performance on disks with few directory entries.”
This refers to exactly the one difference between PC DOS 1.1 (aka DOS 1.24) and the released MS-DOS 1.25 source code: Version 1.25 (and 2.x) stops searching a directory when it encounters an entry starting with zero, while older versions do not and all unused/deleted entries must start with 0E5h.
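The difference, and the performance remark in Microsoft's note, can be illustrated with a simplified Python model of a directory scan. This is not DOS code; it only assumes the standard 32-byte directory entry with an 11-byte name, and the `stop_at_zero` flag stands in for the 1.25/2.x behavior.

```python
ENTRY_SIZE = 32   # size of one FAT directory entry
NEVER_USED = 0x00
DELETED = 0xE5


def scan_directory(directory: bytes, stop_at_zero: bool) -> tuple:
    """Return (names found, number of entries examined)."""
    names = []
    examined = 0
    for offset in range(0, len(directory), ENTRY_SIZE):
        entry = directory[offset:offset + ENTRY_SIZE]
        examined += 1
        if entry[0] == DELETED:
            continue  # free entry in every DOS version
        if entry[0] == NEVER_USED and stop_at_zero:
            break     # newer behavior: nothing follows a zero entry
        names.append(entry[:11])
    return names, examined
```

On a directory filled with zeros, the older-style scan (`stop_at_zero=False`) reports the empty entries as files, which is why /O fills the directory with 0E5h; on an 0E5h-filled directory, the newer-style scan has to walk every entry instead of stopping early.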
Unsurprisingly, IBM’s FORMAT.COM includes boot sectors for both DOS 2.x and 1.x in order to create disks that can be made bootable under DOS 1.x.
As with PC DOS 1.1, I set out to reconstruct IBMBIO.COM source code. Unlike the DOS 1.x case, I was not able to reproduce an identical IBMBIO.COM file.
The reason is that unlike DOS 1.x, DOS 2.x IBMBIO.COM/IO.SYS includes a relatively large module called SYSINIT provided by Microsoft. This was normally provided to OEMs in the form of an object file (SYSINIT.OBJ), as seen on the MS-DOS 2.0 distribution disks.
The SYSINIT module was not hardware specific but it was responsible for initialization that needed to be performed before the DOS kernel (IBMDOS.COM) could run. SYSINIT was also responsible for loading IBMDOS.COM and for processing CONFIG.SYS and loading device drivers.
The trouble is that SYSINIT.ASM provided by the CHM in source form is too new for PC DOS 2.1. It notably includes support for the COUNTRY statement in CONFIG.SYS, which was not part of PC DOS 2.1.
There’s also SYSINIT.OBJ provided on the MS-DOS 2.0 OEM disks, but that is not suitable either because it was built with IBMVER set to FALSE and MSVER TRUE. One difference is that the MSVER variant of SYSINIT calls a function called RE_INIT (provided by the OEM) at the end of its initialization phase, while the IBM variant has no such function at all. Presumably this was something OEMs other than IBM needed.
Combining an OEM BIOS module matching PC DOS 2.1 IBMBIO.COM with the SYSINIT.ASM from DOS 2.11 fortunately produces perfectly satisfactory results, and I was able to get a 100% match on the reconstructed OEM specific part of IBMBIO.COM.
Puzzling out IBMDOS.COM/MSDOS.SYS
Building a DOS kernel matching some known existing binary turned out to be remarkably difficult. Not least because somehow the CHM “forgot” one source file, IO.ASM. Fortunately, John Elliott already reconstructed it, saving me quite a bit of boring work. Thanks!
My first IBMDOS.COM target was PC DOS 2.1 but I gave up after realizing that IBM must have used somewhat different and almost certainly older source code with numerous minor differences.
Reproducing IBMDOS.COM from Compaq DOS 2.11 seemed more promising. The source code matches what Compaq shipped fairly closely, but there are major differences in the initialization code. For reasons that are very unclear, Compaq’s IBMDOS.COM includes quite a bit of hardware specific initialization code that really should have been in IBMBIO.COM.
Compaq also has additional code in the Ctrl-C logic (CTRLC.ASM) which invokes INT 17H. Again, this is code that should be in IBMBIO.COM. Compaq was obviously able to modify DOS significantly more than a typical OEM could, and the modifications suggest that unlike other OEMs, Compaq probably had the full DOS source code.
It is also notable that unlike the majority of MS-DOS 2.11 OEMs, but like IBM, Compaq built the DOS kernel with the IBM switch set to TRUE and MSVER set to FALSE.
Given the unexpected amount of hardware specific code in Compaq’s IBMDOS.COM and complete lack of TeleVideo’s IBMDOS.COM, I then decided to reproduce a MSDOS.SYS from one of the other OEM MS-DOS 2.11 releases.
After checking a couple of OEM DOS 2.11 releases (Corona, Eagle, Tandy, Wyse) I realized that many of them have near-identical MSDOS.SYS, with a file size of 17,176 bytes (note that in some cases, OEMs call the file IBMDOS.COM; that is not relevant).
There are interesting differences between those releases. For example Eagle and Wyse differ in one single byte at offset 5D5h. Eagle clearly didn’t want the ‘HEADER’ message to be displayed and set the first byte of the sign-on message to ‘$’, probably through binary patching.
Tandy and Wyse shipped 100% identical MSDOS.SYS.
Corona’s MSDOS.SYS exhibits two differences: At offset BF2h, Corona has 3Bh instead of FFh. This is the ‘OEM number’ assigned by Microsoft which most OEMs clearly didn’t bother with. Note that Microsoft documented (in DOSPATCH.TXT) how to patch the OEM number in an existing MSDOS.SYS. At offset 3639h, there is a difference in the ‘CANCEL’ character defined in STDSW.ASM (Corona sets it to 18h, the others to 1Bh).
While trying to produce a MSDOS.SYS matching the OEM releases, I kept stumbling over the IBM and MSVER defines. I just could not figure out how they should be set because no combination produced satisfactory results.
Then I finally realized that the DOS 2.11 OEM distribution kits almost certainly shipped MSDOS.SYS in the form of object files, only DOSMES.ASM was likely provided in source form. The object files would have been built with IBM set to FALSE and MSVER TRUE. But OEMs could easily build DOSMES.ASM with the defines flipped around.
So I tried that… and bingo! If all source files are built with IBM FALSE and MSVER TRUE, while only DOSMES.ASM is built with IBM TRUE and MSVER FALSE, the resulting MSDOS.SYS is nearly identical to the OEM files. It’s identical with Corona MSDOS.SYS except for the OEM number, and it only differs from Tandy and Wyse MSDOS.SYS in the previously mentioned ‘CANCEL’ character at offset 3639h (the CHM-provided source builds it as 18h, the others have 1Bh). I consider that a success.
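Pinning down differences like the ones at BF2h and 3639h only takes a byte-by-byte comparison of two equally sized files. A minimal Python sketch (the function name is made up, and this is not necessarily how the comparisons here were done):

```python
def differing_offsets(a: bytes, b: bytes) -> list:
    """List (offset, byte_in_a, byte_in_b) for every mismatching position."""
    if len(a) != len(b):
        raise ValueError("files differ in size; offsets would not line up")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Printing the offsets with `hex()` makes them directly comparable with the offsets quoted in the text.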
There is one curious difference between DOS built with IBM set to TRUE vs. FALSE. In the IBM variant, the code for the EXEC system call (INT 21h/4Bh) is built into COMMAND.COM while the non-IBM variant has it in MSDOS.SYS. The rationale is unclear, except the “IBMVER” style EXEC matches PC DOS 1.x where EXE file loading logic resided in COMMAND.COM.
The upshot is that an IBMDOS.COM built with IBMVER set to TRUE had better be matched with a COMMAND.COM also built with IBMVER set TRUE, or the EXEC functionality will be missing.
There is a similar dependency with IBMBIO.COM/IO.SYS; if the DOS kernel is built with MSVER set to TRUE and includes EXEC logic, the BIOS SYSINIT module can use it to load COMMAND.COM. But when IBMDOS.COM is built with IBMVER set TRUE, IBMBIO.COM must include its own minimal EXEC implementation. A BIOS module built with IBMVER can be used with a DOS kernel built with MSVER, but not vice versa.
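The two dependencies just described can be written down as a small consistency check. This Python sketch encodes only what the text states (an IBM-style kernel needs an IBM-style COMMAND.COM and an IBM-style BIOS); the style labels and the function name are invented, and combinations the text does not discuss are simply not flagged.

```python
STYLES = ("ibm", "ms")


def exec_problems(bios: str, kernel: str, command: str) -> list:
    """Return the EXEC-related problems for a BIOS/kernel/COMMAND.COM mix."""
    for style in (bios, kernel, command):
        if style not in STYLES:
            raise ValueError(f"unknown style: {style!r}")
    problems = []
    if kernel == "ibm":
        # An IBM-style kernel contains no EXEC code of its own.
        if command != "ibm":
            problems.append("COMMAND.COM does not supply EXEC")
        if bios != "ibm":
            problems.append("BIOS cannot load COMMAND.COM without its own EXEC")
    return problems
```

An empty list means that none of the constraints mentioned here are violated; it does not prove that the combination actually boots.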
Microsoft vs. IBM
For reasons that may be lost to the mists of time, Microsoft very early on started maintaining those two versions of DOS, which might be called IBM style and Microsoft style. During building, the desired version was typically selected by defining either IBMVER or MSVER, as previously mentioned.
In some cases, the IBM version included PC hardware specific logic, such as timer code in PRINT.COM or interrupt controller tweaks in DEBUG.COM. In some cases, the code was adapted to IBM PC conventions, such as the use of function keys for line editing in the DOS kernel.
Some of the differences were rather non-obvious, like placing the EXEC functionality into either MSDOS.SYS (Microsoft style) or COMMAND.COM (IBM style) as detailed above. It is likely that Microsoft considered the MS-style behavior sensible, but IBM had some reason to insist on the IBM-style variant.
Most OEM releases of MS-DOS 2.11 were built Microsoft style, and that’s also what Microsoft provided on OEM distribution disks (clearly visible in the case of the MS-DOS 2.00 OEM distribution disks provided by the CHM). Compaq was a notable exception and built their DOS IBM style. TeleVideo likewise used IBM-style COMMAND.COM and other utilities (and presumably the DOS kernel, too, even if that has not been preserved).
As noted above, OEMs liked to mix things up. At minimum Corona, Eagle, Tandy, and Wyse all built MSDOS.SYS (whether they named it MSDOS.SYS or IBMDOS.COM) with the DOSMES module assembled in the IBM style.
Rather strange is the case of DEBUG.COM. At least Corona, Eagle, and Tandy all shipped identical DEBUG.COM with the SYSVER equate set to TRUE in DEBMES.ASM, even though the rest of the code was built with SYSVER set FALSE. As a result, the OEM versions of DEBUG.COM included two redundant messages (BADDEV and BADLSTMES) which could never be shown.
The IBM style version of MORE.COM used 25 lines, while Microsoft style used 24 lines (recall that IBM’s 25-line screens were unusual, with 24-line terminals being standard at the time). The IBM version of MORE.COM queried the screen width from the BIOS (INT 10h/0Fh). The versions also differed in control character handling: IBM style MORE.COM printed them except for BEL, Microsoft style did not print them at all.
By changing the IBMVER/MSVER constants, it is possible to build binaries that are an extremely close match for e.g. Tandy 1000 MS-DOS 2.11 (files dated 10/20/1984) from otherwise unmodified source code provided by the CHM.
It is apparent that over time, as OEM hardware trended towards a high degree of PC compatibility, IBM-style DOS became dominant. But in the MS-DOS 1.x and 2.x days, OEMs were much more likely to ship Microsoft-style DOS and OEMs like Compaq who desired a high degree of IBM compatibility were the exception.
Comparing the DOS 1.1 source with the DOS 2.11 source, it is obvious that DOS 2.0 was a major update and that almost the entire core of the operating system was either heavily modified or written from scratch.
The list of user-visible changes was accordingly quite significant. Hierarchical directory structure, support for hard disks, handle-based file I/O modeled on UNIX, environment variables, I/O redirection, loadable device drivers, system configuration via CONFIG.SYS—those were all big changes, largely designed to take DOS further away from CP/M and much closer to UNIX.
On the source code level, it’s apparent that DOS 2.0 additions were all developed with the MASM assembler in mind. The code relies on numerous not entirely trivial macros which do not necessarily make the source any easier to understand (an echo of C++ templates). There is a clear trend away from old upper-case only assembly code with short identifiers and towards lower case code with mixed-case and sometimes quite long (over 20 characters) identifiers.
It’s probably also fair to say that DOS 2.0 was the last major rewrite of DOS. In many ways, DOS 2.0 is closer to DOS 6.x than it is to DOS 1.x. There were many changes and improvements since then, but nothing even remotely approaching the level of fundamental changes that occurred between DOS 1.x and 2.0.
Putting It All Together
As ought to be apparent from the preceding paragraphs, massaging the source code provided by the CHM into a buildable and functional form is not a trivial task, but it can be done, and the result is here. Here’s what I did with the DOS 2.11 source files provided by the CHM:
- Organized source files into a directory structure that matches DOS 3.21/3.3 and later
- Added John Elliott’s reconstructed IO.ASM/IO2.ASM (via pcjs.org), merged into a single and slightly reduced IO.ASM
- Duplicated source for Microsoft style DOS into parallel directories (MSDOS vs. DOS, CMDMS vs. CMD)
- Replaced the far too broken MASM 1.10 with MASM 1.25
- Added EXEFIX.EXE from the DOS 3.3 OAK (used for SORT.EXE)
- Kept LINK.EXE and EXE2BIN.EXE provided by the CHM
- Added batch files to build source files, in either IBM (MK.BAT) or Microsoft style (MKMS.BAT)
- Reconstructed OEM portion of IBMBIO.COM and FORMAT.COM to match PC DOS 2.1
- Made a handful of trivial changes to the CHM-provided source code, as detailed above
The build environment is not self-hosting due to the dependency on the APPEND utility, which was not available in DOS 2.x days. The source was successfully built on PC DOS 2000, in Windows XP, and in 32-bit Windows 7. It should build in any DOS 3.21/DOS 3.3 or newer environment with a functioning APPEND utility.
The result is a limited version of DOS 2.11. Included is core DOS, i.e. IBMBIO.COM, IBMDOS.COM, COMMAND.COM, as well as the following utilities: CHKDSK, DEBUG, DISKCOPY, EDLIN, EXE2BIN, FC, FIND, FORMAT, MORE, PRINT, RECOVER, SORT, SYS. This makes for a fairly minimal but fully functional DOS 2.11 environment.
The files can be copied over an existing bootable DOS 2.x disk. Care must be taken that the system file names match what the boot sector expects (IO.SYS plus MSDOS.SYS vs. IBMBIO.COM plus IBMDOS.COM). Note that the system files can be renamed, but IBMBIO.COM/IO.SYS must be a contiguous file at the start of the disk’s data area (i.e. occupying the first few clusters right after the root directory).
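To make "right after the root directory" concrete, the start of the data area follows from the disk layout: reserved sectors, then the FAT copies, then the root directory. The sketch below works this out for a standard 360K floppy. The `DiskLayout` class and `first_data_sector` function are illustrative names only and do not come from the DOS source.

```python
from dataclasses import dataclass


@dataclass
class DiskLayout:
    """Layout parameters of a FAT12 disk (illustrative, not a DOS structure)."""
    bytes_per_sector: int
    reserved_sectors: int   # boot sector(s) before the first FAT
    fat_count: int          # number of FAT copies
    sectors_per_fat: int
    root_entries: int       # each root directory entry is 32 bytes


def first_data_sector(d: DiskLayout) -> int:
    """Return the logical sector number where the data area begins."""
    root_bytes = d.root_entries * 32
    # Round the root directory size up to whole sectors.
    root_sectors = (root_bytes + d.bytes_per_sector - 1) // d.bytes_per_sector
    return d.reserved_sectors + d.fat_count * d.sectors_per_fat + root_sectors


# Standard 360K (double-sided, 9 sectors per track) floppy parameters.
FLOPPY_360K = DiskLayout(
    bytes_per_sector=512,
    reserved_sectors=1,
    fat_count=2,
    sectors_per_fat=2,
    root_entries=112,
)

print(first_data_sector(FLOPPY_360K))  # 1 + 2*2 + 7 = 12
```

On such a disk, IBMBIO.COM/IO.SYS would therefore be expected to start at logical sector 12, which corresponds to cluster 2, the first data cluster.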
Perhaps the most significant missing piece is FDISK. No attempt was made to reconstruct FDISK source code because FDISK was provided entirely by OEMs, with no Microsoft source code (unlike FORMAT and SYS), or at least not until DOS 3.2 in 1986. More or less any FDISK utility from an existing DOS 2.x release should be usable.
In closing, it is excellent that the CHM and Microsoft were able to release the historic DOS 2.11 source code. It is a shame that (not counting the OEM-provided bits) the code was only 99% complete, making it highly non-trivial to build functioning binaries.