A number of years ago, the Computer History Museum together with Microsoft released the source code for MS-DOS 1.25 (very close to PC DOS 1.1) and MS-DOS 2.11. I never did anything with it beyond glancing at the code, in no small part because the release was rather poorly organized.
Now I finally decided to look at the code for DOS 1.1 and see how far I could get with it. For both DOS 1.1 and 2.0, there are ‘object’ and ‘source’ directories. The ‘object’ directory for DOS 1.1 simply contains a copy of PC DOS 1.1, which is not particularly revealing or useful on its own (and strictly speaking I’m not even sure why the CHM thought it could publish those files).
The ‘source’ directory is much more interesting and contains the following files:
05/09/1983  09:59 AM      63,781 ASM.ASM
05/17/1983  06:19 PM      67,064 COMMAND.ASM
07/02/1982  11:33 AM       3,625 HEX2BIN.ASM
08/03/1982  12:29 AM      36,882 IO.ASM
05/17/1983  06:15 PM     114,253 MSDOS.ASM
05/17/1983  06:20 PM         649 STDDOS.ASM
07/01/1982  11:54 PM      16,223 TRANS.ASM
This turns out to be an interesting mix, and an included 2013 e-mail from Tim Paterson explains its origin: Those files are the source code for what SCP (Seattle Computer Products) shipped to its customers. ASM, HEX2BIN, and TRANS were SCP’s development tools used for initial DOS development. MSDOS.ASM and COMMAND.ASM are source code for core DOS components. IO.ASM is source code for SCP’s IO.SYS (i.e. IBMBIO.COM equivalent).
Assembly-time switches called MSVER and IBMVER select whether an IBM-style or a Microsoft-style COMMAND.COM and DOS kernel (MSDOS.SYS/IBMDOS.COM) is built. Interestingly, MSDOS.ASM and COMMAND.ASM are clearly intended to be assembled with MASM, not with the included SCP assembler. IO.ASM, on the other hand, is written for SCP’s ASM. Presumably, at some point around 1982, Microsoft converted the DOS kernel and COMMAND.COM from SCP’s assembler to MASM.
The obvious gaping hole is the lack of any source code for IBMBIO.COM. I do not know exactly what arrangement IBM and Microsoft had at the time, but in the days of DOS 1.x and 2.x, OEMs did not get the source code for IBMBIO.COM/IO.SYS suitable for PC compatibles.
I toyed with the idea of writing my own IBMBIO.COM replacement, but eventually gave up because it’s not a totally trivial piece of code and I had no real documentation to work with (until much later). The MSDOS.ASM source code obviously uses the IBMBIO interface, but makes no attempt to document it. The provided IO.ASM source is quite useful, but SCP’s hardware was different enough from the IBM PC that it is of limited utility.
So, disassembler it was, and I produced reconstructed source code for PC DOS 1.1 IBMBIO.COM. Actually assembling it turned out to be a bit of an adventure; more on that below.
Building COMMAND.COM was not difficult once I found the right assembler. Very quickly I established that although it must have been originally written for SCP’s ASM, the COMMAND.COM source used numerous directives (e.g. SEGMENT, GROUP) that ASM does not support. MASM was the obvious suspect and after going through a couple of MASM versions, MASM 1.00 from 1982 looked like a good match (IBM’s MASM 1.0 appears to work as well, but its runtime suffers from bugs which may cause MASM to hang).
The CHM source code for COMMAND.COM contains the following section:
IBMVER  EQU     FALSE   ;Switch to build IBM version of Command
MSVER   EQU     TRUE    ;Switch to build MS-DOS version of Command
HIGHMEM EQU     TRUE    ;Run resident part above transient (high memory)
This needs to be flipped around to build an IBM style COMMAND.COM, i.e. IBMVER must be TRUE and MSVER and HIGHMEM must be FALSE. Note that the code can be built as is to produce a Microsoft-style COMMAND.COM which does run on top of IBMBIO.COM and IBMDOS.COM, but exhibits numerous differences.
COMMAND.COM can be built as follows:
MASM COMMAND;
LINK COMMAND;
EXE2BIN COMMAND .COM
The result of this is rather interesting. The fresh COMMAND.COM is 4,959 bytes long, just like the one in PC DOS 1.1, and it is almost identical. There is a three-byte difference at offset 161h in the file, which corresponds to the following line in the source code:
MOV [COMFCB],AL ;Use default drive
In the COMMAND.COM shipped with PC DOS 1.1, that instruction is not exactly missing but rather replaced with NOPs. That strongly suggests someone at either IBM or Microsoft patched COMMAND.COM after it was built, rather than removing the line from source code and rebuilding.
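Spotting this kind of patch is easy to automate. The following Python sketch is only an illustration: the helper names are hypothetical and the two files are assumed to be read into bytes objects. It reports each run of differing bytes and flags runs where the shipped file holds nothing but 90h (NOP). For reference, MOV [mem],AL with a direct 16-bit address assembles to opcode A2h followed by the address, exactly three bytes.

```python
def diff_runs(a: bytes, b: bytes) -> list:
    """Return (offset, bytes_in_a, bytes_in_b) for each run of differing bytes.

    Only the common length is compared; a length mismatch must be checked separately.
    """
    runs = []
    n = min(len(a), len(b))
    i = 0
    while i < n:
        if a[i] != b[i]:
            start = i
            while i < n and a[i] != b[i]:
                i += 1
            runs.append((start, a[start:i], b[start:i]))
        else:
            i += 1
    return runs


def is_nop_patch(run_bytes: bytes) -> bool:
    """True if the run consists only of 90h (NOP) bytes."""
    return len(run_bytes) > 0 and all(x == 0x90 for x in run_bytes)
```

The address 1234h used in the test data is made up. Note also that if a patched byte happened to equal the original byte, a run would be reported as two pieces.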
The fact that the file was otherwise identical is good evidence that Microsoft in fact did use MASM 1.00 or something very close.
Reconstructing the PC DOS 1.1 version of IBMDOS.COM was a little more involved. As Tim Paterson’s e-mail explains, the provided source code is MS-DOS 1.25, but PC DOS 1.1 corresponds to MS-DOS 1.24. The revision history in MSDOS.ASM shows:
; 1.25 03/03/82 Put marker (00) at end of directory to speed searches
The changes are not identified in the source code, so I had to disassemble IBMDOS.COM and readjust the source code to match. That took some time even though the resulting modifications turned out to be fairly small.
Once again the source had to be modified to set MSVER to FALSE and IBMVER to TRUE (with the HIGHMEM and DSKTEST switches remaining FALSE).
Once this was done, I ended up with an IBMDOS.COM file that was 6,076 bytes long, while the PC DOS 1.1 original is 6,400 bytes long (an exact multiple of 256 but not a multiple of 512). The first 6,076 bytes were identical, but at this point I do not know why IBM’s version is longer. The extra bytes in IBM’s file are mostly zeros, but there’s also a hundred bytes or so of what appears to be junk, more or less random data copied from a buffer that hadn’t been zeroed.
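The comparison described above (is the rebuilt file an exact prefix of the shipped one, and what do the extra bytes contain?) can be sketched in Python as follows. The function name is hypothetical, and the test data is synthetic, shaped only to mimic the 6,076 vs. 6,400 byte sizes.

```python
def compare_with_padding(rebuilt: bytes, shipped: bytes) -> dict:
    """Check whether `rebuilt` is an exact prefix of `shipped` and summarize the extra tail."""
    n = len(rebuilt)
    is_prefix = len(shipped) >= n and shipped[:n] == rebuilt
    tail = shipped[n:] if is_prefix else b""
    return {
        "is_prefix": is_prefix,
        "tail_len": len(tail),
        # Non-zero bytes in the tail are candidates for leftover buffer contents.
        "tail_nonzero": sum(1 for x in tail if x != 0),
        "shipped_mod_256": len(shipped) % 256,
        "shipped_mod_512": len(shipped) % 512,
    }
```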
That said, my shorter IBMDOS.COM appears to be working just fine. The padding at the end of IBMDOS.COM that IBM shipped should have no functional significance.
Although I had no source code for IBMBIO.COM at all, I did have SCP’s IO.ASM, as well as this very handy document.
My first attempt was to produce IBMBIO.ASM that could be built with SCP’s ASM. That turned out to be… interesting. SCP’s ASM uses a syntax that is not unlike MASM and other PC assemblers, but exhibits quite a few differences.
For example, ‘SHL AL,1’ must be coded simply as ‘SHL AL’, leaving out the immediate. On the other hand, ‘MUL CX’ must be coded as ‘MUL AX,CX’, explicitly mentioning the accumulator.
Unlike MASM, SCP’s assembler does not try to be clever and guess what ‘MOV AX,LABEL’ might mean. In ASM, it means ‘move offset of LABEL into AX’. To move the word at LABEL into AX, one must write ‘MOV AX,[LABEL]’.
To resolve size ambiguity, ASM does not use the BYTE PTR or WORD PTR syntax but rather B or W pseudo-operands, such as ‘MOV B,[FOO],5’ to indicate a byte-sized operation is intended.
Segment overrides are coded differently and instead of ‘MOV ES:[BX],AX’, one must write ‘SEG ES’ as a separate “instruction”, followed by ‘MOV [BX],AX’. This reflects the fact that segment overrides are encoded as prefixes separate from the instruction itself.
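To make the prefix point concrete, here is a small Python sketch that hand-encodes MOV [BX],AX with and without an ES override. The byte values are standard 8086 encodings; the function itself is purely illustrative. Whether the source says ‘SEG ES’ on its own line (ASM) or ‘MOV ES:[BX],AX’ (MASM), the result is the same three bytes.

```python
# 8086 segment override prefix bytes.
SEG_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}


def mov_bx_ind_ax(seg=None) -> bytes:
    """Encode MOV [BX],AX, optionally preceded by a segment override prefix.

    89h is MOV r/m16,r16; ModRM 07h = mod 00, reg 000 (AX), r/m 111 ([BX]).
    """
    body = bytes([0x89, 0x07])
    prefix = bytes([SEG_PREFIX[seg]]) if seg else b""
    return prefix + body
```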
Some forms of the XCHG instruction ended up being backwards, e.g. ‘XCHG BX,SI’ came out as what would be written in MASM as ‘XCHG SI,BX’. This has no impact on the behavior of the code, and of course, without seeing the original source, I don’t know how the XCHG was written.
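The XCHG operand order cannot be recovered from the binary because a register-to-register XCHG using opcode 87h has two equally valid encodings, depending on which register goes into the ModRM reg field. A Python sketch (illustrative only; it does not show which operand any particular assembler places in the reg field):

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def xchg_reg_reg(reg_field: str, rm_field: str) -> bytes:
    """Encode a 16-bit register-to-register XCHG using opcode 87h.

    ModRM = mod 11 (register operand), reg = reg_field, r/m = rm_field.
    Both operand orders exchange the same two registers.
    """
    modrm = 0b11000000 | (REG16[reg_field] << 3) | REG16[rm_field]
    return bytes([0x87, modrm])
```

So the same exchange of BX and SI can appear as either 87 DE or 87 F3.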
After massaging the source code, I was left with a curious problem. Instructions such as ‘CMP DL,100’ ended up with a different encoding, namely the ‘S’ (sign extend) bit was set. Since the instruction does not set the ‘W’ (word data) bit, the ‘S’ bit is irrelevant. I was able to confuse ASM into producing the same encoding as found in IBMBIO.COM by using ‘CMP DL,100-256’, which takes advantage of a quirk in SCP’s ASM. But it is reasonably certain that the original source code did not contain such weird constructs.
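The S bit lives in the opcode byte of the 8086 "group 1" immediate instructions (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP), which are encoded as 100000sw. With w=0 the immediate is a single byte either way, so 80h and 82h decode identically on the 8086. A Python sketch of the two encodings of ‘CMP DL,100’ (function name hypothetical):

```python
def group1_imm8(op_ext: int, rm: int, imm8: int, s_bit: bool) -> bytes:
    """Encode a byte-sized group 1 op (register operand, 8-bit immediate).

    Opcode is 100000sw with w=0: 80h when s=0, 82h when s=1.
    op_ext is the /digit in the ModRM reg field (CMP = 7).
    rm is the 8-bit register number (AL=0, CL=1, DL=2, BL=3, ...).
    """
    opcode = 0x80 | (0x02 if s_bit else 0x00)
    modrm = 0b11000000 | (op_ext << 3) | rm
    return bytes([opcode, modrm, imm8 & 0xFF])
```

Per the observation above, IBMBIO.COM contains the s=0 form (80 FA 64), while SCP’s ASM emitted the s=1 form (82 FA 64).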
In the end, I convinced myself that SCP’s ASM was not what was used to build IBMBIO.COM in PC DOS 1.1. So I went ahead and converted the source code back to MASM style. That was not nearly as straightforward as I thought it’d be, and I learned much about MASM’s phase errors and similar nonsense. I also gained an appreciation for why so many programmers disliked MASM and why TASM and other assemblers quickly gained a following.
I was fairly certain that MASM was used to build IBMBIO.COM because various quirks matched exactly, such as pointless NOPs after certain MOV instructions (see here for an explanation of where those come from).
I was able to re-create IBMBIO.ASM that produced an exact match for IBMBIO.COM in PC DOS 1.1, although again there was some random-looking junk in the middle and perhaps extra zeros at the end.
DOS Boot Notes
While reconstructing IBMDOS.COM, I had to become better acquainted with how the PC DOS 1.1 boot process works.
The files IBMBIO.COM and IBMDOS.COM must be the first two files on a bootable floppy, in that order, and stored in consecutive sectors right after the end of the root directory. The boot sector does not fully parse the directory entries but verifies that IBMBIO.COM and IBMDOS.COM are the first two files. The boot sector “knows” that the root directory is on sector 4 of the disk, right after the boot sector (at sector 1) and two one-sector copies of the FAT. This is true for both single-sided 160K and double-sided 320K floppies.
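The fixed layout can be checked with a little arithmetic. The Python sketch below assumes standard PC DOS 1.x geometry that the text does not spell out: 8 sectors per track, one 512-byte sector per FAT copy, and 64 root directory entries on 160K disks versus 112 on 320K disks. Sectors are counted from zero here, so "sector 4" above is logical sector 3.

```python
from dataclasses import dataclass


@dataclass
class FloppyFormat:
    name: str
    heads: int
    sectors_per_track: int
    root_entries: int          # 32-byte directory entries (assumed values, see above)
    reserved: int = 1          # just the boot sector
    fats: int = 2
    sectors_per_fat: int = 1
    bytes_per_sector: int = 512

    def root_start(self) -> int:
        """Zero-based logical sector where the root directory begins."""
        return self.reserved + self.fats * self.sectors_per_fat

    def data_start(self) -> int:
        """Zero-based logical sector where the data area (first cluster) begins."""
        root_bytes = self.root_entries * 32
        root_sectors = (root_bytes + self.bytes_per_sector - 1) // self.bytes_per_sector
        return self.root_start() + root_sectors

    def chs(self, lba: int) -> tuple:
        """Convert a zero-based logical sector to (cylinder, head, 1-based sector)."""
        track, sector = divmod(lba, self.sectors_per_track)
        return track // self.heads, track % self.heads, sector + 1


f160 = FloppyFormat("160K", heads=1, sectors_per_track=8, root_entries=64)
f320 = FloppyFormat("320K", heads=2, sectors_per_track=8, root_entries=112)
```

Under these assumptions the root directory starts at the same place on both formats, but the data area (where IBMBIO.COM must begin) starts at logical sector 7 on a 160K disk and at logical sector 10 on a 320K disk.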
After that, the boot sector loads the first 20 sectors (10 KB) from the disk’s data area as a single blob starting at address 60:0. The assumption is that 10 KB covers all of IBMBIO.COM and IBMDOS.COM (and anything extra won’t cause harm). Keep in mind that the boot sector itself is loaded at 0:7C00, just below 32K. The boot sector then jumps to 60:0.
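The real-mode arithmetic behind this is simple: a segment:offset pair maps to the linear address segment × 16 + offset. A short Python sketch (names illustrative) shows that the 10 KB blob loaded at 60:0 ends well below the boot sector at 0:7C00.

```python
def linear(seg: int, off: int) -> int:
    """Real-mode segment:offset to a 20-bit linear address (wraps at 1 MB on an 8086)."""
    return ((seg << 4) + off) & 0xFFFFF


SECTOR = 512
load_start = linear(0x60, 0)             # 600h
load_end = load_start + 20 * SECTOR      # 2E00h, one past the last loaded byte
boot_sector = linear(0, 0x7C00)          # 7C00h, 1 KB below the 32K mark
```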
IBMBIO.COM (hereafter just IBMBIO) is split into several sections. Near the end (offset 650h) there is early initialization code which does the following:
- Install a DPT (Disk Parameter Table) at 50:70 and point interrupt vector 1Eh at it
- Jump into the second initialization phase in the middle of IBMBIO
The second initialization stage is in a 512-byte area which later becomes a disk buffer. It does the following:
- Set stack pointer to 0:600, i.e. 60:0, right below IBMBIO
- Reset disks (INT 13h/00)
- Initialize serial ports and printer
- Install interrupt vectors 1, 3, and 4
- Clear print screen flag at 50:0
- Move 8KB down from E0:0 to BF:0; this is IBMDOS.COM, overwriting the no longer needed tail of IBMBIO
- Detect installed drives and memory
- Call the initialization entry point at the beginning of relocated IBMDOS.COM
- DOS initialization returns with DS pointing to segment where COMMAND.COM will load
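The addresses in this list work out as follows. The Python sketch below also models the downward move as a forward byte copy, as REP MOVSB with the direction flag clear would perform it; the text does not say which instructions IBMBIO actually uses, and all names are illustrative.

```python
def linear(seg: int, off: int = 0) -> int:
    """Real-mode segment:offset to a linear address."""
    return (seg << 4) + off


ibmbio_start = linear(0x60)      # 600h: where the boot sector loaded IBMBIO
dos_loaded = linear(0xE0)        # E00h: IBMDOS.COM as loaded by the boot sector
dos_final = linear(0xBF)         # BF0h: IBMDOS.COM after the move
shift = dos_loaded - dos_final   # 210h = 528 bytes


def move_down(mem: bytearray, src: int, dst: int, n: int) -> None:
    """Forward byte-by-byte copy of n bytes from src to dst.

    Safe for overlapping regions when dst < src: each source byte is read
    before any write can reach it, because writes always trail the reads.
    """
    if dst > src:
        raise ValueError("forward copy is only overlap-safe when moving down")
    for i in range(n):
        mem[dst + i] = mem[src + i]
```

Two observations follow from the arithmetic alone: IBMDOS.COM starts 800h bytes (four sectors) into the loaded blob, and after the move the space from 60:0 up to BF:0 is 5F0h = 1,520 bytes, consistent with IBMBIO taking up less than 1.5K once initialization is done.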
At this point, DOS will be functional but IBMBIO isn’t done yet. It further does the following:
- Install interrupt vectors 25h/26h (absolute disk read/write)
- Set DOS DTA (Disk Transfer Area) to DS:100h, just after the PSP
- Open COMMAND.COM using a statically defined FCB
- Read the entire COMMAND.COM file
- Set DOS DTA to DS:80h as programs expect by default
- Jump to DS:100h (start of COMMAND.COM code)
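The "statically defined FCB" is a standard DOS File Control Block. The Python sketch below builds an unopened FCB for COMMAND.COM using the standard 37-byte layout. The exact contents of IBMBIO’s FCB and the DOS calls it uses to open and read the file are not described here, so treat this only as an illustration of the structure; the function names are hypothetical.

```python
import struct

FCB_SIZE = 37  # standard FCB including the 4-byte random record field


def make_fcb(name: str, ext: str, drive: int = 0) -> bytearray:
    """Build an unopened standard FCB.

    Offset 0:     drive (0 = default, 1 = A:, ...)
    Offsets 1-8:  file name, upper case, space padded
    Offsets 9-11: extension, upper case, space padded
    All other fields stay zero; DOS fills in sizes when the file is opened.
    """
    if len(name) > 8 or len(ext) > 3:
        raise ValueError("FCB names are limited to 8.3")
    fcb = bytearray(FCB_SIZE)
    fcb[0] = drive
    fcb[1:9] = name.upper().ljust(8).encode("ascii")
    fcb[9:12] = ext.upper().ljust(3).encode("ascii")
    return fcb


def fcb_file_size(fcb: bytes) -> int:
    """File size field at offset 10h (32-bit little-endian), valid after an open."""
    return struct.unpack_from("<I", fcb, 0x10)[0]
```

After a successful open, DOS fills in the record size and file size fields; a test can simulate that by storing the 4,959-byte size of the PC DOS 1.1 COMMAND.COM at offset 10h.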
Now the IBMBIO initialization is done. Again, the entire second stage runs out of a 512-byte bounce buffer which is later used to handle DMA 64K boundary crossing.
What’s the deal with DMA 64K boundary crossings? To recap, the floppy drive in the IBM PC uses DMA, but the DMA controller can only access memory within a single 64K aligned block during one operation. If DOS needs to read or write a sector to/from memory that crosses a 64K boundary, IBMBIO needs to use the bounce buffer.
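The check itself is simple. A Python sketch (the function name is hypothetical; this is not IBMBIO’s code):

```python
def crosses_64k(seg: int, off: int, length: int) -> bool:
    """True if a DMA transfer of `length` bytes at seg:off spans a 64K boundary.

    On the PC, the 8237 DMA controller increments only the low 16 address
    bits; the page register supplying the upper bits stays fixed for the
    whole transfer, so the buffer must not cross a 64K-aligned boundary.
    """
    start = (seg << 4) + off
    return (start & 0xFFFF) + length > 0x10000
```

For example, a 512-byte sector at 0FFF:0000 (linear FFF0h) crosses the boundary, while the same sector at 1000:0000 does not.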
How can this work, you ask? How can the initialization code read an arbitrary COMMAND.COM file while residing inside a bounce buffer that disk reads may need? Easy: In PC DOS 1.1, it is guaranteed that COMMAND.COM will be loaded within the first 64K. In fact, PC DOS 1.1 needs much less than 32K of memory to run.
That said, if the IBM-provided COMMAND.COM were replaced with a user-supplied executable that is 60 or so kilobytes big, DOS boot might fail due to the IBMBIO initialization code overwriting itself while handling a 64K DMA boundary crossing. This is very unlikely to have caused problems in practice.
Once initialization is completed, IBMBIO takes up less than 1.5K of memory.
It is noteworthy but unsurprising that the interface “exported” by IBMBIO for use by DOS resembles a CP/M BIOS module. There’s a jump table at the beginning of IBMBIO which DOS calls into using far calls. Interestingly, CP/M-86 merged the BIOS, BDOS, and CCP (COMMAND.COM equivalent) into a single CPM.SYS module.
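A CP/M-style BIOS jump table is just a series of 3-byte jumps, so entry n sits at offset 3 × n. A Python sketch that decodes such a table, assuming every entry is a near JMP (E9h followed by a signed 16-bit displacement); the function and the test bytes are illustrative, not taken from IBMBIO:

```python
import struct


def decode_jump_table(image: bytes, count: int, base: int = 0) -> list:
    """Decode `count` consecutive 3-byte near JMPs at the start of `image`.

    The target of a near JMP is the offset of the following instruction
    plus the signed 16-bit displacement, wrapped to 64K.
    """
    targets = []
    for i in range(count):
        pos = 3 * i
        if image[pos] != 0xE9:
            raise ValueError(f"entry {i} at {pos:#x} is not a near JMP")
        (disp,) = struct.unpack_from("<h", image, pos + 1)
        targets.append((base + pos + 3 + disp) & 0xFFFF)
    return targets
```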
The boot process also changed at some point in the early life of DOS (aka QDOS/86-DOS). Initially, disks had a large reserved area (typically two tracks or so) at the beginning, containing the boot sector and the BIOS + DOS components. This was changed so that IO.SYS/IBMBIO.COM and MSDOS.SYS/IBMDOS.COM would show up as files, and the reserved area was reduced to the bare minimum, i.e. one boot sector. The internal logic changed very little, however, and the BIOS and DOS files still had to occupy sequential, consecutive sectors at the beginning of the disk’s data area. This is why they were marked as system/hidden: to avoid being moved or fragmented.
That new arrangement changed very little for the boot loader, which only needed to start loading from a different sector, but had two advantages: The system files were easy to replace (as long as the newer files didn’t occupy more clusters on the disk), and they became optional. A non-system floppy didn’t need to waste space on boot files, which took up roughly 5% of its capacity (about 8 K out of 160 K).
I was able to put together a bootable 320K floppy running something very close to PC DOS 1.1 with source code and tools that allow rebuilding IBMBIO.COM, IBMDOS.COM, and COMMAND.COM. Batch files (MKBIO.BAT, MKDOS.BAT, MKCOM.BAT) are provided to simplify the effort. Everything is done “in place” with very little room left on the disk.
The floppy (image) should run on any PC compatible or emulator.
Note that when building IBMBIO.COM on top of DOS 1.x, the user must enter ‘60’ when EXE2BIN prompts “Fix-ups needed – base segment (hex):”. When building on DOS 2.0 or later, input redirection supplies the answer automatically… but DOS 1.x did not support redirection yet. I do not know if it’s possible to write the code such that MASM/LINK would do the work that EXE2BIN otherwise needs to do.
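What the base segment answer does can be sketched as follows. Each entry in an .EXE relocation table is an (offset, segment) pair naming a 16-bit word in the load image that holds a segment value. Converting to a flat binary for a known load address means adding the base segment to each such word. A minimal Python sketch of that step, with header parsing omitted (the function name is hypothetical, and this is not EXE2BIN’s actual code):

```python
import struct


def apply_fixups(image: bytearray, relocs, base_seg: int) -> None:
    """Add base_seg to every 16-bit segment reference listed in relocs.

    image  -- the load module (the .EXE file without its header)
    relocs -- (offset, segment) pairs as stored in the .EXE relocation table
    """
    for off, seg in relocs:
        pos = (seg << 4) + off
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + base_seg) & 0xFFFF)
```

With a base of 60h, matching the segment where the boot sector loads IBMBIO, a stored segment value of 3 becomes 63h.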
In MSDOS.ASM, there are several instances of ‘IFDEF NEWVER’ bracketing code that was apparently added in DOS 1.25. Since NEWVER is not defined by default, code corresponding to DOS 1.24 is built, which happens to match PC DOS 1.1.
A version of MASM 1.00 dated 1-05-82 is used to build the source code. This is about the oldest MASM version I could find which is capable of building the source without errors. It also happens to be older than the source code, which means it could have been used back in the day, although it is not known exactly what tools Microsoft used, or if the DOS components were typically built on a PC at all. Clearly they at least could have been.
Happy retro development!
“DOS kernel (MSDOS.SYS/IBMBIO.COM)”
CP/M-86 for the IBM PC merged BIOS, BDOS, and CCP into a single file. The inclusion of CCP was optional for other OEMs though a good idea. CP/M-86 had a large block* size on hard disks. Combining the files could reduce allocated disk space by up to 48KB.
* Equivalent to cluster size in DOS.
To go off on a slight tangent: if CP/M just calls them “blocks”, where does the term “cluster” come from…?
Cluster seems to be DEC terminology with the block as a term for 512 bytes corresponding to a sector. Not too surprising to see MS adopt DEC nomenclature given the usage of DEC systems by them. Not sure why IBM went for that instead of sticking with the IBM tape descriptors which used records that get combined into blocks. DRI hard coded records as 128 byte units that were the basis for all disk operations. Those records would be combined into blocks which would need some number of sectors.
Side note: Since file sizes were padded out to a multiple of 128 bytes, it would seem that the version of assembler in use for PC-DOS development still had the old CP/M record based IO method. Not sure why it got a complete unused record in addition to padding out the final partial record but there were a lot of strange coding techniques to work around bugs that are forgotten to history.
Nice – I got MS-DOS.SYS from the 2.00 / 2.11 distribution building (some disassembly required, as the source was incomplete), but I didn’t attempt to use period-accurate tools to do it.
The tools running on top of DOS 1.x naturally used FCB-based I/O since there was nothing else.
The source files for MS-DOS 2.11 are similarly weird and many are padded to a multiple of 256 bytes, not just 128. I currently do not understand where that came from. It could have originated on some DEC machines because it’s clear that when DOS 2.0 was done, it couldn’t even be built on a PC, it had to be built on a DEC (TOPS-20 apparently). By the time DOS 2.11 came around, the DOS source code was adapted to build on a “micro” (i.e. a PC).
I’m in the middle of attacking the DOS 2.11 source code although it is way more complex and messy (the way CHM released the code was far from ideal, mostly because they didn’t bother explaining what came from where, but I got that sorted out thanks to Jeff Parsons).
I got your reconstructed IO.ASM, what’s that based on?
Also do you happen to have found some OEM DOS 2.11 variant that’s a good match for the CHM DOS 2.11 source code? TeleVideo DOS 2.11 has 100% matching COMMAND.COM but no MSDOS.SYS (argh!) and the utilities are very close but some are oddly different (PRINT.COM, SORT.EXE). Compaq DOS 2.11 is also close but not quite the same. PC DOS 2.1 is a little further off and clearly has an older DOS kernel without COUNTRY support.
My recollection is that I started by taking the v2.0 binary of MSDOS that was in the MIT-licensed release, and amending the source until it assembled to that binary. Since IO.ASM and IO2.ASM were missing, I disassembled the binary release of MSDOS.SYS and used that to create the missing files.
Once I’d done v2.0, I then did the same process for 2.11, using my reconstructed files and everything else unchanged from 2.11. I think I got it to assemble to match the MSDOS.SYS from Apricot MS-DOS 2.11, and then backed out any changes in files other than IO.ASM / IO2.ASM.
With respect to assemblers, I found different versions of Macro Assembler, published by Microsoft, Intel and IBM. I haven’t tested all of them, or checked whether they produce the same binaries. Maybe worth testing?
… up to 6.14
macro assembler-3.1 (1987)
Intel’s assembler is significantly different. IBM’s was supplied by Microsoft and is very close to Microsoft MASM.
I went through a number of MASM versions but really we’re talking before 1984, so even MASM 3.0 is too new. Then again, Microsoft managed to put out a lot of versions before then — I tried MASM 1.00, 1.06, 1.10, 1.12, 1.25, and 2.04. Those are all from before 1984.
The mystery product that is likely to be needed would be Macro-86, the precursor to MASM. The manual is out there as part of the “utility software package reference manual for 8086 processors” but the disks have not been saved to my knowledge.
I don’t think MACRO-86 is anything other than MASM. The manual is from late 1981, at that point MASM 1.00 was out. I do not see any functional difference between what the manual for MACRO-86 says and what MASM 1.00 does.
It is curious that bitsavers has the manual filed under ‘cpm’ when it’s clear that M86.EXE ran on top of MS-DOS.
The packing list for Microsoft Pascal 2.00 dated January 5, 1982 includes this item: “M86.EXE, Microsoft MACRO-86 assembler”. Said M86.EXE identifies itself as “The Microsoft MACRO Assembler”, version 1.00. I wonder if MACRO-86 was the name of the TOPS-20 version and the DOS version was renamed since there was no need to distinguish it from MACRO-80.
Please excuse my ignorance, but… TOPS-20? Isn’t it an OS for the DEC PDP-10 superminicomputer? Was Microsoft developing its 8086 software under an emulator which ran on TOPS-20 or what?
Yes, that is exactly what Microsoft was doing.
I bet MASM on a TOPS-20 machine ran many times faster than the 8088 PC version, which was painfully slow.
The one benchmark I could find indicates that an unspecified PDP-10 model was about twice as fast as the SCP 8086 running at 8 MHz. The IO should be a lot better on the PDP-10 especially given the cross-assembler/compiler was run at night when all the other users were not there. There is a long description of the conversion of Microsoft Fortran-80 to 8086 which might describe the usual process of using the PDP-10.
In terms of oddball cross assemblers, the March 17, 1980 issue of Infoworld has an announcement for Microsoft X-Macro-86, their version of an assembler that ran on 8080/Z80 to produce 8086 code. The price was set at $300. It is another product that disappeared but running it must have been an exercise in patience.
I’d expect that an 8 MHz 8086 was already about twice as fast as a 4.77 MHz 8088. Given that MASM was written in Pascal, the quality of the compiler also could have played a significant role.
Intel’s original 8086 development machines all had 8085 CPUs, but I don’t have a good sense of how fast Intel’s ASM86 or PL/M-86 compiler was when running on an Intel MDS machine. Can’t have been too fast though.
First – thank you.
Second – I noticed the results of attempting assembly of ASM.ASM with various versions of MASM and spent some time trying to revamp the syntax… but largely set that aside when I realized that a version was on the PC DOS 0.90 disk image.
That version is an earlier one, without source on that image, but it runs.
Then I searched a bit and found the ASM.COM that matches the source with v1.1 material.
The Seattle Computer Products tools are in this:
I successfully rebuilt DOS 2.1 using your file layout and .BAT files under Win10-64 by using NTVDM-x64; the files, along with rebuilds of the SCP ASM, HEX2BIN, etc., are in an extended directory built on MSDOS as released:
Again, thank you to Michal for your contributions.
Glad to hear it builds under NTVDM-x64 too. It should really work in more or less any DOS compatible environment; the biggest problem for me has been early versions of MASM failing due to reliance on A20 line wraparound or because there’s “too much memory”. Plus, if the early versions legitimately ran out of memory, they didn’t always handle it very well. By about 1983 Microsoft had it more or less under control and the newer versions are much more stable.
I never seriously tried reassembling ASM.ASM with MASM because I had ASM.COM version 2.24 that was released years ago on an 86-DOS disk image. Interestingly it is almost but not quite identical to ASM.COM version 2.24 that is on the PC DOS 0.90 disk. I think the one on the PC DOS 0.90 disk has a slightly bigger stack (by 180 bytes), otherwise they are identical.
The old ASM.COM would not build ASM.ASM as is and I had to modify the source slightly, but it wasn’t too hard to do. The resulting ASM.COM version 2.44 can build itself just fine. But in the end I basically had no real use for it after I found out that MASM must have been used for all the core components of PC DOS 1.1.
Just wanted to note that I had issues assembling the command.asm file from the github page. Kept getting the error “?End of file encountered on input file”. Had to use the files from the CHM link.
The files on github are mangled, with wrong line endings and all kinds of nonsense. The trouble is that git’s idea of a text file and old MASM’s idea of a text file really aren’t all that similar, and some of the original files are, shall we say, a little strange. The CHM archives are much better because those files have not been mangled.
Pingback: Unidentified PC DOS 1.1 Boot Sector Junk Identified | OS/2 Museum