Hang with early DOS boot sector

While installing various versions of DOS for the DOS history series of articles, I was faced with a mysterious problem: Some versions of DOS would hang right away when booting from fixed disk, but not from floppy. I already knew that DOS 4.x is very sensitive to BIOS stack usage; if a BIOS needs more than 100 bytes or so of stack to process a disk read request, it will fail to boot DOS 4.x from fixed disk, even though the same DOS 4.x can access the same disk just fine when booted from floppy.

However, the hangs I was observing were happening with DOS 2.x and 3.x, and those do not have such tight stack usage requirements. I quickly realized that the problem is caused by a bug in the DOS boot sector: the boot sector code tries to optimize the loading of IBMBIO.COM and attempts to read a whole disk track at a time. That sounds like a good idea, but it’s not.

The boot sector is loaded at address 0:7c00h, or just under 32KB. The BIOS component of DOS (IBMBIO.COM) is loaded at address 70h:0 (in other words, 0:700h). The boot sector also sets up the top of stack at 0:7c00h, just below the boot sector code. There is therefore slightly under 30KB of room for IBMBIO.COM, which in the 2.x and 3.x versions of DOS is well under 20KB size (just 4.5KB in DOS 2.0 in fact). In theory there should be no problem.

Unfortunately, the author of the DOS 2.0 boot sector was too eager to optimize the loading and did not think far enough ahead. On a modern disk, there are usually 63 sectors per track, or 31.5KB of data, which is more than the space available between 0:700h and 0:7c00h. If the boot sector reads that whole track, it will destroy the stack and overwrite itself. But back when the boot sector code was written, almost all fixed disks had 17 sectors per track (8.5KB), which meant that no amount of testing would have caught the bug. Floppy booting is no problem either, since even 1.44MB diskettes only have 18 sectors per track.
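The arithmetic behind this can be checked with a few lines of Python. This is only a sketch of the memory layout described above; the constant and function names are made up for illustration, and the small amount of stack space needed just below 0:7c00h is ignored.

```python
SECTOR_SIZE = 512    # bytes per sector
LOAD_ADDR = 0x700    # IBMBIO.COM is loaded at 70h:0, i.e. 0:700h
BOOT_ADDR = 0x7C00   # boot sector location; the stack grows down from here

# Bytes available between the load address and the boot sector.
ROOM = BOOT_ADDR - LOAD_ADDR


def full_track_fits(sectors_per_track):
    """True if reading one whole track at LOAD_ADDR stays below BOOT_ADDR."""
    return sectors_per_track * SECTOR_SIZE <= ROOM


print(f"room: {ROOM} bytes, {ROOM // SECTOR_SIZE} whole sectors")
for spt in (17, 18, 63):
    print(f"{spt} sectors/track = {spt * SECTOR_SIZE} bytes, "
          f"fits: {full_track_fits(spt)}")
```

A 17- or 18-sector track fits with plenty of room to spare, while a 63-sector track does not.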

Now, the above explains why an old version of DOS might hang when booted from a modern fixed disk. But it does not explain why some of my DOS 2.x and 3.x installs worked just fine, including a case with one install of DOS 2.0 booting fine and another persistently hanging.

After some head scratching, it turned out that the DOS partition size is key. Depending on how big the partition is, the FATs will have varying size, and IBMBIO.COM will start on a different sector relative to the start of the track. If IBMBIO.COM starts on, say, sector 30, the boot sector will load the 34 sectors remaining on that track (30 through 63) and DOS will boot fine. If IBMBIO.COM starts on sector 1, the first read covers all 63 sectors and DOS will hang. Similarly, if IBMBIO.COM starts on sector 60, the first read of four sectors will be fine, but the next read of 63 sectors will crash the system.
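The three cases can be reproduced with a simplified model of the loader. This is a sketch based only on the behavior described in this article, not a disassembly of the boot sector: it assumes 1-based sector numbers, that every read runs to the end of the current track, and that reading continues until a given number of sectors has been loaded. The `total_sectors` value of 20 is a hypothetical file size chosen for illustration.

```python
SECTOR_SIZE = 512
LOAD_ADDR = 0x700    # where IBMBIO.COM is loaded
BOOT_ADDR = 0x7C00   # boot sector code; stack sits just below


def simulate_load(start_sector, total_sectors, sectors_per_track=63):
    """Model track-at-a-time loading.

    Returns a list of (dest_addr, sector_count, overruns) tuples, one per
    read. 'overruns' is True if the read would write past BOOT_ADDR.
    """
    reads = []
    addr = LOAD_ADDR
    sector = start_sector          # 1-based sector within the track
    remaining = total_sectors
    while remaining > 0:
        count = sectors_per_track - sector + 1   # read to end of track
        end = addr + count * SECTOR_SIZE
        reads.append((addr, count, end > BOOT_ADDR))
        addr = end
        remaining -= count
        sector = 1                 # next read starts at a track boundary
    return reads


for start in (30, 1, 60):
    reads = simulate_load(start, total_sectors=20)
    verdict = "hangs" if any(bad for _, _, bad in reads) else "boots"
    print(f"start sector {start}: reads {[c for _, c, _ in reads]} -> {verdict}")
```

In this model a single read is safe only if it ends at or below 0:7c00h, so the outcome depends entirely on where the file happens to start within its track.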

For reference, a 30-cylinder partition on a typical disk with 63 sectors per track and 16 heads tends to cause problems. A 60-cylinder partition (close to the 32MB partition size limit) tends to work, but the exact behavior depends on the DOS version.
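To see why the partition size matters, the starting sector of the data area can be computed from the BPB fields. The following is a rough sketch using the standard FAT layout (hidden sectors, reserved sectors, FAT copies, root directory, then data). It assumes that IBMBIO.COM occupies the very first data sectors, and the parameter values in the example are hypothetical rather than taken from a real DOS 2.x or 3.x install.

```python
def first_data_sector_in_track(hidden, reserved, num_fats, sectors_per_fat,
                               root_entries, sectors_per_track,
                               sector_size=512):
    """Return the 1-based sector number within its track at which the
    data area (and, by assumption, IBMBIO.COM) begins."""
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + sector_size - 1) // sector_size
    # Absolute (0-based) sector number of the first data sector.
    lba = hidden + reserved + num_fats * sectors_per_fat + root_dir_sectors
    return lba % sectors_per_track + 1


# Hypothetical partition: starts one track into the disk, two FATs of
# 8 sectors each, 512 root directory entries, 63 sectors per track.
start = first_data_sector_in_track(hidden=63, reserved=1, num_fats=2,
                                   sectors_per_fat=8, root_entries=512,
                                   sectors_per_track=63)
print("data area starts at sector", start, "of its track")
print("sectors left on that track:", 63 - start + 1)
```

Changing `sectors_per_fat`, which grows with the partition size, shifts the starting sector and with it the size of the first read.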

The bug most likely affects all 2.x and 3.x versions of DOS prior to 3.3. Those versions required IBMBIO.COM/IO.SYS to be contiguous, which is what made the track-at-once loading possible. Version 3.3 relaxed that restriction; this is documented by Microsoft in KB 66530. Removing the restriction required changes to the boot sector, since IBMBIO.COM/IO.SYS now had to be loaded cluster by cluster.

In DOS 4.0, IBMBIO.COM/IO.SYS grew beyond 30KB and could no longer be loaded in one go at all, even if it were contiguous (that would in essence always cause the hang problem described above). There just wasn’t enough space between 70h:0 and 0:7c00h anymore. The BIOS component was therefore loaded in two stages, which avoided the problem. However, the staged loading caused the previously mentioned issue with tight stack space, but that’s a different story.

This bug does not appear to be well known. The most likely reason is that extremely few people attempt installing DOS 3.2 or older on a computer with a multi-gigabyte disk. DOS 2.x is simply not useful for running any “modern” DOS software, and DOS 3.0/3.1/3.2 have a serious drawback in that they do not support 1.44MB floppy drives.

23 Responses to Hang with early DOS boot sector

  1. Yuhong Bao says:

    Not to mention the 32MB partition limit turns the 58 sectors per track limit into a non-issue anyway.

  2. michaln says:

    Sorry, that makes no sense. Can you please elaborate? What 58 SPT limit are you talking about?

  3. Yuhong Bao says:

    (7c00h – 700h) / 512 bytes = 58

  4. Yuhong Bao says:

    Actually, it would be 57 sectors per track to allow for the stack itself.

  5. michaln says:

    There’s room for 58.5 sectors between 700h and 7C00h. The stack is extremely unlikely to need more than 256 bytes, so 58 sectors should be fine.

    But what does that have to do with the 32MB partition size limit?

  6. Yuhong Bao says:

    Well, DOS 3.2 and earlier did not support partitioning at all without third-party drivers.

  7. michaln says:

    That’s a rather misleading statement. DOS always supported partitioning (ever since hard disk support was added in DOS 2.0), but prior to 3.3 DOS could not access multiple DOS partitions at the same time. It was always possible to have multiple partitions present on a fixed disk, even multiple DOS partitions (but only one active/accessible).

  8. Yuhong Bao says:

    Yea, I think it was primarily for booting other OSes like Xenix.

  9. rauli says:

    Let’s suppose:
    1 – We have a hard disk partition [A] with MS DOS 3.0/3.1/3.2 installed on it, and crashing at boot (because of the issue described in this article).
    2 – We have MS DOS 3.30 installed on another disk [B] (another hard disk partition or a diskette).

    Replacing [A] boot sector with [B] boot sector… would make [A] boot?
    (replacing the boot sector, but maintaining the BPB part, of course)

  10. michaln says:

    It might help, but I can’t guarantee that it will work. I’d give it about 80% chance of success. If you do try, be very careful.

  11. rauli says:

    It works!
    But if you take the DOS 3.3 boot sector from a diskette, you also have to keep the byte at 01FDh from the [A] boot sector (3rd from the end), which is 80h for a hard disk boot sector and 00h for a diskette boot sector.
    I didn’t notice that byte, and it almost made me quit…
    As I’ve said before, you also have to keep the original BPB from [A] (for DOS 3.x it’s at bytes 0Bh to 1Dh, I think).
    After some rest I will try to boot DOS 2.x and 4 with this same method (using 3.3 boot sector). Just to try, but I think they will not boot.

  12. alex peter says:

    Does anyone know how to get more than 4 partitions from DOS 3.2? Assuming the drive can handle more than four 32 MB partitions. I heard AST and NEC came up with eight 32 MB partitions, I think, but I’m looking for software I can use with DOS 3.2 that will give me more partitions. Any takers?

  13. alex peter says:

    I want to run my old games on my Tandy 1k. 3.3 won’t run them due to too much conventional memory use, but 3.2 does. The drawback is that I have an 8-bit IDE card that can take 2 GB partitions, but the DOS it uses (6.22) definitely won’t run my games. Tandy had a very weird video setup that used conventional memory as its video RAM (sucks), but if I can circumvent the problem by using 3.2, then I’ve got it licked. That assumes I can get some sort of partitioning software or a version of FDISK that will give me more than the 4-partition limit in 3.2.

  14. michaln says:

    If they really supported more than 4 partitions, they probably had OEM DOS versions with adapted IO.SYS.

  15. Pingback: DOS boot hang update | OS/2 Museum

  16. ImperatorBanana says:

    I know you posted this a while ago, but *THANK YOU*! I’ve been using older versions of PC-DOS (2.10, 3.20) with modern replacement storage solutions on the PCJr (JrIDE+CF adapter, SDCartJr + SD Card) and noticed what seemed to be random configurations of partition size + DOS version + storage device would hang on boot or trigger other strange behavior that now makes perfect sense: stack smashing! A lot of the modern replacement storage solutions tend to be used with more modern DOS versions and present themselves as larger CHS values so this problem doesn’t seem to be very well known today either but again, as someone using older versions of DOS: thank you for publishing it!

  17. Michal Necasek says:

    Glad it helped! The problem was somewhat known back in the day (1987-ish) as it popped up with ESDI hard drives and such, but it was soon forgotten because newer DOS versions fixed it. Old DOS versions were written with the assumption that hard disks have 17 sectors per track or thereabouts, which was true for a number of years… until it wasn’t.

  18. ImperatorBanana says:

    I took a shot at disassembling the bootsector and writing an assemble-able version of it (at least in PC-DOS 2.10 with the geometry of the SD-Cart JR + 16GB flash drive: CHS=1024,255,63). Your description of the issue helped greatly with understanding what was really happening and labeling it all which was a nice fun project for the evenings, so thank you again!

    Once I got it assembling byte-for-byte (mostly…see the last statement below), I went forward with trying to patch it and, overall, the patch seems to work. It looks like there is a byte that tells it how many sectors IBMBIO.COM + IBMDOS.COM take up, so my logic was just (regardless of how many sectors are left on the track after the last directory sector) try to read that number of sectors. The read itself will technically fail if it hits the track boundary before reading all of the necessary sectors, but since the error message will indicate the number of sectors successfully read (unfortunately also including the sector it fails on in the count), I just confirm which error occurred, subtract the successfully read sectors from the number of sectors intended to read, update the RAM offset pointer and starting sector, and try the next read from the new starting position. The patch is about 4 bytes shorter than the original, so I put in some NOPs to keep everything else aligned.

    About the only thing I haven’t figured out is in MASM 2.0 how to get it to assemble the code to start at offset 0 of the output file (without it padding the beginning with a bunch of 0’s) while ensuring the memory access offsets are 7c00. I tried a few combinations of ORG and CODE SEGMENT AT values but had no luck. It’s pretty easy to just run a post processing pass to remove the leading 0’s though (in my github, the REFERENCE folder has an IBM C Compiler 1.0 compatible program to strip out the 0’s) so no worries there for me. They made some changes to the PC-DOS 3.2 bootsector so I haven’t yet gone through that to see if my patch logic can be applied similarly yet.

    Github if you were curious (has both the original and patched assemble-able and binary): https://github.com/RetroByten/DOS_BOOTSECTOR_DISASSEMBLY/tree/main/PCDOS210

  19. Michal Necasek says:

    Very cool!

    I don’t think you can get MASM to skip the zeros, or at least Microsoft did not know how to. Because in the MS-DOS 3.21 OAK, they assemble and link the boot sector and then use a DEBUG script to postprocess the resulting binary.

    The assembler as such could do it, the LEDATA OMF record specifies a starting address. But then the linker would add the padding anyway because MS LINK has no concept of segments not starting at zero.

  20. ImperatorBanana says:

    Oh neat, I’ll have to see if the OAK (or some descriptions of that debug script) are online somewhere to reference. It’s pretty awesome when you hit an issue, don’t really know how to solve it so you come up with an alternative workaround, and it turns out in real life that was pretty much the production workaround! Again, much thanks both for posting it and for the additional information on MASM/MS Link!!

  21. Michal Necasek says:

    I can just quote the relevant bits here. The build batch file for the boot sector did this:

    masm -Mx -t -I../../inc msboot.asm,msboot.obj;
    link @msboot.lrf;
    exe2bin msboot
    debug msboot.bin < debscr

    And the debscr script looked like this:

    m7d00 l 200 100
    rcx
    200
    w
    q

    The DEBUG script seems to make an awful lot of assumptions about what the commands do by default. But basically DEBUG loads the binary as if it were a .COM file (i.e. at offset 100h), then the script copies 512 bytes from offset 7D00h to 100h, and writes those 512 bytes to disk.

    ETA: The 'w' command as used there writes the count of bytes in BX:CX to the default file, which in this case is the file specified on the DEBUG.COM command line. And on startup, BX:CX contains the input file size, which is why BX does not need to be explicitly set. It's all very impenetrable without documentation.

  22. ImperatorBanana says:

    Oh goodness, thank you again! For whatever reason I was fully aware DOS (I think ~2.0 and up?) supported the “>” operator but didn’t pay enough attention to the manual to realize the “<” operator was also supported, so it being that straightforward never crossed my mind! Also, thank you for the tidbit on BX:CX's initial state: I've been dumping the boot sectors / MBRs by writing the small int13 programs in debug (manually since I didn't know the scripting was possible with the "<" operator, now I know!) which destroy bx:cx so my "-w" command usage was always preceded by setting those registers. I probably would've guessed it was assuming a default state (the tutorial I followed for int13 ignored the setting of ES for the ES:BX buffer location and I came to the same conclusion), but for that script I would've ended up writing the "rbx" and "0" lines anyway.

  23. Michal Necasek says:

    Yes, it was DOS 2.0 that added the standard input/output redirection. Where DOS 1.x was a clear CP/M workalike, DOS 2.0 (also) attempted to act like a minimalist/old UNIX.

    I didn’t know about the initial BX:CX values either. The trouble with DEBUG is that the built-in help is extremely terse (and nonexistent in old versions), and online documentation is no better. In the DOS 2.x days, DEBUG was documented in the DOS manual. By DOS 3.3, the DEBUG documentation was relegated to the Technical Reference, something that very few people had. A lot of the DEBUG functionality is quite non-obvious, and over time people forgot that DEBUG can actually do quite a bit.

    Possibly the most up-to-date official DEBUG documentation is in the PC DOS 7 Technical Update.