A Brief Visit to Disk Geometry Hell

Several weeks ago I decided to install NetWare 3.12 in a virtual machine using the BusLogic SCSI controller emulation. I configured a 1.5 GB virtual drive, assuming that was small enough to avoid any trouble with a “too big” disk.

But I was wrong. The NetWare 3.12 installer created the DOS partition and launched the NetWare OS, but trying to create a NetWare partition resulted in the following error:

[Screenshot: NetWare 3.12 reporting an error when attempting to create a NetWare partition, caused by a geometry mismatch between the BIOS and driver.]

This error is naturally not specific to virtualization and happens on real systems as well. But why? The geometry of a SCSI disk is entirely fictional, so why is it a problem?

The core problem is that when IBM released the PC/XT in March 1983, the new machine introduced a hard disk BIOS service (INT 13h) which was very tightly modeled on the existing floppy-centric INT 13h service, and used strictly cylinder, head, and sector numbers to address hard disk sectors. That may have made sense at the time, but in retrospect, it was a terrible mistake.

Even at the time, not all hard disks used cylinder/head/sector (CHS) addressing, although the ones IBM used in the PC/XT did. For example, SCSI drives always used LBA (Logical Block Addressing). And on the other end of the stack, MS-DOS also treated every disk (floppy or hard) as a linear sequence of sectors. The low-level DOS disk code performed the translation to CHS, but DOS itself only worked with LBA.
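The translation in question is simple arithmetic. As a minimal sketch (the function names are made up for illustration, the 4-head, 17-sector geometry is just an assumed example, and this is the conventional formula rather than the actual DOS code), converting between a linear sector number and a CHS triple looks roughly like this:

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a zero-based linear sector number to (cylinder, head, sector).

    In CHS addressing, cylinders and heads count from 0 but sectors count from 1.
    """
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Inverse of lba_to_chs for the same geometry."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# Round trip using an assumed example geometry of 4 heads, 17 sectors per track
chs = lba_to_chs(5000, heads=4, sectors_per_track=17)
assert chs_to_lba(*chs, heads=4, sectors_per_track=17) == 5000
```

Note that both directions need the head and sector counts as inputs. The result is only meaningful if whoever reads the CHS value assumes the same geometry as whoever wrote it.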

But IBM did what it did and the damage was done. When PC SCSI host adapters first appeared, they were forced by the BIOS INT 13h interface to make up some kind of CHS geometry. Adaptec (and BusLogic) used a geometry with 64 heads and 32 sectors per track, which fit well within the BIOS limits of 255 heads and 63 sectors per track, and had the pleasant property that one cylinder was exactly one megabyte.

But of course once SCSI disk sizes went past 1 GB (and that did not take long), the BIOS 1024 cylinder limit struck. Adaptec (and BusLogic) chose to use a geometry with 128 heads and 32 sectors per track (2 MB per cylinder) for disks up to 2 GB, and the maximum BIOS geometry (255 heads, 63 sectors per track) for larger disks.
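The selection rule described above can be sketched as follows. This only illustrates the scheme as the text describes it; it is not the actual Adaptec or BusLogic BIOS code. The function name `pick_bios_geometry`, the use of 512-byte sectors, and the exact cut-over points (a geometry is kept as long as the disk fits in 1024 cylinders) are assumptions.

```python
SECTOR_SIZE = 512      # bytes per sector (assumed)
MAX_CYLINDERS = 1024   # INT 13h cylinder limit

# (heads, sectors per track), tried in order of increasing capacity
GEOMETRIES = [(64, 32), (128, 32), (255, 63)]


def pick_bios_geometry(total_sectors):
    """Return (cylinders, heads, sectors_per_track) for a disk of the given size.

    Uses the first geometry that covers the disk within 1024 cylinders.
    Disks too large for any of them get the maximum geometry with the
    cylinder count clamped to 1024.
    """
    for heads, spt in GEOMETRIES:
        cylinders = total_sectors // (heads * spt)
        if cylinders <= MAX_CYLINDERS:
            return cylinders, heads, spt
    heads, spt = GEOMETRIES[-1]
    return MAX_CYLINDERS, heads, spt


# One cylinder of the 64/32 geometry is exactly one megabyte
assert 64 * 32 * SECTOR_SIZE == 1024 * 1024

# A 1.5 GB disk (3,145,728 sectors) no longer fits in 1024 cylinders at 64/32
print(pick_bios_geometry(3_145_728))
```

Under these assumptions a 900 MB disk still gets 64 heads, while the 1.5 GB disk is bumped to 128 heads and 768 cylinders.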

Given that a SCSI disk does not have any inherent “physical” geometry, the CHS geometry presented by an HBA BIOS is made up. However, once it is used with a given disk, it leaves a mark in the disk’s partition tables. And that is a problem because if the geometry used to partition/format the disk does not match the geometry used to access it, there is a high likelihood the disk will not be accessible, and in the worst case, severe data loss can result.

It is almost certain that the problem with NetWare is that the BusLogic driver (shipped with NetWare 3.12) makes up a CHS geometry different from what the BIOS did. That is, for a 1.5 GB disk, the BIOS by default chooses an X/128/32 geometry, while the NetWare driver uses X/64/32. It may be possible to reconfigure the HBA to use the old X/64/32 geometry, with the caveat that the disk won’t be fully accessible through the BIOS (only the first gigabyte will be).
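To see why such a disagreement matters, consider a sector address that is written in CHS form under one geometry and read back under the other. The sketch below is purely illustrative: the function name `reinterpret` and the example sector number are made up, and it does not model how NetWare actually checks partition tables.

```python
def reinterpret(lba, written_with, read_with):
    """Encode lba as CHS with one (heads, spt) pair, then decode with another.

    Returns the linear sector number the reader ends up with.
    """
    heads_w, spt_w = written_with
    cylinder, rest = divmod(lba, heads_w * spt_w)
    head, sector_index = divmod(rest, spt_w)

    heads_r, spt_r = read_with
    return (cylinder * heads_r + head) * spt_r + sector_index


BIOS_GEOMETRY = (128, 32)    # what the BIOS picks for a 1.5 GB disk
DRIVER_GEOMETRY = (64, 32)   # what the NetWare driver assumes

partition_start = 1_000_000  # arbitrary example sector number

# Same geometry on both sides: the address survives unchanged
assert reinterpret(partition_start, BIOS_GEOMETRY, BIOS_GEOMETRY) == partition_start

# Mismatched geometry: the driver lands on a completely different sector
print(reinterpret(partition_start, BIOS_GEOMETRY, DRIVER_GEOMETRY))
```

In this example the CHS value recorded by the BIOS side is cylinder 244, head 18, sector 1, which the 64-head interpretation maps to sector 500,288 instead of 1,000,000. A recorded head number of 64 or higher would not even be valid under the 64-head geometry.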

How to Get Out?

When working with virtual machines, there are two simple solutions. Either use a smaller virtual disk of, say, 900 or 950 MB; then the BIOS and the NetWare BusLogic driver will agree on the geometry. Or set up a separate small-ish IDE disk to hold the DOS boot partition; then DOS/BIOS does not need to access the larger SCSI disk at all and any possible geometry mismatch does not matter.

The most tempting and complete solution, i.e. strangling the person(s) responsible for the mess, is unfortunately no good without the ability to travel back in time to 1982 or 1983.

This entry was posted in BusLogic, IBM, NetWare, PC architecture, Storage.

31 Responses to A Brief Visit to Disk Geometry Hell

  1. Richard Wells says:

    Netware with larger than one GB disks gets weird. One has to turn off extended BIOS translation for DOS so DOS thinks that the drive is 1GB and then Netware will be able to use all the non-DOS partition space correctly. See the old FAQ email threads for longer discussion of this: https://www.infania.net/misc/nov-faq/nov-scs1.txt

    If there are any special cases in regards to Buslogic, I don’t know. I was over in the Adaptec camp except for one rather flaky Ultrastor EISA controller and the IBM supplied controller in a Model 77.

    CHS in the BIOS was necessary to support the very dumb ST-506 drives planned. Dealing with a GB+ was silly. Even in 1988, only 45,000 drives with capacities larger than a GB were made in 8″ to 14″ sizes with mainframe connectors and an average price of $100,000. Not exactly something one would anticipate using with an XT. It wasn’t until 1990 when Seagate finally shipped 1,000 5.25″ SCSI drives storing more than 1 GB.

  2. nobody says:

    @Richard Wells:

    The BIOS design flaw was in exposing the CHS interface of the ST-506 to the OS via Int 13h. The BIOS could have presented a LBA interface via Int 13h and done LBA to CHS translation internally rather than leaving that for DOS to do.

  3. Michal Necasek says:

    Exactly. Converting between LBA and CHS addressing is trivial, and the BIOS could have easily done that. Once the CHS addresses leaked into the partition table, there was no going back (until EFI, which is a completely different can of worms). It’s of course ironic that partition tables do have the LBA values in them, but usually those are ignored because they weren’t always reliable.

    Basically, if DOS could do the conversion, so could the BIOS. It is possible/likely that it never occurred to the IBM guys that they were setting things in stone, just like it happened with many other PC/XT/AT design decisions.

  4. Michal Necasek says:

    Adaptec and BusLogic are more or less identical in this regard, after all BusLogic was a superset of AHA-154x. I think it was actually somewhat common with many operating systems that one could use a geometry that didn’t cover the entire disk, and all was good as long as the boot code was below 1024 cylinders. With NetWare it was very clear cut since only the DOS partition needed to be BIOS accessible, so as long as that was below 1024 cylinders, you were safe.

    The extended INT 13h interface was a good solution but probably needed to happen a few years earlier to have saved a lot of headaches.

    As for the timeline, I was more thinking about the date of introducing the Adaptec SCSI HBAs (1987-ish) and larger than 1GB SCSI drives appearing. That did not take very long. You’re absolutely right that in 1983, GB+ drives in common PCs were way out there. And the BIOS interface can deal with up to 8GB, so from that perspective it was good for 10+ years which really isn’t bad.

    A big factor was really that when the big drives showed up, there was no clear industry leader like IBM used to be, and getting everyone on the same page was much more difficult.

  5. Richard Wells says:

    IIRC, Netware 3.2 accepted the various extended BIOS geometries as is. Presumably, that means one of the Netware 3.12 patches permitted the same. Instead of experienced CNEs doing the work, anyone would install 3.2 on any surplus WfWG machine and thus could not be expected to track through readme files to find out how to reformat the drive.

    Seldom does future proofing for non-existent hardware work. I expect that if IBM had tried for an LBA mode hiding CHS, it would not have been beneficial. IBM might have chosen a conversion method that turns out to be slow and BIOS replacements with conversions better suited to real hardware would have still been needed. Almost the same as what happened in reality except that the XT and AT MFM drive accesses would have needed to pass through another layer and many of the interleave altering speed-up utilities would have been impossible. A new BIOS design to handle new hardware designs should have been deferred until the next generation PC when the requirements were available. (Failures with the actual PS/2 design notwithstanding.)

  6. nobody says:

    @Richard Wells

    Since DOS used LBA internally, LBA to CHS conversion needed to be performed at some level in the software stack. There’s no reason to presume that moving the translation layer from DOS to the BIOS would have any performance impact.

    I’m not sure why you say interleave modification tools would have been impossible in an alternate universe where the original Int 13h had LBA. Interleave modification tools worked by copying the data from a cylinder to a scratch area, low-level formatting the cylinder with the desired interleave, then restoring the data from the scratch area. In an Int 13h LBA world, the BIOS would still need to provide a cylinder-based format API, so the only complexity for an interleave tool would be needing to determine which LBAs were stored on which physical cylinders, which seems fairly straightforward provided the BIOS’ LBA to CHS translation formula was well known.

  7. Michal Necasek says:

    Exactly. Changing the interleave is a “logically” invisible operation, because the sector with the same number still holds the exact same data… only it’s in a different place on the disk. It works regardless of how the sectors are addressed.

    I can’t even remember how many times I moved a hard disk to a different machine and it wouldn’t boot because the geometry was different. The problem persisted much longer than it should have, in part because Windows NT (I’m sure for reasons that seemed good at the time) insisted on using the CHS BIOS services as much as it could instead of reading everything via INT13X. The problem only really went away once every drive was well past 8GB and everyone started using the maximum geometry. Except maybe for “exotic” OSes like SCO OpenServer and such.

  8. Yuhong Bao says:

    NTLDR did not even support Int13 extensions until the NT5 version.

  9. Vlad Gnatov says:

    > The problem only really went away once every drive was well past 8GB and everyone
    > started using the maximum geometry.
    No, the problem didn’t go away after 8 GB. After that came 137 GB and, depending on the BIOS, 8.4 GB/33.8 GB. See https://www.win.tue.nl/~aeb/linux/Large-Disk-4.html

  10. nobody says:

    Windows did so much strange stuff with boot-phase disk access that I wonder if it was intended as a weak form of DRM to make Windows installations non-portable. There’s no good reason for doing things like recording the boot partition’s starting cylinder number in the registry (as opposed to reading it from the partition table at boot time) or refusing to boot if a Windows disk is moved to a different type of ATA/AHCI/RAID adapter.

  11. Chris M. says:

    The reason for Windows balking at switching machines (INACCESSIBLE_BOOT_DEVICE blue screen) was that pre-NT 6.x basically hard-coded in the registry which drivers and boot drive were used. The system only loaded the most basic set of drivers and a minimal part of the registry in real mode using Int 13h before switching to protected mode. The UniATA driver gets around a lot of the problems and allows at least pre-Windows 2000 installs to be portable between machines. Windows 2000/XP are troublesome due to how PnP works on them: http://alter.org.ua/en/soft/win/uni_ata/uniata_pnp.php

    Microsoft had a knowledge base article that had generic PnP registry entries one could add before moving a drive to another machine, but it only supported bog standard onboard IDE controllers from common chipsets built into Windows. Anything like a PCI IDE or SATA card needing its own drivers was out of luck.

  12. Yuhong Bao says:

    “I wonder if it was intended as a weak form of DRM to make Windows installations non-portable.”
    The licensing/activation code in Windows is a different thing.

  13. Michal Necasek says:

    We’re primarily talking about the BIOS interfaces here, and those “barriers” (other than 8GB) aren’t relevant for that. Yes, if you plug a 200GB drive into a machine whose BIOS doesn’t understand LBA-48, it might not be able to boot, but that’s nothing new under the sun. The problem with made-up geometries is that they can mess things up even in a system that is perfectly capable of accessing the entire disk on the hardware level.

  14. Michal Necasek says:

    Another way to put it is that in the boot phase of Windows NT up to and including 5.x, there simply is no PnP. It was possible to work around many of the limitations with some effort, but out of the box on a typical XP install, the system is bolted down to a specific IDE controller PCI ID and won’t boot on another machine even if the hardware is 100% compatible.

    In this respect, XP was the absolute worst in the industry; even OSes that don’t pretend to do any PnP whatsoever (like OS/2) are less finicky and easier to work with. Which makes me think that Microsoft did it on purpose.

    I remember that back in the day when I had Windows 2000 installed on a SCSI disk and upgraded the SCSI HBA, it of course wouldn’t boot and the official solution was “reinstall Windows”.

  15. Chris M. says:

    SCSI seemed tame with the disk geometry madness compared to 90s BIOS with IDE drives. I recall Award having “CHS”, “Large”, and “LBA” as options. The “LBA” translation seemed to be the industry standard and drives formatted with it worked among BIOS vendors. It got so bad that machines we built at the computer shop all got a label with the exact geometry set on the back.

  16. Michal Necasek says:

    Yeah, with SCSI you had just one geometry (the made-up BIOS one, aka “logical” CHS), whereas with IDE you (on disks > 500 MB) had two, the made-up BIOS one, and the “physical” CHS geometry used at the IDE interface. Many larger drives had LBA, but initially not all of them did.

    I think pretty much all BIOSes had several options because they sort of had to. Different operating systems needed different translation, and the different translations also had varying loss of drive capacity. An IDE drive with LBA is functionally identical to SCSI, except for the part where it can still be accessed via CHS addressing.

    Some (many?) IDE drives also had programmable physical CHS geometry, which brought even more fun into the mix.

  17. Vlad Gnatov says:

    I agree about 137G, but the 33G limit definitely was an issue, at least on some BIOSes and some (not very exotic) OSes. For example, FreeBSD used INT 13h CHS calls by default in its boot code up to 5.4, so the boot partition had to be located below 33G and, in the worst case, below 8G.

  18. Michal Necasek says:

    That’s curious. The INT 13h limit is 8G, so I wonder why 33G would matter. Although for a BIOS + OS the 33G limit might well matter.

  19. Richard Wells says:

    SCSI had its own developmental issues. The original 6 byte CDB only permitted 21 bits of logical blocks while the 10 byte upgraded to a full 32 bits of logical blocks, effectively replacing a maximum capacity of 1 GB with 2 TB. A number of other flaws were corrected over the course of the 80s. There is also a current limit in that Windows 7 and Server 2008 do not correctly handle bad sectors on SCSI drives greater than 2 TB. The difference for IDE was it had become a mainstream drive interface rapidly and many purchasers got a close view of iterative development.

    The XT and AT were designed before SCSI reached a stable state so following SCSI’s design would be impossible. Doing a LBA only DOS where the relevant pieces of code were moved from DOS to the BIOS and additional information regarding each drive would also be placed in the BIOS is an interesting thought experiment. I expect that it would have added $50 to the XT’s build cost. That may vary depending on how much space would be needed to define optimal formats including interleave and skew; I expect something more efficient than IBM’s 256 byte track definition. ST-506 drives can’t provide the information so it would have to be in the BIOS.

  20. Julien Oster says:

    $50 for adding the mapping code to the BIOS instead of the DOS kernel? Is the LBA->CHS mapping function less trivial than I imagine?

  21. Julien Oster says:

    I never worked on early hard disks on that level, does/did DOS actually know about the interleave factor and low-level formatting? This (German) article linked below suggests that for the original XT controller at least, that seemed to have been abstracted away by the controller.

    Low-level formatting I recall being initiated by an external program. The article also contains a small sequence of DEBUG commands to do so on the XT controller, only interacting with the controller’s I/O ports to trigger the formatting. That does not mean that this had been the same before and after that, of course.

    https://www.classic-computing.org/tag/st506/

  22. Chris M. says:

    Functionally, IDE drives combined the dedicated controller card and hard drive into one unit, hence “Integrated Drive Electronics”. They retained the ST-506 software interface, thus the reliance on drive geometry to access sectors. The BIOS knows nothing of low level formatting and interleave. The debug commands to low level format MFM/RLL drives were there to run the controller card’s ROM based format routine.

    MS-DOS relying on LBA internally is no surprise. It was designed to be platform and storage device independent.

  23. Richard Wells says:

    With ST-506 on the 5170, the formatting program sets up the interleave and the sector headers in a 256 byte buffer when calling the INT 13h function for formatting a track. The IBM AT Tech Ref includes a sample of the track format buffer showcasing how to do interleave. The interleave is implied by the order the sectors are listed.

    The LBA to CHS code itself isn’t that big. Int13Deblock from OpenDOS weighs in about 2800 characters. But there would be other code moved out of the OS like that which loops through each track and sets up the sector headers and some code to handle bad sectors that LBA DOS would be insulated from. Very rough estimate; obviously I haven’t built a BIOS with all the code DOS has that directly accesses CHS values for hard drives.

  24. Michal Necasek says:

    Yes, many disk controllers had their own format utility launched through DEBUG. DOS didn’t really have to deal with it.

    Interleave was likewise completely transparent to DOS and BIOS. It re-orders the sectors on a track, but when you use DOS/BIOS to access them, you always go by logical sector ID. Optimal interleave speeds things up, but does not affect anything else. In fact programmatically discovering the interleave factor wouldn’t be entirely trivial.

  25. Yuhong Bao says:

    Interestingly, DOS 2.x and 3.x always use 16-bit sector numbers to address primary (not extended) partitions, requiring them to reside in the first 32MB of the disk.

  26. vbdasc says:

    @ Yuhong Bao
    “Interestingly, DOS 2.x and 3.x always use 16-bit sector numbers to address primary (not extended) partitions, requiring them to reside in the first 32MB of the disk.”

    Except Compaq DOS 3.31, of course. Also, as far as I can recall, these 16-bit sector numbers were used only for loading IO.SYS/IBMBIO.COM and MSDOS.SYS/IBMDOS.COM during boot, so only these two system files really needed to reside below the first 32Mb, and only if the partition was bootable.

  27. Yuhong Bao says:

    It is also used in the IO.SYS/IBMBIO.COM code, for example in that only the low 16 bits of the partition start sector are read from the MBR.

  28. vbdasc says:

    @Michal Necasek
    “Some (many?) IDE drives also had programmable physical CHS geometry”

    The ATA command which does this (“Initialize drive parameters”) is present and mandatory even in the earliest ATA standard, which means that ALL IDE drives should have this capability.

  29. Michal Necasek says:

    Yes, but some drives (e.g. Conner CFS1275A) store the programmed values in non-volatile memory.

    It is true that the INITIALIZE DRIVE PARAMETERS command exists on all IDE drives. It was initially used with ST-506 style controllers which had to be told what the drive’s geometry was.

  30. Chris M. says:

    Reading the Linux large disk HOWTO, I realized that SCSI card vendors seemed to keep things more confusing than IDE drives with regards to translation. The < 1GB translation using x/64/32 was universal between controller cards, but over 1GB it depended on the vendor. Some did x/128/32 (like Buslogic), while others went right to x/255/63.

    https://www.win.tue.nl/~aeb/linux/Large-Disk-10.html#ss10.2

    Thankfully the majority of SCSI equipped platforms (not PCs) didn't have to deal with this nonsense. Oddly Apple got bitten by the 8.4GB limit on early IDE equipped Power Macs. I'm still wondering if this was some weird CHS only addressing limit (even though LBA was long since included in the ATA standards) since those machines didn't use Int 13h. Circa 1992 Amigas with IDE didn't suffer from these limits at all.

  31. Michal Necasek says:

    SCSI HBAs had the “problem” that they had no geometry to start with. Yes, BusLogic (and Adaptec?) had the extra step where they just doubled the heads from 64 to 128, so it handled twice the disk size. Big deal. SCSI also had exactly the same 8G (1024 cylinder really) BIOS limit as IDE, since that was a software limitation, not related to the underlying technology. Same with partition tables, operating systems had to deal with the geometry forced on them externally.

    I see something about 8GB partition size on some Macs, so maybe that’s where the limit came from, not IDE? As far as I know IDE never had a ~8GB limit, prior to the ATA standard there were 22 address bits (14/2/6 bits for C/H/S) for ~2GB of disk space, and ATA-1 extended that to 28 address bits for ~128GB (14/8/6 bits for C/H/S). The old ST-506 drive interface could only select heads 0 to 3, hence the different limitation.
