Timing In Software Is Too Hard?

I recently attempted to install Red Hat Linux 3.0.3 (that’s the one from 1996, not RHEL 3.0) in VirtualBox. I thought I’d use the BusLogic SCSI emulation and the newer 1.3.57 Linux kernel. It did not work at all.

Red Hat 3.0.3 BusLogic Panic

The problem was that the BusLogic SCSI driver, version 1.3.1 by the late Leonard N. Zubkoff, wouldn’t load. It failed with the following error message: ‘INQUIRE INSTALLED DEVICES ID 0 TO 7 FAILED – DETACHING’. That in turn caused the kernel to panic as it was unable to mount the root filesystem. The real problem turned out to be caused by a rather interesting collection of bugs in the Linux BusLogic driver.

The timeout handling in the old Zubkoff BusLogic driver (BusLogic.c and BusLogic.h; older Linux versions used an entirely different driver!) is most charitably described as very naive. The routine BusLogic_Command() calculates a local variable called TimeoutCounter based on the kernel variable loops_per_sec (nothing like completely different coding styles to improve code readability). The TimeoutCounter is then used as the maximum number of times the hardware status registers will be polled.
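To make the scheme concrete, here is a rough sketch of what such an iteration-counted polling loop looks like. It is not the driver's actual code: poll_until_ready, status_reader and the simulated fake_device are names made up for this illustration, and the register read is replaced by a callback so the sketch can be compiled outside a kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for reading a controller status register. */
typedef uint8_t (*status_reader)(void *ctx);

/* Simulated device (illustration only): reports ready on the Nth poll. */
struct fake_device {
    int polls_until_ready;
    int polls_seen;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_device *dev = ctx;

    dev->polls_seen++;
    return dev->polls_seen >= dev->polls_until_ready ? 0x01 : 0x00;
}

/*
 * Poll until a bit in ready_mask is set or the budget is used up.
 * The budget counts loop iterations, not elapsed time, and it is a
 * signed value tested with a signed comparison.
 */
static bool poll_until_ready(status_reader read_status, void *ctx,
                             uint8_t ready_mask, int32_t timeout_counter)
{
    while (timeout_counter >= 0) {
        if (read_status(ctx) & ready_mask)
            return true;
        timeout_counter--;
    }
    return false; /* reported to the caller as a timeout */
}
```

Note that a negative starting budget makes the loop body never run, so the "timeout" is reported without the hardware having been polled even once.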

This approach usually works reasonably well for short periods of time (at most a few milliseconds), but it is not accurate because the speed of port I/O operations is not fixed. Unfortunately, here it’s used even for timeouts meant to last “approximately 60 seconds”. Why proper OS timing services weren’t used here is a question that probably no one alive can answer.
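For contrast, a timeout based on a clock rather than on an iteration count might look roughly like the sketch below. This is only an illustration under stated assumptions: wait_until_ready, clock_ms_reader and the fake device and clock are hypothetical names, and the clock is passed in as a callback. In a real kernel the time source would be the OS tick counter, and for a wait this long the driver should sleep between polls instead of spinning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: a status register read and a millisecond clock. */
typedef uint8_t (*status_reader)(void *dev);
typedef uint64_t (*clock_ms_reader)(void *clk);

/* Simulated device (illustration only): reports ready on the Nth poll. */
struct fake_device {
    uint64_t polls_until_ready;
    uint64_t polls_seen;
};

/* Simulated clock (illustration only): advances by step_ms on every read. */
struct fake_clock {
    uint64_t now_ms;
    uint64_t step_ms;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_device *dev = ctx;

    dev->polls_seen++;
    return dev->polls_seen >= dev->polls_until_ready ? 0x01 : 0x00;
}

static uint64_t fake_now_ms(void *ctx)
{
    struct fake_clock *clk = ctx;

    clk->now_ms += clk->step_ms;
    return clk->now_ms;
}

/*
 * Wait until a bit in ready_mask is set or timeout_ms has elapsed
 * according to the supplied clock. The result depends on elapsed time
 * only, not on how fast the CPU or the I/O bus happens to be.
 */
static bool wait_until_ready(status_reader read_status, void *dev,
                             uint8_t ready_mask,
                             clock_ms_reader now_ms, void *clk,
                             uint64_t timeout_ms)
{
    const uint64_t start = now_ms(clk);

    for (;;) {
        if (read_status(dev) & ready_mask)
            return true;
        if (now_ms(clk) - start >= timeout_ms)
            return false;
    }
}
```

Because the elapsed time is computed with unsigned subtraction from a recorded start value, there is no counter that can start out zero or negative.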

INQUIRE INSTALLED DEVICES is a convenience controller command which sends TEST UNIT READY commands to all SCSI devices attached to the bus, and as such may take many seconds to complete. Hence the desired 60-second timeout.

One problem is that the driver assumes that the speed of port I/O operations is directly proportional to CPU clock speed, which is most certainly not true. On the contrary, the port I/O speed remains more or less constant because it’s tied to the bus (ISA, PCI) frequency and not CPU frequency.

A related but much worse problem is that the logic calculating TimeoutCounter from loops_per_sec is far too optimistic (i.e. buggy). The loops_per_sec value is either shifted right by 4 or shifted left by 2 to obtain TimeoutCounter. However, not even the slightest attempt is made to ensure that the loops_per_sec value is in the implicitly assumed range, e.g. that TimeoutCounter won’t end up as zero after shifting right.

To add insult to injury, TimeoutCounter is a signed variable even though loops_per_sec is not. Obviously values like 0x20000000 will suddenly turn into negative integers after being shifted left by two. That then completely confuses the rest of the code, which checks for TimeoutCounter being greater than or equal to zero using a signed comparison.

And that’s exactly what happens on my system: The CPU is detected as having about 3434 BogoMIPS, which corresponds to a loops_per_sec value 500,000 times larger, or 0x66575740. When shifted left by two, this results in 0x995D5D00… which, sadly, is a negative number and the BusLogic driver timeout logic completely falls apart.
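The arithmetic can be reproduced with a few lines of C. This is a model of the calculation described above, not the driver source; the function names are invented for the illustration. The shift is done on an unsigned value to avoid undefined behavior, and the conversion to a signed 32-bit integer assumes the usual two's complement wraparound.

```c
#include <stdint.h>

/* BogoMIPS is loops_per_sec divided by 500,000; invert that relation. */
static uint32_t loops_per_sec_from_bogomips(uint32_t bogomips)
{
    return bogomips * UINT32_C(500000);
}

/* Model of the "shift left by 2" path, stored in a signed 32-bit variable. */
static int32_t timeout_counter_shl2(uint32_t loops_per_sec)
{
    uint32_t shifted = (uint32_t)(loops_per_sec << 2); /* top bits are lost */

    return (int32_t)shifted; /* bit 31 set means a negative value */
}

/* Model of the "shift right by 4" path; small inputs collapse to zero. */
static int32_t timeout_counter_shr4(uint32_t loops_per_sec)
{
    return (int32_t)(loops_per_sec >> 4);
}
```

With 3434 BogoMIPS, loops_per_sec_from_bogomips() yields 0x66575740, and timeout_counter_shl2() turns that into the bit pattern 0x995D5D00, i.e. a negative number. Any loops_per_sec value of 0x20000000 or more is at risk on the left-shift path, and any value below 16 becomes zero on the right-shift path.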

This is a classic example of a latent logic bug which can’t be discovered during normal testing. At the time the driver was written, the loops_per_sec values were in the expected range on all test systems. But fast forward a few years and poof, suddenly the “tested and proven” code breaks. Only a code review by an experienced engineer might help… perhaps.

It would be fairly easy to make the driver timeout logic more robust, but that’s an exercise left for another day.
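As a hint of what "more robust" could mean, here is a minimal sketch that keeps the shift-based scaling but saturates instead of overflowing and never returns zero. It is an assumption about one possible fix, not a patch for the actual driver, and the function names are hypothetical. It also does nothing about the more fundamental problem that counting iterations is a poor way to measure 60 seconds.

```c
#include <stdint.h>

/*
 * Scale up by a left shift, clamping to UINT32_MAX instead of silently
 * dropping the top bits. The result is unsigned, so it cannot go negative.
 */
static uint32_t timeout_counter_shl_saturating(uint32_t loops_per_sec,
                                               unsigned shift)
{
    if (shift >= 32)
        return loops_per_sec ? UINT32_MAX : 0;
    if (loops_per_sec > (UINT32_MAX >> shift))
        return UINT32_MAX;
    return loops_per_sec << shift;
}

/*
 * Scale down by a right shift, but never return zero so that the
 * hardware is always polled at least once.
 */
static uint32_t timeout_counter_shr_nonzero(uint32_t loops_per_sec,
                                            unsigned shift)
{
    uint32_t scaled = shift >= 32 ? 0 : loops_per_sec >> shift;

    return scaled ? scaled : 1;
}
```

For the loops_per_sec value from my system, timeout_counter_shl_saturating(0x66575740, 2) returns UINT32_MAX, which is a very long timeout but at least not a negative one.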


32 Responses to Timing In Software Is Too Hard?

  1. Andreas Kohl says:

    How does it work when using the Adaptec 1542 driver instead? I tried Caldera 1.0 here without success, but Caldera 1.2’s autoprobing finds the disk on the Adaptec 154x. However, I was not able to boot from the CD connected to the BusLogic. Is this still a limitation?

  2. Raúl Gutiérrez Sanz says:

    Just to mention more timing issues, remember the Windows 95 problem booting on CPUs over 400 MHz, and all the DOS software compiled with Turbo Pascal which just displays “Runtime error 200” and terminates, even on a Pentium.

  3. Andreas Kohl says:

    So I’m now running Caldera Network Desktop 1.0 here as a guest with BusLogic drivers. This distribution is similar to Red Hat 2.1. During installation the older BusLogic driver (kernel 1.2.13) complains about timeouts, but it seems to work so far.

  4. Michal Necasek says:

    Can’t say anything about Caldera 1.0 without trying it.

    What I can say in general is that the BusLogic is highly compatible with the AHA-154x, but drivers can distinguish between the two. The native BusLogic mode supports bus-master transfers anywhere in 32-bit address space while the 154x is limited to 24 bits (16MB). So nearly every OS has drivers for both AHA-154x and BusLogic HBAs. In some cases, the Adaptec drivers only work with Adaptecs (NT, OS/2), in others either the 154x or the BusLogic driver works with BusLogic HBAs (Solaris, Linux, various BSDs, really just about every driver not written by Adaptec).

    The fun part is that with many of the old OSes, the Adaptec and BusLogic drivers have different sets of bugs. Sometimes one is broken, sometimes the other, in the worst case it’s both. From release to release the drivers change slightly, or sometimes completely, such as the BusLogic driver in Linux 1.3 vs 1.2 or so. The OS needs to be examined on a case by case basis, unfortunately.

    Perhaps a generalization but the drivers for NT and OS/2 do seem to be somewhat better written than the Unix-y ones.

  5. Michal Necasek says:

    Yes, but at least some of those are genuinely tough problems. The BusLogic driver only needs to time about 1 second or 60 seconds, and for such long timeouts it is a) bad to spin, and b) OS timing services can and should be used. The worst problem in old systems is usually timing in the 1 millisecond range, long enough that it matters but short enough that the OS may not offer suitable services. That’s the case of the infamous Runtime error 200 for example.

  6. Yuhong Bao says:

    “The older Adaptec 154x driver works on both Adaptec and BusLogic HBAs. In the NT 3.1 release, there are separate BusLogic (buslogic.sys) and Adaptec 154x (aha154x.sys) drivers and the Adaptec one no longer works on BusLogics. In one beta build, there’s the updated manufacturer-specific Adaptec driver but no BusLogic driver yet. ”
    Makes me wonder in what other OSes something similar happened.

  7. Michal Necasek says:

    In a way… OS/2 2.1 (I believe) shipped with AHA154X.ADD written by Adaptec. This driver does not work on BusLogic HBAs. In OS/2 Warp, BTSCSI.ADD was added to support BusLogic HBAs.

    FWIW, Adaptec’s DOS driver (ASPI4DOS.SYS) also won’t work on BusLogics. Of course BusLogic supplied their own drivers so it doesn’t matter much.

  8. Yuhong Bao says:

    Of course, old versions of both probably did work on BusLogic.

  9. This is like the soundblaster drivers for OS/2 that needed patches to work with other vendors like the MediaVision ThunderBoard…..

    So annoying!

  10. Michal Necasek says:

    Well, yes and no. Since the BusLogics were generally a superset of the AHA-154x, it was always possible to detect the difference; they were never a 100% clone. Adaptec understandably did not want to support other people’s hardware in their drivers, hard to blame them. And when an OS shipped both drivers (like NT), it was actually a good thing that the Adaptec drivers didn’t try to load on the BusLogics.

    With the OS/2 SB drivers, I’d blame the clones for not being properly compatible… I had one of those things (can’t remember exactly, but it wasn’t a MediaVision) and had all sorts of trouble in DOS games, too. A real shame since the brand name SoundBlasters were crap audio quality wise, unlike the MediaVisions.

  11. Yuhong Bao says:

    What seems to be fun is the XP setup floppies, which include the Adaptec driver but not the BusLogic one.

  12. Michal Necasek says:

    Yeah, that’s a bit weird. It survived all the way to XP SP3, I wonder if they really tested it then.

    It may have something to do with the fact that by the time XP came out, BusLogic was long gone and Mylex also gone or on its way out, while Adaptec was still alive and kicking. Too bad because the BusLogic HBAs with 32-bit memory accesses are a lot more suitable for XP than an ISA adapter.

  13. Yuhong Bao says:

    The problem is of course that the Adaptec driver in XP is the one that can’t be used on BusLogics.

  14. Michal Necasek says:

    That was never the case though, the Adaptec drivers were careful not to load on BusLogic HBAs and vice versa.

  15. Yuhong Bao says:

    Actually I think the problem is that NT by then was very different from when NT 3.1 was released. NT 3.1 did not even have PCI support at all. This means that even if the Adaptec 154x driver can be made to work with BusLogics again it would be a bad idea.

  16. Michal Necasek says:

    I don’t know about “bad idea” but certainly “not so easy”. Yes, the BusLogics were significantly more capable with 32-bit addressing, and that applied to all the non-ISA variants I believe (certainly EISA and VLB, probably MCA). Adaptec’s EISA AHA-174x also had 32-bit addressing, but done very differently IIRC. And by the time Windows XP came out, the 16MB limitation was quite significant. Not a killer with some sort of buffering, but annoying.

    Interestingly, the BusLogic driver (buslogic.sys) included in Windows 2000 is quite different from the one in NT 4 and earlier. It appears that for inclusion in W2K, Mylex significantly reworked the driver.

  17. Yuhong Bao says:

    What I am talking about however is PCI BusLogics emulating an ISA Adaptec 154x card, like in VirtualBox.

  18. Michal Necasek says:

    Trouble is, that would then behave like an ISA device, and I get the impression that the non-PnP ISA support in XP barely works.

  19. Yuhong Bao says:

    When it is actually an ISA device, which is not the case in VirtualBox.

  20. MiaM says:

    My impression is that non-PnP ISA devices work in XP if you load the NT4 drivers. Then you lose the relevant power saving functions in XP.

    Not sure about disk controllers though, especially those that you would boot from.

    I’ve for example successfully used the excellent Turtle Beach 20/48 capable sound card in XP this way. One of the ISA cards with the best sound quality.

  21. Michal Necasek says:

    Hmm, that would be something to try. The source code for the aha154x.sys driver is on the DDK, and about the only difference between the NT 4 and XP version is power management. But that’s potentially a big difference. What I saw with XP is that the AHA-154x driver that comes with XP works with an Adaptec ISA HBA, but needs to be manually told the I/O address, IRQ, and DMA channel, despite the fact that the driver can detect all those.

    Which Turtle Beach card was that? TB made a lot of them, and some were just generic Sound Blaster compatibles while others (e.g. TB Multisound) were pretty unique custom designs.

  22. Yuhong Bao says:

    Not the point though. The point is that it would be very hard to make XP think that a PCI device is an ISA device.

  23. MiaM says:

    Michal Necasek: Can’t remember the name, but it was the non-SB-clone card that was available with and without a Kurzweil 2000 synth based MIDI board. Maybe it was something like Multisound Fiji or Multisound Pinnacle? (I know that I could google this…).

    I specifically used this card right up to about the time I bought an M Audio Audiophile 24/96 card, because the sound quality of the Turtle Beach card was better than that of most other cards. (I’ve almost always only used sound cards to play back MP3/FLAC music and watch movies / TV shows, so I can’t say anything about MIDI and game play though).

    Btw before I had that Turtle Beach card, I used a MediaVision Pro Audio Spectrum 16 card. Not sure if I used XP or Windows 2000, but the NT4 drivers did work (for audio), but you also had to install the drivers for the SCSI controller on the card, otherwise the audio (and maybe the whole OS) would freeze for a second or two each time you changed any volume/mixer setting.

    I think that the power management can just be treated as an add-on and if the API’s are missing (not sure how the OS identifies this, maybe some version thing in the driver?) it just disables everything that would rely on the power saving functions, like sleep/hibernate and similar things.

    Yuhong Bao: Wasn’t the point kind of “can we get Adaptec drivers to work with Buslogic card”? 🙂

  24. Michal Necasek says:

    The one with a Kurzweil synth was the Pinnacle, and indeed the one without was Fiji. There were three generations of TB Multisound, and for various business reasons they all had different synths — the old ones had E-mu Proteus on them, the middle gen had ICS WaveFront, and the final generation had Kurzweils. The Pinnacle/Fiji had 20-bit DAC and ADC, quite decent, too. See more here.

  25. MiaM says:

    It must be a Fiji card then. Yeah, I remember that the sound quality using FooBar 2000 improved slightly when using more than 16 bits for playback. (Even though the source for MP3 files is usually 16-bit audio, the algorithms afaik output data that has more than 16 bits in the decoding process. For cards with no more than 16 bits FooBar 2000 also has a dither mode).

  26. Yuhong Bao says:

    Thinking about it, this is what happens when something like Adaptec 154x became a de facto standard but there was no effort at formalizing it as an important standard. I think the PCI SIG didn’t help either.

  27. Michal Necasek says:

    True, the AHA-154x was probably the most widely supported SCSI HBA ever. The BusLogic 32-bit extensions pretty seamlessly made it work in EISA/PCI machines, with bus-mastering to the full 4GB address space. The AHA-154x device model is interesting because it’s simple and at the same time can be very high-performance.

    The PCI SIG no doubt reflected the interests of its members. Graphics people were there early on, so VGA got 100% PCI support including funny stuff like palette snooping. Sound card vendors initially did not care and when they tried to make Sound Blaster compatible PCI cards a few years later, they had to invent four or six different, incompatible, and unreliable crutches in order to do so.

  28. Yuhong Bao says:

    And of course by the time XP came out it was probably too late to fix the problems.

  29. MiaM says:

    My conclusion is that the best thing would probably be if some large company made a crappy product as the first of its kind, and then produced a plethora of different products that required their own driver versions at the same time as other companies made their own products. I’m thinking about network cards, and I think that the crappy 3C500/3C501 NIC probably saved us from a lot of problems with software hardcoded for one kind of hardware.

    P.S. PCI Sound Blaster compatible cards seem like a product that was only relevant within a really short time span.

    Were any computers without ISA slots even made before Win9x became obsolete as a gaming platform? Was SB compatibility even important in Win9x games?

  30. Yuhong Bao says:

    The Intel 810 chipset came out in 1999 without ISA slots.

  31. Michal Necasek says:

    Hehe, the 3C501 achieved immortality with the following comment in its Linux driver: “This is a device driver for the 3Com Etherlink 3c501. Do not purchase this card, even as a joke.”

    SB compatibility was never important in Win9x (Win32) games. But DOS games were, since they were actively developed at least until 1996-1997. 1999 was the year when ISA slots officially died, very much in the Win9x era. And yes, while PCI cards with SB compatibility were relevant only in a short time period, they were still relevant and there were probably at least a dozen different chips which did that, with varying degrees of hackiness.

    ISA sound cards were manufactured until 2000 or so, and they were entirely adequate for typical use (and SB compatible). Much like parallel ATA CD-ROMs they were not killed by the superiority of newer solutions but by their interface going away.

  32. Yuhong Bao says:

    For fun, look up the posts about 3C501 by Donald Becker on Usenet.
