The OS/2 Museum recently acquired two horribly slow old Western Digital IDE drives, model WD93044-A. These were WD’s first foray into IDE hard disks, combining a rather outdated Tandon RLL drive chassis (3.5″ half-height stepper drives) with WD’s own controller chips.
One of the drives was rattling suspiciously so I set it aside. The other seemed to work just fine and apart from one bad sector, didn’t really exhibit any issues. I was able to use it in Linux through a Promise PCI IDE controller (and run ddrescue on it).
About three weeks later, the drive just wouldn’t work with the same Linux machine. It was not recognized by the Promise IDE controller and it was not recognized by Linux. It spun up just fine and made some kind of brief noise during detection, but was never found.
At this point I got very uncertain about which drive had been working before and plugged in the other drive. It was detected just fine, but ddrescue somehow found more errors than last time. I went back to photos I took initially and ascertained that the now-working drive was the one with the weird rattle that I hadn’t tested before, and the not-detected drive was definitely the one that worked not long ago. What happened there?
I had no idea what I might have done to break the drive. I didn’t drop it, I didn’t connect it wrong, it just sat untouched on a desk for a while (compared to its age of more than 30 years, that was the blink of an eye).
Then, I’m not even sure why, something occurred to me. This is a dumb enough drive that it doesn’t even auto-park. If the entire drive was read and then turned off, the heads might be somewhere at the end of the drive. What if the drive needs to be forced to seek to track zero in order to work?
So I plugged the drive into a DOS machine on a secondary controller where it would be untouched by the BIOS. I verified that the drive does not in fact respond to the IDENTIFY DRIVE command and plays somewhat dead.
Next I fired up DEBUG and issued a RECALIBRATE command:
o 177 10
writing 10h (RECALIBRATE command) to port 177h (command port on the secondary controller).
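For anyone who wants to see where those two numbers come from, here is a minimal Python sketch. It assumes the conventional legacy base address of 170h for the secondary channel, with the command register at offset 7. The function names and the outb callback are made up for illustration, and actual port access needs a platform-specific mechanism that is not shown here.

```python
# Sketch only: derives the same port/value pair as the DEBUG command above.
SECONDARY_BASE = 0x170   # conventional legacy secondary IDE base (assumption)
COMMAND_OFFSET = 7       # command register when written, status when read
CMD_RECALIBRATE = 0x10   # RECALIBRATE opcode, as used in the text


def recalibrate_write(base=SECONDARY_BASE):
    """Return the (port, value) pair that issues RECALIBRATE on a channel."""
    return (base + COMMAND_OFFSET, CMD_RECALIBRATE)


def issue_recalibrate(outb, base=SECONDARY_BASE):
    """Send RECALIBRATE through a caller-supplied outb(port, value) function.

    outb is a hypothetical stand-in for whatever raw port write the
    platform offers; this sketch does not touch hardware itself.
    """
    port, value = recalibrate_write(base)
    outb(port, value)


if __name__ == "__main__":
    # Record the write instead of performing it, then show it in DEBUG syntax.
    log = []
    issue_recalibrate(lambda port, value: log.append((port, value)))
    for port, value in log:
        print(f"o {port:x} {value:x}")   # prints "o 177 10"
```

Like the DEBUG one-liner, this writes only the command register and relies on the drive/head register already selecting drive 0.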
And sure enough, the disk started rattling, kept going for a few seconds, and then started working again!
I am not entirely certain what the problem was. What I do know is that in any machine this drive was expected to work with, the BIOS would issue the RECALIBRATE command during initialization.
The ATA standard does not say anywhere that a RECALIBRATE command needs to be issued to a drive during initialization. And the vast majority of drives do not need that. Yet if the WD93044-A did require a RECALIBRATE command in order to initialize properly, quite possibly no one ever realized it, because every system it was tested in did in fact issue a RECALIBRATE.
This may be a curious case of incompatibility between an old IDE drive and a new IDE host system caused by the host system not doing something the drive silently relies on. It’s a nasty situation because in the modern Linux system, the problem is difficult to diagnose and fix. Yet plugging the drive into an old DOS machine and booting up is enough to get it working again.
Unfortunately, I was still not able to get the drive working with Linux even after “reviving” the drive. On the Linux system, I was able to boot into FreeDOS, run DEBUG, and issue
o 101f 10
to send RECALIBRATE through the Promise Ultra TX2 PCI controller. After a Ctrl-Alt-Del, the Promise IDE controller recognized the WD drive… but Linux again didn’t, and confused the drive such that after a reboot, the Promise IDE controller would not recognize the drive anymore. Until I manually sent a RECALIBRATE.
I don’t even know what’s going on anymore… all I can say is that the drive boots DOS just fine and works in an old 486 (and it really benefits from a disk cache), but the Linux system refuses to work with it even when the IDE controller finds the drive. Even when the same drive worked in the Linux system not too long ago.
And the RECALIBRATE command definitely has something to do with the problem, but maybe it’s not the whole story.
Update (July 26, 2021): It wasn’t the whole story. The problem turned out to be incorrect configuration of the drive. The drive was jumpered as ‘master’ (jumper in the leftmost position) rather than ‘single’ (no jumper). This clearly confuses Linux. The problem is that when the WD drive is configured as drive 0, it assumes that there is a drive 1 responding when selected. When there is in fact no drive 1, the IDE channel may look to the host as if there were no drive at all since all register reads return with all bits set, or worse, as if there were a drive 1 that is permanently busy.
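The two ways the channel can look to the host can be told apart from a single read of the status register. The following Python sketch illustrates the idea. BSY being bit 7 of the status register is standard ATA, but the function name and the returned strings are made up for illustration, and a real driver would poll with a timeout rather than judge from one read.

```python
# Sketch only: interprets one status register read as described above.
STATUS_BSY = 0x80        # BSY is bit 7 of the ATA status register
STATUS_ALL_SET = 0xFF    # every bit set: nothing is driving the bus


def classify_status(status):
    """Give a rough interpretation of a single status register value."""
    # Check for all bits set first, because 0xFF also has BSY set.
    if status == STATUS_ALL_SET:
        return "no drive (all bits set)"
    if status & STATUS_BSY:
        return "busy"
    return "ready for further checks"


if __name__ == "__main__":
    for value in (0xFF, 0x80, 0x50):
        print(f"{value:02X}h: {classify_status(value)}")
```

A status that stays at "busy" indefinitely corresponds to the worse case from the update, a phantom drive 1 that never becomes ready.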