Recalibration Needed

The OS/2 Museum recently acquired two horribly slow old Western Digital IDE drives, model WD93044-A. These were WD’s first foray into IDE hard disks, combining a rather outdated Tandon RLL drive chassis (3.5″ half-height stepper drives) with WD’s own controller chips.

An antique Western Digital IDE drive (1990)

One of the drives was rattling suspiciously so I set it aside. The other seemed to work just fine and, apart from one bad sector, didn’t really exhibit any issues. I was able to use it in Linux through a Promise PCI IDE controller (and run ddrescue on it).

About three weeks later, the drive just wouldn’t work with the same Linux machine. It was not recognized by the Promise IDE controller and it was not recognized by Linux. It spun up just fine and made some kind of brief noise during detection, but was never found.

At this point I got very uncertain about which drive had been working before and plugged in the other drive. It was detected just fine, but ddrescue somehow found more errors than last time. I went back to photos I took initially and ascertained that the now-working drive was the one with the weird rattle that I hadn’t tested before, and the not-detected drive was definitely the one that worked not long ago. What happened there?

I had no idea what I might have done to break the drive. I didn’t drop it, I didn’t connect it wrong, it just sat untouched on a desk for a while (compared to its age, more than 30 years, it was a blink of an eye).

Then, I’m not even sure why, something occurred to me. This is a dumb enough drive that it doesn’t even auto-park. If the entire drive was read and then turned off, the heads might be somewhere at the end of the drive. What if the drive needs to be forced to seek to track zero in order to work?

So I plugged the drive into a DOS machine on a secondary controller where it would be untouched by the BIOS. I verified that the drive does not in fact respond to the IDENTIFY DRIVE command and plays somewhat dead.

Next I fired up DEBUG and issued a RECALIBRATE command:
o 177 10
writing 10h (RECALIBRATE command) to port 177h (command port on the secondary controller).
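
To make the port arithmetic explicit, here is a minimal sketch of how the command port follows from the channel's base address. It assumes the conventional ISA task-file layout (eight registers, command/status at offset 7) and the usual legacy bases 1F0h and 170h; the helper names are made up for illustration, and the code only formats the DEBUG line rather than touching any hardware.

```python
# ATA task-file register offsets relative to the command block base port.
TASK_FILE_OFFSETS = {
    "data": 0,
    "error_features": 1,
    "sector_count": 2,
    "sector_number": 3,
    "cylinder_low": 4,
    "cylinder_high": 5,
    "drive_head": 6,
    "status_command": 7,  # read = status, write = command
}

# Conventional ISA base ports for the two legacy channels (assumption:
# the machine uses the standard assignments).
LEGACY_BASES = {"primary": 0x1F0, "secondary": 0x170}

RECALIBRATE = 0x10  # command code written in the DEBUG line above


def register_port(base, name):
    """Return the I/O port of a named task-file register."""
    return base + TASK_FILE_OFFSETS[name]


def debug_out_line(base, command):
    """Format the DEBUG 'o' (output byte) line that writes a command code."""
    port = register_port(base, "status_command")
    return "o {:x} {:x}".format(port, command)


if __name__ == "__main__":
    for channel, base in LEGACY_BASES.items():
        print(channel, "->", debug_out_line(base, RECALIBRATE))
```

For the secondary channel this produces the same `o 177 10` line. A PCI controller places its task file wherever the BIOS assigned it, so the base has to be looked up rather than assumed; if the offset-7 convention holds, a base of 1018h would give a command port of 101Fh.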

And sure enough, the disk started rattling, kept going for a few seconds, and then started working again!

A stepper motor. Ugh.

I am not entirely certain what the problem was. What I do know is that in any machine this drive was expected to work with, the BIOS would issue the RECALIBRATE command during initialization.

The ATA standard does not say anywhere that a RECALIBRATE command needs to be issued to a drive during initialization. And the vast majority of drives do not need that. Yet if the WD93044-A did require a RECALIBRATE command in order to initialize properly, no one might have even realized that because every system it was tested in did in fact issue a RECALIBRATE.

This may be a curious case of incompatibility between an old IDE drive and a new IDE host system caused by the host system not doing something the drive silently relies on. It’s a nasty situation because in the modern Linux system, the problem is difficult to diagnose and fix. Yet plugging the drive into an old DOS machine and booting up is enough to get it working again.

Unfortunately, I was still not able to get the drive working with Linux even after “reviving” the drive. On the Linux system, I was able to boot into FreeDOS, run DEBUG, and issue
o 101f 10
to send RECALIBRATE through the Promise Ultra TX2 PCI controller. After a Ctrl-Alt-Del, the Promise IDE controller recognized the WD drive… but Linux again didn’t, and confused the drive such that after a reboot, the Promise IDE controller would not recognize the drive anymore. Until I manually sent a RECALIBRATE.

I don’t even know what’s going on anymore… all I can say is that the drive boots DOS just fine and works in an old 486 (and it really benefits from a disk cache), but the Linux system refuses to work with it even when the IDE controller finds the drive. Even when the same drive worked in the Linux system not too long ago.

And the RECALIBRATE command definitely has something to do with the problem, but maybe it’s not the whole story.

Update (July 26, 2021): It wasn’t the whole story. The problem turned out to be incorrect configuration of the drive. The drive was jumpered as ‘master’ (jumper in the leftmost position) rather than ‘single’ (no jumper). This clearly confuses Linux. The problem is that when the WD drive is configured as drive 0, it assumes that there is a drive 1 responding when selected. When there is in fact no drive 1, the IDE channel may look to the host as if there were no drive at all since all register reads return with all bits set, or worse, as if there were a drive 1 that is permanently busy.
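
The two failure pictures described above (all bits set versus permanently busy) can be illustrated with a small status-byte decoder. This is only a sketch: the bit names follow the ATA-1 status register, but the classification rule is a simplified heuristic of my own and is not what Linux or any particular BIOS actually does.

```python
# ATA status register bits, most significant first (names as in ATA-1).
STATUS_BITS = [
    (0x80, "BSY"),   # drive is busy, other bits are not valid
    (0x40, "DRDY"),  # drive ready
    (0x20, "DWF"),   # drive write fault
    (0x10, "DSC"),   # drive seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_status(value):
    """Return the names of all bits set in a status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


def classify(value):
    """Rough, illustrative guess at what a host might conclude."""
    if value == 0xFF:
        # Nothing is driving the bus, so every bit reads back as 1.
        return "no device answering"
    if value & 0x80:
        return "device busy"
    if value & 0x40:
        return "device ready"
    return "device present but not ready"


if __name__ == "__main__":
    for sample in (0xFF, 0x80, 0x50):
        print(hex(sample), decode_status(sample), classify(sample))
```

A host that selects the nonexistent drive 1 and keeps reading FFh or a stuck BSY bit has no easy way to tell that a perfectly healthy drive 0 is sitting on the same cable.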

This entry was posted in Bugs, IDE, Western Digital.

32 Responses to Recalibration Needed

  1. David C. says:

    I’m wondering, given the old age and type of the drive if the heads may have drifted out of alignment with the tracks, causing intermittent failures. This happened to me on an old Seagate ST296N SCSI drive.

    In my case, running a low-level format utility (which should be perfectly safe on a drive with stepper motors) fixed the problem. The new tracks all align perfectly with the heads and the drive works fine.

    I don’t know if this has ever been possible with an IDE drive, but if it is, you might want to give it a try.

  2. Michal Necasek says:

    The thing is, it’s not intermittent. The behavior only depends on the host the drive is plugged into. Drive works perfectly fine in a 486, can’t be detected in a much newer Linux machine. I put the drive back in the 486, works fine again. I’d be perfectly willing to believe that the drive is failing, but then it should be failing in every host.

    I’m certain the RECALIBRATE command is doing something. The drive seems to be expecting certain commands to come from the host and when that’s not happening, it gets upset.

    As for a low-level format, it might be possible on this kind of drive but I’m not sure. It’s dumb enough that it could do something, but again I don’t think it’d cure this particular issue.

    FWIW, the drive has one bad sector and it’s quite consistent, it’s just that one sector and the rest of the drive reads fine.

  3. Michal Necasek says:

    Actually not only is low-level formatting possible with some IDE drives, it can even damage them 🙂 Seagate explicitly warns against that, IIRC for their ST-157A family. A low-level format will trash the bad sector reassignment on those drives.

    On most IDE drives, a low-level format simply does nothing. But there are exceptions.

  4. Michal Necasek says:

    The two drives I have are still working more or less fine (a couple of bad sectors, but hey that’s after 30 years). But they are SLOOOOOOOOW. And loud. Without a disk cache, things like starting Windows 3.1 are a very drawn-out and loud process.

    There were definitely far better alternatives on the market at the time. Conner and Quantum come to mind. Different worlds.

  5. Octocontrabass says:

    Perhaps the solution here is to install Linux on the 486.

    Western Digital was kind enough to provide a low-level format utility called “ISPFMT” for these drives, and it seems like it’s pretty readily available. That should resolve alignment issues, if there are any.

  6. rasz_pl says:

    1 Hackaday’s “IDE BUS SNIFFING AND HARD DRIVE PASSWORD RECOVERY” covered snooping on the data/command exchange using Open Bench Logic Sniffer, $50 32 channel Logic Analyzer, sadly no longer manufactured/available. Nowadays cheapest 32 channel ones start at $200. You can ~accomplish this task with $5 “CY7C68013A-56 EZ-USB FX2LP USB2.0 Develope Board Module Logic Analyzer EEPROM” (yes with a typo in the name, same hardware as first Saleae logic). It only offers 16 channels meaning some guessing what’s happening will be involved :(. Would let you know what exactly are different BIOSes doing with drives without decompilation.

    2 low-level format – how would that happen on a PATA drive that never lets you see real drive surface? Would have to be a secret command similar to what ASPIFMT does sending SCSI Command 4 “FORMAT UNIT” (never actually called low-level format by T10 Technical Committee). Not even PC3000 does low level format, afaik nothing user accessible does. PC3000 does a thing called “SelfScan” – regenerate P-List and zero out G-List (sector mapping, bad sector list, sector translation). Its a fire and wait until drive reports success/error type of command, non interactive no progress bar, drive internal type of deal.
    Low-level format would be writing Track and Sector markers to a raw disk medium like floppies or C800 low level format on MFM/RLL drives (altho this didnt do Track markers because fixed position tracks due to stepper motor/no voice coils yet). “SelfScan” and SCSI “FORMAT UNIT” can only scan for defects and reinitialize P-List/G-List/T-List, they all operate at translation level, one above the physical sector hardware level.
    Programs like SEATOOLS have a command called ‘low level format’, but all it does is zero fill to trigger internal SMART bad block relocation.

    Octocontrabass ISPFMT – this is interesting. It directly lists CONNER CP342, CP3022 drives. Wonder how it performs its tasks, single undocumented command? Afaik both drives still use stepper motor arm actuators meaning you could in theory low level format them like ST-506 with undocumented low level raw track access.

    Wiki https://en.wikipedia.org/wiki/Servowriter

    Berkeley paper “A Tutorial on Control Design of Hard Disk Drive Self-Servo Track Writing” by Jianbin Nie and Roberto Horowitz:
    > Modern HDDs generate position feedback signals from special magnetic patterns called servo patterns which are written in designated areas on the disk surface known as servo sectors. The generated feedback signals are called position error signals (PES)… the servo sectors are created at the time of manufacturing and are never overwritten or erased. Then, the closed-loop servomechanism decodes the position information written in these sectors to accomplish adequate control tasks.

    IBM patent US7079347B2 “Method and apparatus for providing a marker for adaptive formatting via a self-servowrite process” from 2002:
    >Traditional servo writing has been performed in a clean room environment with external sensors invading the head disk assembly to provide the precise angular and radial position information to write the servo patterns. For example, an external clock head was typically disposed on the disk outer diameter.
    >Currently, determining the number of available tracks from the self servowrite process requires a trial and error process or requires that the track count be sent ahead from the servowriter to the function test station.

    Hitachi website describing same patent https://www2.hgst.com/hdd/library/whitepap/tech/servowrite.htm

    Would this mean voice coil drives are in a way hard sectored? As in multiple servo patterns per track determine sector positions?

  7. rasz_pl says:

    Another pretty sweet resource about HDD characteristics and internal structure, Henry Wong “Discovering Hard Disk Physical Geometry through Microbenchmarking” blog post https://blog.stuffedcow.net/2019/09/hard-disk-geometry-microbenchmarking/

  8. rasz_pl says:

    Did it just eat my “waiting for moderation” post? 🙁

  9. Michal Necasek says:

    No, it was just waiting for moderation. Which is not always instant 🙂 Sorry for the confusion.

  10. Michal Necasek says:

    For me, decompiling the BIOS is probably simpler/faster/cheaper than a logic analyzer which I don’t have and don’t know how to use 😀

    As for LLF: “A PATA drive” turns out to encompass a shockingly large variety of implementations. Anything non-ancient will definitely ignore any attempt at low-level format. But some old IDE drives were just ST506 RLL designs with an IDE controller tacked on. Quite a few old IDE drives had stepper motors; the two manufacturers I know of that I’m pretty sure never had any stepper IDE drives were Conner and Quantum (probably CDC/Imprimis too). WD and Seagate had stepper IDE drives, so did several other vendors.

    This document explains old Seagate (and Imprimis/Seagate Swift) drives and it’s pretty clear that LLF did a lot on those drives. The previously mentioned ISPFMT utility also sounds very much like a real LLF of the WD stepper drives.

    The CP342 is definitely voice coil/embedded servo, AFAIK there never were any Conner stepper drives. The mention of “J8 1-2 for slave to original CONNER CP342, CP3022” is just a jumper setting on the old WD drives.

    I do think that embedded servo IDE drives are effectively hard-sectored. I am not quite sure how to classify modern enterprise SCSI drives, which all have embedded servo information yet can be reformatted to a different sector size (nowadays one of several formats, ~20 years ago a fairly wide range). I could be wrong but my understanding is that the servo information primarily tells the drive on which track the head is, not necessarily which sector.

    Actually I’m not even sure the “hard-sectored” terminology really applies to non-antique drives. Truly hard-sectored floppy/hard disks had some kind of hardware-induced pulse at the start of each sector, as opposed to soft-sectored ones where some kind of ID mark had to be written/decoded to indicate the start of a sector.

  11. LightElf says:

    Just my $0.01. Some old ATA drives used “Format Track” command for defect-list manipulation.

  12. rasz_pl says:

    I got confused because usually “waiting for moderation” posts are still visible to the poster, but when I made second one with a link to that awesome Microbenchmarking blog the first one disappeared.

    LightElf wasnt “Format Track” an ESDI command? 40h is READ VERIFY SECTOR(S) in ATA.

    SGATFMT states its “Lo-level format (MFM, RLL, ESDI, and some SCSI)” so analogous to C800 debug jump. Guessing it only works on SCSI drives that have 506 bridge inside, Adaptec ACB-4000A Atari drives for ST etc.
    “AT (IDE) drives can be divided into three separate scenarios: Early, Swift and ZBR.”
    Early lists ST157A: “SGATFMT lists these drives only as a fall back option” but still suggests its able to write raw sectors, undocumented commands?
    Swift, for example ST1239A : “Future revisions of SGATFMT are planned to support a re-sectoring lo-level format for SWIFT AT drives in physical mode.” “true physical mode”, again sounds like secret commands for mode switching/raw access.
    finally ZBR “these drive are ALWAYS in translation mode and immune to a re-sectoring lo-level format.”

    it has ST280A Wren II on the list, which looks like voice coil drive. I cant get my head around low level formatting a drive with crucial markers stored on every track. Afaik low level formatting MFM/RLL drive works by just blasting full track content at a time, how would you know where to pause not to overwrite Servo Sectors?

    >ISPFMT utility also sounds very much like a real LLF of the WD stepper drives.

    again suggests those drives have undocumented low level access commands

    > is just a jumper setting on the old WD drives

    oh, now I understand, “_slave_ to original CONNER” so obvious /slaps forehead

    >enterprise SCSI drives that all have embedded servo information yet can be reformatted to a different sector size

    afaik SCSI “FORMAT UNIT” command is only able to manipulate translation tables, like some SATA drives with ability to switch between emulated 512 and physical 4K sectors.

    Servo Sectors encode both track number and angle. You are right, its not true Hard Sectoring. I somehow got this idea after looking at “Sector Angular Position” chapter of “Discovering Hard Disk Physical Geometry through Microbenchmarking”. All generated Angular position plots sure look deliberate and fixed, plus inability to re-sector. At most one could maybe use PC3000 to reshuffle sectors around in the translation table.

    Its really funny to me how primitive ST157a layout looks like, its barely above a HD floppy. Its almost like Seagate didnt really try all that hard to improve density in 1989. 650 vs 135 tpi, 10000 vs 17434 bpi. Even Commodore 1541 from 1982 already implemented ZBR. ST157a could probably lose one platter by going ZBR.

  13. Michal Necasek says:

    50h is the AT/ATA FORMAT TRACK command. WD’s ISPFMT.EXE definitely issues that. But it also issues a lot of weird E0h commands that look very drive specific.

    The ST157A is indeed extremely primitive, and it wasn’t the only such IDE drive (WD Centaur, Microscience/Maxtor 8051A, probably all Kalok drives, and more). Seagate needed 6 heads for 40 MB, while Conner only needed 4 with their old CP-341/342 and at about the same time the ST-157A came out, Conner already had the CP-3044 which achieved 40 MB with just a single disk. To be fair, the ST157 chassis was already about 2 years old when the ST157A came out, but still. The CP-3044 had 1,400 TPI compared to 824 TPI in the ST157A (says DISK/TREND REPORT).

    Note that Conner also wasn’t using ZBR in their old drives. Some CDC/Imprimis Wren drives started using ZBR circa 1987, but those had a dedicated servo surface. In 1990, there were surprisingly few drives that had both embedded servo and ZBR, and essentially none that had all of embedded servo, ZBR, and voice coil actuator. I wonder why, because ZBR was well known by then, and it seems like a really obvious way to get more data on the same surface. Maybe the variable bit rate was a problem.

    I honestly don’t know how it works with SCSI drive formatting. For example the Barracuda 180 (2001) supports “512 to 4,096 bytes per sector in even number of bytes per sector”. Is it all just reshuffling translation tables? Not sure.

  14. rasz_pl says:

    I found Format Track (50h: Vendor Specific) specification

    >The Format Track command formats a single logical track on the device. Each good sector of data on the track will be initialized to zero with write operation. At this time, whether the sector of data is initialized correctly is not verified with read operation. Any data previously stored on the track will be lost.
    >In LBA mode, this command formats a single logical track including the specified LBA.

    it takes track and Cylinder/LBA as parameters. All it does is zero fill. Another nail in the coffin of low level formatting IDE drives 🙁

  15. Michal Necasek says:

    The reason why it’s marked “vendor specific” in the spec is of course because different drives do very different things in response to FORMAT TRACK.

    I’m not sure what you were looking at but ATA-1 is pretty explicit: “The implementation of the Format Track command is vendor specific. The actions may be a physical reformatting of a track, initializing the data field contents to some value, or doing nothing.” That kind of language in ATA-1 should typically be understood as “this is what drives in the field are actually doing”. The text further says that FORMAT TRACK should at least overwrite the existing data since users (i.e. drive management utilities) may rely on that.

  16. LightElf says:

    “Format Track” 50h command needs a parameter buffer (512 bytes), and at least some drives of the discussed age actually support that functionality.
    “One 16-bit word represents each sector, the words being contiguous from the
    start of a sector. Any words remaining in the buffer after the representation
    of the last sector are filled with zeros. DD15-8 contain the sector number.
    If an interleave is specified, the words appear in the same sequence as they
    appear on the track. DD7-0 contain a descriptor value defined as follows:
    00h – Format sector as good;
    20h – Unassign the alternate location for this sector;
    40h – Assign this sector to an alternate location;
    80h – Format sector as bad. “
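
For illustration, the buffer layout quoted above can be sketched as follows. This is a hedged example, not taken from any formatting utility: the function names are invented, the interleave algorithm is one common way of laying sectors out, and it assumes words are stored little-endian in memory (so the descriptor byte comes first and the sector number second), as they would be on a PC.

```python
import struct

SECTOR_BYTES = 512

# Descriptor values from the quoted text.
GOOD, UNASSIGN_ALT, ASSIGN_ALT, BAD = 0x00, 0x20, 0x40, 0x80


def interleaved_order(sectors_per_track, interleave):
    """Return 1-based sector numbers in the order they appear on the track."""
    order = [0] * sectors_per_track
    slot = 0
    for sector in range(1, sectors_per_track + 1):
        # If the target slot is already taken, advance to the next free one.
        while order[slot] != 0:
            slot = (slot + 1) % sectors_per_track
        order[slot] = sector
        slot = (slot + interleave) % sectors_per_track
    return order


def build_format_buffer(sectors_per_track, interleave=1, bad_sectors=()):
    """Build the 512-byte FORMAT TRACK parameter buffer for one track."""
    if not 1 <= sectors_per_track <= 255:
        raise ValueError("sector numbers must fit in one byte")
    words = []
    for sector in interleaved_order(sectors_per_track, interleave):
        descriptor = BAD if sector in bad_sectors else GOOD
        # DD15-8 = sector number, DD7-0 = descriptor.
        words.append((sector << 8) | descriptor)
    payload = struct.pack("<{}H".format(len(words)), *words)
    # Words after the last sector are filled with zeros.
    return payload.ljust(SECTOR_BYTES, b"\x00")


if __name__ == "__main__":
    buf = build_format_buffer(17, interleave=2, bad_sectors={5})
    print(buf[:34].hex(" "))
```

Whether a given drive honors the interleave or the descriptors at all is, per ATA-1, entirely up to the vendor.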

  17. Octocontrabass says:

    Most of the E0h commands used by ISPFMT line up with the ESDI commands to change the data strobe offset. Looking in a disassembly I see a lot of 6200h (early offset one), 6300h (late offset one), and 6000h (restore offset to zero).

    No idea what E001h is supposed to do, though. ESDI lists it as a vendor-unique soft switch.

  18. Michal Necasek says:

    Yes, and we know that WD’s ESDI controller used the E0 command as a direct ESDI command passthrough.

    The Centaur drives use the same controller chips that were used in WD’s RLL/ESDI controllers, so it’s entirely plausible that some of the firmware logic was reused. There is also some possibility that even though the command codes are the same, it means something totally different.

  19. rasz_pl says:

    FORMAT TRACK I cited was from >2000 WD specs 🙁
    So the one in early ATA specs matches ‘IBM PC AT Fixed Disk and Diskette Drive Adapter’ MFM controller documentation. Makes sense considering how early drives were built. The mystery to me is how/if early voice coil drives also supported low level formatting, like the ST280A listed by SGATFMT. Its probably all documented very well on some old Russian hdd recovery forums.

  20. Michal Necasek says:

    The ST280A has a voice coil actuator… but it’s also a Wren II half-height, and I’m pretty sure all Wren drives have a dedicated servo surface. The Wren II HH definitely does. So those drives should be capable of reformatting the data surfaces.

  21. Fernando says:

    Just a thought, but you could try disabling DMA and using PIO, maybe even decreasing the PIO mode in your Linux machine; it could be that the computer is not communicating correctly with the drive.

  22. Michal Necasek says:

    The drive can’t do any DMA to begin with, so that shouldn’t be a problem.

  23. Michal Necasek says:

    So this is interesting: I don’t know how seriously it can be taken, but in the first word of IDENTIFY DRIVE, the WD 93044-A, Seagate ST157A, and Kalok KL343 report themselves as soft-sectored. All Conner and Quantum IDE drives, as well as newer WD and Seagate drives, report themselves as hard-sectored. It would make sense if the soft-sectored ones could be low-level formatted (not just pretend).
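
As a rough sketch of the check described here: in the ATA-1 general configuration word (IDENTIFY DRIVE word 0), bit 2 is documented as "soft sectored" and bit 1 as "hard sectored". The helper names below are invented for illustration, and the sample values in the test of this sketch are hypothetical, not readings from any of the drives mentioned.

```python
# Selected bit positions in IDENTIFY DRIVE word 0 (general configuration),
# as given in ATA-1. Later ATA revisions retired most of these bits.
HARD_SECTORED = 1 << 1
SOFT_SECTORED = 1 << 2
NOT_MFM_ENCODED = 1 << 3
FIXED_DRIVE = 1 << 6
REMOVABLE_DRIVE = 1 << 7


def word0_from_identify(data):
    """Extract word 0 from a raw IDENTIFY block (words are little-endian)."""
    return int.from_bytes(data[0:2], "little")


def sectoring(word0):
    """Report what the drive claims about its sectoring."""
    soft = bool(word0 & SOFT_SECTORED)
    hard = bool(word0 & HARD_SECTORED)
    if soft and hard:
        return "both bits set"
    if soft:
        return "soft-sectored"
    if hard:
        return "hard-sectored"
    return "unspecified"


if __name__ == "__main__":
    # Hypothetical word 0 values, for demonstration only.
    for sample in (0x0044, 0x0042):
        print(hex(sample), sectoring(sample))
```

Since drives were free to report whatever they liked in these bits, the result is a hint at best.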

  24. Yuhong Bao says:

    ” Its almost like Seagate didnt really try all that hard to improve density in 1989. 650 vs 135 tpi, 10000 vs 17434 bpi. ”
    Of course, DOS 3.3 with the 32MB limit was still popular back then too.

  25. Octocontrabass says:

    Given how much it acts like an ESDI controller with an ESDI drive attached, it would make sense for it to be soft-sectored.

    It’s too bad WD hid the chips on the other side of the PCB. I’d like to see if there’s a whole ESDI controller in there.

  26. Michal Necasek says:

    There are more or less the same chips that a WD ESDI controller would have, but then I don’t know exactly what an ESDI drive would have. I don’t think it’s really an ESDI drive + controller, that wouldn’t make much sense, they didn’t need the ESDI drive/controller interface for anything.

    I’ll try to put up a photo of the WD Centaur PCB later today.

  27. Octocontrabass says:

    Did you park the drive when you turned it off? Maybe the real reason it only worked the first time is that the drive’s firmware gets confused when the heads aren’t where it thinks they should be.

  28. Michal Necasek says:

    No, but it should not be necessary to park the heads before powering off the drive, only before moving it. At any rate, when I forcibly recalibrate the drive, the heads will be at a known location… but then Linux confuses the drive somehow anyway.

    My suspicion is that the drive expects a certain sequence of commands and gets upset when it doesn’t get it. In the old days BIOSes were fairly uniform in what commands they sent to drives during POST, but modern OSes don’t do that.

  29. MiaM says:

    Re low level format:

    I’m not 100% sure but I somehow got the impression that Spinrite somehow manages to reformat some IDE drives (like it did with MFM drives).

    I don’t think that will do anything to make the drive work in the Linux computer.

    How new/old is the Linux kernel? Maybe Linux insists on using LBA and when that fails it doesn’t even report that it found the drive? Or perhaps the drive doesn’t respond to the drive type inquiry the way Linux wants. (As an example, the 60MB 5.25″ half height Conner drive that was, for example, fitted in some Compaq Deskpro 386/20 reports one more sector than the drive actually has. That results in error messages though, at least on Linux kernel versions from about 20 years ago).

  30. Michal Necasek says:

    The Linux kernel in question was something in the late 3.x or early 4.x series, not recent but also not that old. And no, Linux does not require LBA. The problem turned out to be incorrect jumper settings on the drive, which can clearly confuse Linux into thinking the drive is not there (drive is jumpered to “master” position but with no slave, once the slave is selected, the drive seemingly vanishes from the bus).

    IDE low-level formatting can do anything. From a real LLF like on ST-506 MFM/RLL drives all the way to nothing, although the IDE spec recommends that the sector data should at least be overwritten. If Spinrite managed to LLF an IDE drive, it was because that’s what the drive decided to do, not because of some Spinrite magic.

  31. MiaM says:

    I would assume that Spinrite does its thing based on documentation obtained from the hard disk manufacturers, or possibly some trial and error / reverse engineering and whatnot. Since it was about the only program in its class and it afaik was well respected by everyone back in the MFM days I see no really strong reason for hard drive manufacturers to not cooperate (likely requiring an NDA). But this is just a qualified guess. I haven’t run Spinrite on anything newer than regular MFM / ST-506 disks.
