A couple of months ago the OS/2 Museum got hold of a 13.6 GB Fujitsu MPE3136AT IDE drive from 1999. The drive was working… more or less. It behaved quite strangely; the drive was detected and readable, but seemed oddly slow. It should have been capable of Ultra DMA transfers but delivered data at just under 2 MB/sec.
Looking through Linux dmesg output, it was apparent that the system was trying to communicate with the drive at Ultra DMA speeds, but kept falling back to slower PIO speeds due to CRC errors. Oddly, the drive vendor was also shown as FUBITSU, rather than FUJITSU as one would expect.
And looking at the data the drive returned, it was clear that it was somehow corrupted. For example, messages in the boot sector clearly had some letters wrong, but only some.
What could possibly cause such a problem?
The thing to remember is that when a drive reads a sector off the medium, the data is validated through CRC or other methods. If the CRC does not match, the drive reports an error, but that was not happening here. Either the data recorded on the drive was very strange, or it was getting corrupted somewhere between the drive medium and the host system.
On closer look, there was a pattern to the corruption. Looking at the easily recognizable ASCII text in the drive’s boot sector and elsewhere, it was clear that every even byte was fine, and some but not all odd bytes were corrupted. Knowing that IDE has a 16-bit wide data path, that makes some sort of sense.
The buffers (DRAM) on the drive could be bad, but the drive firmware should have noticed that. The IDE cable could be bad, but the same cable/controller/host was used with other drives with zero problems, so that was highly unlikely to be the source of the trouble.
Remember the CRC errors mentioned earlier? The drive was calculating a different CRC than the host. That means the drive sent something other than what the host received. Assuming that the data path from the host CPU up to the end of the drive cable is good, and everything on the drive side from the medium at least up to and including the drive buffers is also good, that does not leave many places where the data can get corrupted.
Let’s think about the corruption a bit. How can FUJITSU turn into FUBITSU? Why is only one letter wrong? Let’s compare the good and bad hex ASCII codes:
F  U  J  I  T  S  U
46 55 4A 49 54 53 55 <-- good
--------------------
46 55 42 49 54 53 55 <-- bad
F  U  B  I  T  S  U
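The same comparison can be done mechanically by XORing the two strings byte by byte. This is only a small Python sketch; the helper name diff_bytes is made up for illustration.

```python
def diff_bytes(good: bytes, bad: bytes):
    """Return (index, good char, bad char, XOR mask) for each differing byte."""
    return [
        (i, chr(g), chr(b), g ^ b)
        for i, (g, b) in enumerate(zip(good, bad))
        if g != b
    ]

for index, g, b, mask in diff_bytes(b"FUJITSU", b"FUBITSU"):
    # A mask with a single bit set means exactly one bit differs.
    print(f"char {index}: {g} -> {b}, XOR mask 0x{mask:02X} = {mask:08b}")
# prints: char 2: J -> B, XOR mask 0x08 = 00001000
```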
To go from ASCII J to B, it just takes bit 3 to flip from one to zero. And look: F, T, and U (also in odd bytes) did not change because they have bit 3 clear already. (ATA identification strings are stored with the two characters of each 16-bit word swapped, which is why F, J, T, and the final U end up in odd bytes of the raw data.) If bit 3 in every odd byte, i.e. bit 11 in a 16-bit data path, was consistently forced to zero, we’d get exactly this kind of corruption. Even bytes are untouched, and odd bytes change if and only if they initially have bit 3 set.
But why would the bit be forced to zero? Is there some obvious damage visible on the drive that I missed when initially plugging it in? Why yes, there is! That bent pin on the back of the IDE connector does not look right at all:
Here’s what it looks like from the connector side:
It’s not entirely clear from the photo, but the pin was bent at 90 degrees and lying flat against the rear side of the connector.
Now, what does that pin do? Let’s take a look at the pinout on Wikipedia; note that the image shows the cable pinout, so we have to mentally flip it left to right. The bent pin is the fifth from the end in the bottom row, i.e. pin 10, which happens to be data bit 11. That is entirely consistent with the data corruption we’ve seen—bit 11 is not connected and always reads as zero! The corruption happens effectively on the cable between the drive and host, and the Ultra DMA CRC checks are designed to catch exactly that kind of problem. And they did catch the problem… only the host “cleverly” scaled down the transfer speed to a mode which performs no CRC checks, and happily delivered corrupted data.
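Why the CRC check trips can be illustrated with a short Python sketch. This is not the exact Ultra DMA procedure, which works on 16-bit words as defined in the ATA specification; a generic bytewise CRC-16 with polynomial 0x1021 stands in for it, and the function names and the sample data are assumptions made for the example.

```python
def crc16(data: bytes, seed: int = 0xFFFF) -> int:
    """Generic bitwise CRC-16, polynomial 0x1021 (stand-in for the ATA CRC)."""
    crc = seed
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def drop_bit11(raw: bytes) -> bytes:
    """Clear bit 3 of every odd byte, i.e. bit 11 of each 16-bit word."""
    out = bytearray(raw)
    for i in range(1, len(out), 2):
        out[i] &= 0xF7
    return bytes(out)

sent = b"UFIJSTU "           # "FUJITSU " as it travels, bytes swapped per word
received = drop_bit11(sent)  # what arrives with data bit 11 disconnected

drive_crc = crc16(sent)      # the drive checksums what it sent
host_crc = crc16(received)   # the host checksums what it received
print(f"drive {drive_crc:04X}, host {host_crc:04X}, "
      f"match: {drive_crc == host_crc}")
```

In this sample only a single bit differs between the two buffers, and a CRC always detects a single-bit error, so the two values cannot match. A PIO transfer has no such comparison step, which is why the corrupted data went through unnoticed after the fallback.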
Now we understand exactly why the data was getting corrupted, but how did the pin get bent that way? I’m honestly not sure—it was bent already when I got the drive, and because it was completely out of the way, I didn’t notice anything unusual when plugging in the cable.
It is unusual for a single pin in the middle to be bent; normally it’s several pins on either end that get bent when plugging in and especially unplugging the cable. I can only guess that the pin got somewhat bent initially and then someone forced the data cable in really hard, pushing the pin partially back out of the connector and bending the part that was still within the connector completely flat.
Needless to say, fixing the drive was not very hard. With careful needle-nose pliers action, I straightened the pin out and pulled it forward. Then I plugged in the IDE cable while making sure the pin couldn’t be pushed back again.
After completing the surgery, the drive started working normally. It was able to operate at Ultra DMA speeds with no CRC errors, and the corruption was gone. Problem solved!