Nobody Expects…

…the Spanish Inquisition!

Well, that too, but also nobody expects that a bland, run-of-the-mill Novell NE2000 NDIS driver would crash/hang just because it runs on 486 or later CPUs.

I wanted to try the “basic” DOS redirector shipped with Microsoft’s LAN Manager 2.0 (1990) and more or less by a flip of a coin I decided to use the NE2000 NDIS driver that came with the package. Previously I had no trouble with Microsoft’s NE2000.DOS driver shipped with LAN Manager 2.1 and Microsoft’s Network Client 2.0.

But the old LAN Manager NE2000.DOS driver (16,342 bytes, dated 11-19-90, calls itself version 0.31) loaded and then promptly hung as soon as Netbind was started:

Netbind hangs with LAN Manager 2.0 NE2000 driver

At first I naturally suspected some problem with the card configuration or the NIC hardware, but what I found was much more surprising.

The reason the driver hung actually wasn’t related to networking at all. The driver hung in a routine that was clearly trying to detect the CPU type. How can someone screw up something so simple so badly? Well…

The problem perhaps illustrates the abusive relationship between Intel and Microsoft. Intel told developers how to detect the CPU generation (before CPUID simplified things). Microsoft went ahead and completely ignored that advice, perhaps feeling safe in the knowledge that Intel wouldn’t dare break Microsoft’s code. Except in this case Intel had no choice.

A bit of background: The NE2000 driver has a good reason to detect whether it’s running on a 286 or later CPU. If it is, it can use the REP INSW and REP OUTSW instructions to get data to and from the card. Even if the NE2000 were in an 8-bit slot, the driver could still use REP INSB and REP OUTSB, significantly superior to pushing data one byte at a time in a loop.
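For illustration, here is a minimal sketch of the difference (mine, not taken from the driver); it assumes DX already holds the card’s data port, ES:DI points at the receive buffer, CX holds the word count, and the NE2000’s remote DMA read has already been programmed:

                cld                     ; Copy forward
                rep     insw            ; 286+ (or 186/V20): read CX words from port DX to ES:DI

                ; 8086/8088 fallback, one word per iteration:
read_loop:      in      ax, dx          ; Read a word from the data port
                stosw                   ; Store it at ES:DI, advance DI
                loop    read_loop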

The driver has a less pressing but still sensible reason to detect a 386 CPU. It can use REP MOVSD for memory copies, which might have some noticeable impact on the network driver performance.
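Again purely for illustration (and again my sketch, not the driver’s code), a 386 copy of this kind might look as follows, assuming DS:SI and ES:DI point at the source and destination and CX holds the byte count:

                cld
                mov     dx, cx          ; Save the byte count
                shr     cx, 2           ; Doubleword count
                rep     movsd           ; 386+: copy four bytes at a time
                mov     cx, dx
                and     cx, 3           ; Bytes left over
                rep     movsb           ; Copy the remainder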

The driver has in my opinion absolutely no need to detect whether the CPU is a 486, but it does, and that’s where things go wrong. The CPU detection in the driver is interesting and flawed enough that I’ll quote it in full (the labels and comments are mine):

sgdt_buf        db 6 dup(0FFh)

detect_cpu      proc near
                push    bp
                mov     bp, sp
                push    sp
                pop     ax
                cmp     sp, ax          ; Will be equal on 80186+
                jz      short not_8086
                mov     ax, 1           ; Indicate 8086
                stc
                jmp     short quit
not_8086:
                ; 286 won't write last byte
                sgdt    fword ptr ds:sgdt_buf
                sar     ds:sgdt_buf+5, 1 ; Low bit to carry flag
                jnc     short not_286   ; Will be set on 286  
                stc
                mov     ax, 11h         ; Indicate 286
                jmp     short quit
not_286:
                mov     ebx, cr0        ; Read CR0, save in EBX
                mov     eax, ebx
                xor     eax, 20000000h  ; Try flipping WT bit
                mov     cr0, eax
                mov     eax, cr0        ; Read new CR0 value
                mov     cr0, ebx        ; Restore original value
                cmp     eax, ebx        ; Did CR0 actually change?
                jnz     cr0_differs
                clc
                mov     ax, 20h         ; CR0 unchanged: 386
                jmp     short quit
cr0_differs:
                mov     ax, 40h         ; CR0 changed, must be 486+
                clc
quit:
                mov     sp, bp
                pop     bp
                retn
detect_cpu      endp 

The first part of the code is actually quite common and takes advantage of the fact that on the 8086/8088, PUSH SP pushes the new (already decremented) value of SP on the stack, while all later CPUs push the old value. The detection code, if anything, demonstrates why the 8086 behavior didn’t make sense: a sequence of PUSH SP / POP SP actually changed SP on the 8086, while on newer CPUs it does not.

The next part is where things start getting problematic. The code assumes that if the CPU is not an 8086, it must be at least an 80286 and will be able to execute a SGDT instruction. It is true that PCs with 80186 were not at all common, but this code would crash and burn on an 80186 because SGDT does not exist there.

On a 286 and later, SGDT will execute happily, and due to rather questionable design, it is not a privileged instruction. Now, the SGDT instruction is a bit funny, or rather Intel’s documentation of SGDT/SIDT is. The code is written to assume that a 286 will either store the sixth byte as all ones (which I believe is what happens) or not write it at all. The code also assumes that a 386+ SGDT will always write six bytes, which in fact happens despite what Intel’s documentation might say.

The code further assumes that the high byte of the 32-bit GDT base address won’t have its low bit set. That’s actually a poor assumption, because although it will be true after CPU reset, the CPU might be running in V86 mode or it might have switched to protected mode and back, and there’s no telling what GDTR might contain. Sure, if you are absolutely certain the PC can’t have more than 16 MB RAM, then the high byte of GDTR probably will be clear, but it’s just not a safe assumption.

Using the SIDT instruction would have been slightly better because at least in real mode it has to point at the IVT, but even then it might confuse the detection in V86 mode. In other words… there’s a reason why this method of detecting a 386 isn’t often used.
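For completeness, here is a sketch of what such an SIDT-based check might look like (my code, not anything Microsoft shipped, and it assumes an 8086 has already been ruled out). In real mode the IDT normally sits at physical address 0, so the high byte of the stored base should come back as zero on a 386, while a 286 either writes 0FFh there or leaves the pre-filled 0FFh in place; as noted, a V86-mode monitor is free to put its IDT anywhere and can still confuse the check:

sidt_buf        db 6 dup(0FFh)

                sidt    fword ptr ds:sidt_buf
                cmp     ds:sidt_buf+5, 0FFh ; High byte of the IDT base
                je      short is_286        ; Untouched/0FFh: assume 286
                                            ; Otherwise assume 386 or later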

That said, the worst that can happen is that the driver thinks it’s running on a 286 when it’s really running on a 386 or newer processor, and it might run a little slower but most likely no one will even notice.

But now we get to the real problem, which is 486 detection. Again, I don’t know why the code is even trying to detect a 486, since the NE2000 driver really does not care whether the CPU is a 386 or 486. I can only assume the detection routine was copied and pasted from somewhere else.

At any rate, the detection routine tries to flip the WT bit in the CR0 register; if the bit does not change, the CPU must be a 386, and if it does change, it is assumed to be a 486.

This is a questionable idea to begin with, because moves to and from CR0 are privileged instructions (unlike SGDT). Such detection would be fine in, say, initialization code for OS/2 or NT, but it’s not that great in a piece of DOS code that may run under a memory manager etc. But that’s not the worst problem with it.

Now, it is important to underscore that this driver is timestamped November 1990. It could well be a couple of months older, and almost certainly is. There were not a lot of 486s around in mid-1990, but there were some, and Microsoft certainly would have had a few.

Whoever wrote the code clearly looked at Intel’s initial i486 datasheet from April 1989, Intel Order Number 240440-001. On page 24, it says that setting the CR0.WT bit will enable internal cache write-through and invalidates. Since the early 486 models had no write-through cache, the bit was actually a no-op but still could be flipped.

Except… whoops. Intel’s original 486 design was to enable the cache by setting the CR0.CE bit (that is, bit 30 of CR0), which is perfectly logical, only it turned out to be a really bad idea, because nearly all existing 386 code, knowing nothing of the CE bit, would clear it whenever it updated CR0 and thus promptly disable the cache.

Intel therefore revised the 486 such that the CE (Cache Enable) and WT (Write Through) bits were renamed to CD (Cache Disable) and NW (Not Write-Through) and their meaning was inverted. Existing 386 code that wrote those bits as zero would then keep caching and write-through enabled. The new meaning of the bits was well documented on page 18 of the updated i486 datasheet from November 1989 (Intel Order Number 240440-002).

For the detection code in the NE2000 driver, this change had the unfortunate side effect that the bit combination of CD clear, NW set became invalid, while previously CE clear, WT set had been valid. In the typical scenario where the CD and NW bits are both clear (i.e. caching and write-through enabled) when the NE2000 driver’s CPU detection code runs, flipping bit 29 (the old WT, now NW) produces an invalid combination, and the write to CR0 general-protection faults. That is exactly why I saw the driver hang.
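To spell out the bit arithmetic (my summary, not anything from the driver):

                ; CR0 bit 30: CE (cache enable)   -> renamed CD (cache disable)
                ; CR0 bit 29: WT (write-through)  -> renamed NW (not write-through)
                ;
                ; Typical state when the driver loads:      CD=0, NW=0 (cache on, write-through on)
                ; After XOR EAX, 20000000h (flips bit 29):  CD=0, NW=1 -- documented as invalid
                ; MOV CR0, EAX therefore raises #GP, and plain DOS has nothing useful
                ; on that vector, hence the observed hang.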

Now here’s the weird thing: The combination of cache enabled, write-through disabled was always documented as invalid, even on the earliest i486. Flipping the WT/NW bit only works if the cache is disabled. Either the detection could crash even on those early 486s (possible) or the invalid combination was actually accepted (also possible).

It is very likely that the NE2000.DOS driver shipped with LAN Manager 2.0 saw at best minimal testing on 486 machines. It is possible that it was only tested on early revision 486s, or not at all. The CPU detection code was safe enough on 386 and older CPUs; only on a 486 or later was it prone to crashing.

As usual, this shows the danger of knowing too much. If the authors weren’t trying to show how clever they were by trying to detect a 486 in (probably) 1989, the code would have worked. If they had actually followed Intel’s guidelines and detected a 486 by trying to flip the EFLAGS.AC bit, the code would have worked too.
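For comparison, a rough sketch of the EFLAGS.AC approach (my reconstruction of the well-known technique, not Microsoft’s or Intel’s exact code): the AC flag, bit 18 of EFLAGS, can be toggled on a 486 but always reads back as zero on a 386, and no privileged instructions are involved:

                pushfd                  ; Save the original EFLAGS
                pop     eax
                mov     ecx, eax        ; Keep a copy for comparison and restore
                xor     eax, 40000h     ; Flip AC (bit 18)
                push    eax
                popfd                   ; Try to load the flipped value
                pushfd
                pop     eax             ; Read EFLAGS back
                push    ecx
                popfd                   ; Restore the original EFLAGS
                xor     eax, ecx        ; Did AC actually change?
                jz      short is_386    ; No change: 386 (label not shown)
                                        ; Changed: 486 or later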

Instead this is another example of poorly tested or perhaps entirely untested code ending up in production software, lying in wait until users upgrade their hardware, and then springing a nasty surprise on them.

Many programmers didn’t like Intel’s official CPU detection code because it was kind of big and clunky, and wasn’t always perfect, but at least it worked better than the abomination in the old Microsoft NE2000 driver.


7 Responses to Nobody Expects…

  1. Yuhong Bao says:

    I think Intel also changed the opcode for CMPXCHG, which used to be the same as IBTS/XBTS.

  2. Michal Necasek says:

    Yes. Because that (recycling the opcode) upset code trying to detect old broken 386 steppings.

  3. Antoine says:

You are, as usual, quite right on the topic of the 486.
    On a slightly side topic, there is something else where this code is not completely correct: as it happened, INSB/W and OUTSB/W were introduced in the 80186/80188 (and I clearly remember having read code like this for this exact reason, taking advantage of that feature); in other words, detection of the 80286 and hence the protected mode features was not strictly necessary either…
    In fact, while reading, I realize you know it (see your comment about the 80186+, which is not correct: a 186 pushes SP the same way as an 8086 and unlike a 286; the classical code to detect a 186 involves shifts for 32 bits or more, not push SP.) So in fact, there is no problem of a potential failure of executing SGDT on a 186.
The real problem is that the code (or the coder) is not knowledgeable enough and quite simply ignores the 80186/8 processors (treating them as mere 8088s) and goes directly for the big fish… and then fails spectacularly!

  4. Michal Necasek says:

    You’ve piqued my curiosity about the 80188/80186. It’s pretty clear that a lot of confusion (including that NE2000 driver CPU detection) stems from the fact that there were almost no PCs with an 80188/186 processor. Not zero but very few. Another compounding factor is that Intel’s CPU documentation tends to ignore the 186 entirely, and only talks about the 8086 and 286 as if there were nothing in between.

    The situation with 186/188 CPUs is perhaps nicely illustrated by the fact that in my hardware pile there’s probably at least a dozen 80186/188 processors, but they are all in SCSI HBAs, Ethernet NICs, maybe even some hard disks. Because that’s where those CPUs (or perhaps more like SoCs?) went.

    I find it somewhat difficult to believe that the 186 would have the same (real mode) instruction set as a 286 and generally the same microcode (shifts, writes to offset 0xffff) but differ in the PUSH SP behavior. Not impossible, just difficult to believe.

  5. Yuhong Bao says:

    There are probably no 80186/188 PCs where a NE1000/NE2000 driver would be useful anyway. NEC V20/V30 is much more likely.

  6. MiaM says:

As I understand it, an NEC V20 also has INSW, so it would benefit if the detection treated it as a 286.

    I also doubt that there were any PCs with an 80188 or 80186 that would have any reason to run a NE2000 driver.

The 80188 and 80186 were released in 1982. At the time it was kind of not known that 100% PC hardware compatibility would be an important feature for x86 based computers, so the extra stuff built into an 80188/80186 not only differs from the separate chips in a PC but is also hardcoded in a way that conflicts with the address map of a PC (at least unless you turn off a bunch of the built-in stuff, which kind of makes an 80188 or 80186 a bad idea).

It’s interesting though that if the 80188 and 80186 had been a bit different, but with a feature set similar to what they actually ended up with, they could have been the first “PC on a chip”. Instead they ended up as “something else x86 based on a chip”, which is why they were used in so many “advanced embedded” systems and also some non-PC-compatible MS-DOS / CP/M-86 computers. It’s really nice to not need separate chips for timers, interrupt controllers and whatnot.

A product using those processors that many of us probably have fond memories of was the USRobotics line of modems.
