Learn Something Old Every Day, Part VII: 8087 Intricacies

The other day I investigated a report that a C runtime library modification causes programs to hang on a classic IBM 5150 PC with no math coprocessor. The runtime originally contained two separate routines, one to detect the presence of an FPU and the other to detect the FPU type.

Someone noticed that the code in the two routines looked really similar and decided to merge them. The reworked code runs just fine on 386 and later processors, with or without FPU (I’m unsure of its status on 286 machines). But it does not work on an FPU-less 8088; it causes the system to hang.

The old code looked like this:

    push  BP                 ; save BP
    mov   BP,SP              ; get access to stack
    sub   AX,AX              ; start with a preset value
    push  AX                 ; allocate space for ctrl word
    fninit                   ; initialize math coprocessor
    fnstcw word ptr -2H[bp]  ; store cntrl word in memory
    pop   AX                 ; get control word
    mov   AL,AH              ; get upper byte
    pop   BP                 ; restore BP

If the routine returned the value 3, a math coprocessor was found, otherwise there wasn’t one.
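
The check on the returned value can be modeled in a few lines of C. This is only a sketch of the logic, not code from the runtime library: the function names are made up, and the control word values after FNINIT (0x03FF on the 8087, 0x037F on the 287 and later) are assumptions quoted from Intel documentation as I remember it. What matters is that the upper byte is 3 in either case, while an untouched stack word stays zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old routine's tail: AX holds the word that
 * FNSTCW may or may not have overwritten, and the routine returns its
 * upper byte (mov AL,AH). */
uint8_t fpu_result_from_stack_word(uint16_t stack_word)
{
    return (uint8_t)(stack_word >> 8);
}

/* A result of 3 means a coprocessor stored its control word. */
int fpu_present(uint16_t stack_word)
{
    return fpu_result_from_stack_word(stack_word) == 3;
}

/* Worked examples; the control word values are assumed reset defaults. */
void check_detection_examples(void)
{
    assert(fpu_present(0x03FF));   /* assumed 8087 value after FNINIT */
    assert(fpu_present(0x037F));   /* assumed 287/387 value after FNINIT */
    assert(!fpu_present(0x0000));  /* no FPU: preset zero is untouched */
}
```

Calling check_detection_examples() from a test program exercises all three cases.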

The new code looks like this:

        push    BP                  ; save BP
        mov     BP,SP               ; get access to stack
        sub     AX,AX
        push    AX                  ; allocate space for status word
        finit                       ; use default infinity mode
        fstcw   word ptr [BP-2]     ; save control word
        fwait
        pop     AX
        mov     AL,0
        cmp     AH,3
        jnz     nox87
        ...

It’s almost the same, but hangs on an 8088 without an 8087. Why does that happen?

The reason has everything to do with how an 8087 FPU was interfaced to the 8086/8088 CPU. The FPU snooped the CPU’s bus transactions and observed all instruction fetches. The CPU used the QS0/QS1 pins to inform the 8087 about the state of its prefetch queue. The FPU essentially executed instructions in lockstep with the CPU.

When an FPU instruction was found in the instruction stream (that is, an instruction encoded with one of the ESC opcodes), the FPU executed the instruction. The CPU usually first executed a WAIT (aka FWAIT) instruction, which monitored the CPU’s TEST input, which was connected to the FPU’s BUSY signal. As long as the FPU reported that it was busy, the CPU would wait.

And of course, the problem is that with no FPU present, the nonexistent FPU appears to be permanently busy… and a WAIT instruction never completes.

Intel significantly changed the CPU/FPU interface for the 286/287, then again slightly for the 386 (which could use either a 287 or a 387 FPU), and yet again for the 486 when the FPU was built into the CPU chip and no external interface was involved anymore. These changes subtly altered the behavior of the FPU-related instructions. One of the changes is that newer CPUs no longer hang on a WAIT instruction if no FPU is present. But back to the old IBM PC…

IBM 5150 FPU Support

I thought I’d check the original IBM Technical Reference. First I examined the BIOS to see what it does with the FPU. The answer was clear — nothing. The original PC BIOS contains no mention of 8087, FPU, math, or anything like that. The BIOS equipment word (stored in the BIOS Data Area or BDA) supposedly has bit 1 set when an FPU is present. But in the original PC BIOS, that bit is “not used”.

Newer versions of the PC and XT BIOS are only very slightly different: bit 1 in the equipment word is documented as “math coprocessor”. But the BIOS still doesn’t do anything with it. So how is that supposed to work?

Easy — the bit value comes straight from a DIP switch on the IBM PC system board. The BIOS does not attempt any validation, which means the FPU bit in the BIOS equipment word might not be very trustworthy on the PC and XT.
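
For illustration, testing that bit takes a single mask operation. The sketch below is a model in C that works on a plain 16-bit value; it deliberately does not read the BIOS Data Area, and the names EQUIP_FPU_BIT and equipment_reports_fpu are invented here. On a PC or XT the answer only reflects the DIP switch setting, not the actual hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 of the BIOS equipment word: "math coprocessor" (hypothetical name). */
#define EQUIP_FPU_BIT 0x0002u

/* Returns 1 if the equipment word claims an FPU is installed. */
int equipment_reports_fpu(uint16_t equipment_word)
{
    return (equipment_word & EQUIP_FPU_BIT) != 0;
}

/* The sample words are made up; only bit 1 matters here. */
void check_equipment_examples(void)
{
    assert(equipment_reports_fpu(0x0002));
    assert(!equipment_reports_fpu(0x402D));  /* bit 1 clear */
    assert(equipment_reports_fpu(0x402F));   /* bit 1 set */
}
```

A program that cares about the real hardware would still want to follow up with the FNINIT/FNSTCW probe rather than trust this bit alone.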

(As an aside, the PC/AT BIOS does include FPU detection and sets the bit in the BIOS equipment word. It also has a little bit of work to do because on the PC/AT, the FPU uses IRQ 13 for math error reporting; the IRQ has to be unmasked if an FPU is found.)

In any case, I searched the Technical Reference for documentation relevant to the FPU. After a while I realized that there’s nothing about the FPU in the original 1981 PC Technical Reference. Nothing at all except… on the first page of the system board schematics, there are two big chips/sockets in the middle. One is labeled ‘8088’ and the other ‘SOCKET’. There is no question that the nameless ‘SOCKET’ is in fact a socket for an 8087 chip.

Why these shenanigans? Why put a socket on the board and then pretend it doesn’t exist? When IBM started selling PCs in late 1981, the Intel 8087 was a very new chip (it was released a couple of years after the 8086) and it was only available in very limited quantities, if at all. IBM clearly didn’t think they could offer the chip when the PC was launched, and perhaps when the Technical Reference was finalized, it wasn’t even clear if and when the 8087 could be offered to customers. Therefore IBM decided to not say anything at all, even though 8087 support was designed in. It’s still all there in the board schematics, but probably only people looking for 8087 support would clearly recognize it.

Looking at the schematics I noticed one interesting detail: The 8087 is wired to use NMI and not a regular interrupt for math error reporting on the IBM PC (something that Intel explicitly did not recommend) and the DIP switch controlling FPU presence also enables the NMI line. That is, unless the DIP switch is correctly set to indicate FPU presence, the FPU won’t be able to raise interrupts.

How Does It Really Work?

But then I started wondering how the FPU detection really works on an 8088. The FPU not present case is easy enough to follow. Let’s look at the detection code again:

    push  BP                 ; save BP
    mov   BP,SP              ; get access to stack
    sub   AX,AX              ; start with a preset value
    push  AX                 ; allocate space for ctrl word
    fninit                   ; initialize math coprocessor
    fnstcw word ptr -2H[bp]  ; store cntrl word in memory
    pop   AX                 ; get control word
    mov   AL,AH              ; get upper byte
    pop   BP                 ; restore BP

With no FPU, the FNINIT and FNSTCW instructions simply do nothing. The CPU effectively skips them, and the 8087 is not there to react in any way. Therefore the word on the stack will be unmodified (zero) and a later comparison will fail.

But how does it work if the FPU is present? The FNSTCW instruction is documented to take about 15 cycles on the 8087. If the CPU does not wait for the FPU at all, how does the CPU make sure that the FPU control word value is read after the FPU wrote it, and not before? The code does not attempt to insert any delay or anything, so why does the CPU not race ahead and read from memory before the FPU wrote it? Note that “canonical” 8087 detection code from Intel likewise does not suggest that any kind of explicit waits are needed.

I am not entirely sure of the answer. But I suspect that it has something to do with another synchronization mechanism between the FPU and CPU, completely separate from the BUSY signal. This mechanism uses the RQ/GT (request/grant) pins on the CPU and FPU.

Although the 8087 tracks the CPU execution and at least partially decodes instructions, it does not mirror the 8086/8088 state. When an instruction like the FNSTCW above writes to memory, the FPU cannot figure out the address, because it doesn’t know what the value of the BP register is. Therefore the CPU actually generates the address on the local processor bus, and the FPU uses that address to read or write data.

To do this safely, the FPU needs to be able to own the bus, so that no one else tries to execute bus transactions at the same time. It is quite possible that when the FNSTCW starts executing, the 8087 almost immediately grabs the bus. The CPU would then be unable to execute the POP instruction because that has to read from memory. This would allow the FPU to safely complete the FNSTCW instruction before the CPU can read the results (and the result must always be stored in memory).

I’m not quite sure how it works for the FNINIT instruction, which does not use memory, so the CPU won’t generate any dummy memory cycles for it. The FNINIT instruction is relatively fast (5 cycles typical on the 8087). I suspect that it might also lock the bus to hold the CPU back, but could not find this documented.

Dual Sync

The 8087 clearly has two synchronization mechanisms. The BUSY signal is used for “long term” synchronization, and it is meant to allow the 8087 to execute numeric instructions in parallel with the CPU.

Many numeric 8087 instructions take hundreds of cycles to execute, and Intel clearly thought it would be useful to let the CPU execute while the FPU is busy. The BUSY synchronization is visible to programmers through the WAIT instruction, and may need to be explicitly coded (e.g. if an FPU instruction stores a value that the CPU reads, the program has to make sure the FPU instruction has completed).

But there’s also another synchronization mechanism using the RQ/GT signals. This mechanism is not visible to the programmer and serves as immediate synchronization to prevent the FPU and CPU from stepping on each other’s toes.

I also suspect that the RQ/GT mechanism is used to synchronize the no-wait control instructions such as FNINIT and FNSTCW. Due to the nature of the 8087 interface, these instructions must not use the WAIT mechanism, in order to avoid hangs. These hangs can happen both when there is no 8087 and when there is a pending FPU fault/interrupt (a pending FPU fault causes the 8087 to be busy, as it must, because the fault needs to be handled before the next FPU instruction alters the state).

While the WAIT synchronization mechanism is well documented, given that programmers had to be aware of it, the RQ/GT mechanism is not documented in much detail because programmers had no control over it and it worked transparently.

386 Note

On the 386 and later CPUs, a FINIT instruction won’t hang the system if no FPU is present. But there is another scenario to contend with: There may be a pending unmasked FPU exception. This is admittedly a corner case, but in environments such as DOS, it cannot be prevented.

On the 8087, math exceptions were asynchronous and were reported whenever they occurred. On the 387, that is not the case, and exceptions are only reported when WAIT is executed. It is therefore possible for the system to be in a state where the FPU has a pending exception, and the exception remains pending potentially indefinitely.

The FPU detection code using the waiting form, FINIT, may therefore trigger an FPU exception; this may be harmless, but it could cause exception handling code to be run before it is fully initialized. The non-waiting form, FNINIT, takes care of this problem, because it dismisses any pending math exception.

What Did Intel Say?

I could not find any hint of how software should detect an 8087 in the Intel 8086/8087/8088 documentation. The 8087 as such is well documented, but its detection is not. It is possible that the documentation was written before Intel realized that user-installable 8087 upgrades would be a thing.

In the Intel 80287 manual, there is sample code for detecting the FPU. An excerpt is below:

start:
;
;       Look for an 8087, 80287, or 80387 NPX.
;       Note that we cannot execute WAIT on 8086/88 if no 8087 is present.
;
test_npx:
        fninit              ; Must use non-wait form
        mov     si,offset dgroup:temp
        mov     word ptr [si],5A5AH ; Initialize temp to non-zero value
        fnstsw  [si]        ; Must use non-wait form of fstsw
                            ; It is not necessary to use a WAIT instruction
                            ;  after fnstsw or fnstcw.  Do not use one here.
        cmp     byte ptr [si],0 ; See if correct status with zeroes was read
        jne     no_npx      ; Jump if not a valid status word, meaning no NPX
;
;       Now see if ones can be correctly written from the control word.
;
        fnstcw  [si]        ; Look at the control word; do not use WAIT form
                            ; Do not use a WAIT instruction here!
        mov     ax,[si]     ; See if ones can be written by NPX
        and     ax,103fh    ; See if selected parts of control word look OK
        cmp     ax,3fh      ; Check that ones and zeroes were correctly read
        jne     no_npx      ; Jump if no NPX is installed
;
;       Some numerics chip is installed.  NPX instructions and WAIT are now safe.

The comments very clearly state that WAIT must not be executed on an 8086/8088 with no 8087, and that explicit synchronization isn’t required after FNSTSW or FNSTCW.
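
The two comparisons in Intel’s listing are easy to model outside of assembly. The C sketch below only mirrors the masks and constants visible in the excerpt (5A5AH, 103fh, 3fh); the function names are made up, and the sample control words 0x03FF and 0x037F are assumed post-FNINIT values rather than something the excerpt states.

```c
#include <assert.h>
#include <stdint.h>

/* Value the sample stores in temp before the FNSTSW probe. */
#define TEMP_PRESET 0x5A5Au

/* First check: low byte of the stored status word must be zero. */
int status_byte_ok(uint16_t temp)
{
    return (temp & 0x00FFu) == 0;
}

/* Second check: selected control word bits must read back as 3Fh. */
int control_bits_ok(uint16_t temp)
{
    return (temp & 0x103Fu) == 0x003Fu;
}

/* Both checks must pass; each argument is what temp held after the store. */
int npx_detected(uint16_t temp_after_fnstsw, uint16_t temp_after_fnstcw)
{
    return status_byte_ok(temp_after_fnstsw) &&
           control_bits_ok(temp_after_fnstcw);
}

/* Worked examples with assumed register values. */
void check_npx_examples(void)
{
    /* No NPX: both no-wait stores do nothing, temp keeps its preset. */
    assert(!npx_detected(TEMP_PRESET, TEMP_PRESET));
    /* NPX present: clean status, assumed control word defaults. */
    assert(npx_detected(0x0000, 0x03FF));
    assert(npx_detected(0x0000, 0x037F));
    /* A zero control word fails the "ones can be written" check. */
    assert(!npx_detected(0x0000, 0x0000));
}
```

The preset matters: with no NPX, the low byte of temp stays at 5AH, so the very first comparison already fails and the second probe is never reached.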

I can only guess that in the 1980s, people already got burned by adding WAITs to FPU detection code, only to have it hang on an 8088 PC.

7 Responses to Learn Something Old Every Day, Part VII: 8087 Intricacies

  1. Andrew Jenner says:

    Executing WAIT on an 8088 or 8086 with no 8087 isn’t a fatal hang – the CPU will continue to service IRQs while waiting. So there is another method on PC hardware to determine if an 8087 is present – just execute 8087 instructions with WAITs, and hook the timer interrupt with some code that detects if the instruction pointer is at the WAIT for the first instruction for too long (the interval between timer IRQs with the default timer frequency of 18.2Hz is much longer than even the most long-running 8087 instruction). Assuming that first instruction is only executed once, the timer hook can then set an “FPU not present” flag and continue execution in a way that avoids executing more 8087 instructions. I’m not sure if any PC software ever did it this way, though.

  2. Michal Necasek says:

    You’re right that WAIT also functions as a kind of HLT on the 8086/8088, but I don’t think that’s really very useful. An interrupt handler plus the code to install and remove it would be significantly more complex and larger than the handful of instructions using FNINIT/FNSTCW. Besides, using interrupts requires a lot of platform specific knowledge, which FNINIT does not need.

    Last but not least, the timer interrupt (as you note) ticks so slowly that a no-op FNINIT/FNSTCW is going to execute far faster in the no-8087 case.

    I’m pretty sure the method you suggest would work, but compared to the no-wait FPU instructions it seems worse in just about every way 🙂

  3. The detection code provided by Intel in the 80287 manual[1] has some subtleties…

    It performs an fninit/fnstsw pair and then compares the result to zero, even though the condition codes in the status word are documented as being “indeterminate” following FPU initialization. It took a few re-readings before I noticed that actually only the low byte of the status word was being checked and the condition codes (being in the upper byte) weren’t part of the comparison.

    But the byte comparison still includes bit 6 (‘reserved’), despite the text immediately before the code sample claiming that “The example also avoids depending on any values in reserved bits” :-/

    [1] 80286 and 80287 Programmers Reference Manual, 1987. Page 3-2 and Figure 3-1 in the 80287 NPX book.

  4. Michal Necasek says:

    Actually… in the version I’m looking at there are two detection examples, one is using FNINIT/FNSTCW (Figure 3-1) and the other using FNINIT/FNSTSW (Appendix B). Both only check one byte, but the logic is of course different.

  5. Fernando says:

    My memory of those days (which is admittedly very imperfect) is that nothing autoconfigured itself; everything was configured with DIP switches, jumpers, a program asking the user what he had, or some type of configuration file. Memory size, number of diskettes, type of video, hard disk parameters, printers, etc. I can’t remember anything that autoconfigured itself; even baud rate came later, probably with the 286 or 386.
    If some parameter was bad and your computer halted, you just restarted it and changed the parameter.
    So I can see the engineers at Intel not even thinking about auto detection before the 80286.

  6. MiaM says:

    Fernando: I might misremember but afaik a PC Bios would actually do something automatic re parallel ports. IIRC if you had a parallel port on a video card that would be LPT1, otherwise one of the “non-videocard” parallel ports would be LPT1. The one on a video card would have a separate address different from the two possible “non-videocard” ones.

    Otherwise you are correct, almost every piece of hardware would need some sort of manual configuration, unless some parts of the configuration were assumed to be set in a certain way (i.e. IRQs for serial ports – few software packages allowed the user to select non-standard IRQs, making it painful to use more than two serial ports).

  7. Michal Necasek says:

    Serial and parallel ports are detected by the BIOS. Most everything else has to be configured via DIP switches or BIOS setup, or the hardware is ignored by the BIOS (like network cards or sound cards).
