This post ended up being much longer than originally intended because halfway into writing it, I found that 286 and later CPUs don’t behave the way I had assumed they would…
While investigating a bug related to a program using floating-point math on a 386SX system with no FPU, I started pondering how exactly FPU detection works on 286 and newer CPUs. Although math co-processors became standard some 30 years ago, on old PCs they were an uncommon and expensive add-on, and a 66 MHz 486SX2 was still a usable, if FPU-less, processor in the mid-1990s.
The CPU/FPU interface and FPU detection on the 8086/8088 were discussed before. To recap, the 8086/8087 interface is a little odd because it is in fact a generic co-processor interface. The 8086 was launched in 1978; probably sometime in 1979, the Intel 8089 I/O Coprocessor arrived; the 8087 only appeared in 1980.
The ESC instruction (opcode range D8h-DFh) was used for communication with a co-processor on the 8086. While the CPU didn’t exactly execute the instruction, it had to know how to decode it. The ESC instruction used a standard ModR/M byte to specify an optional memory operand, and the CPU had to decode it in order to compute the address of any memory operand the co-processor would read or write.
If there is no co-processor attached to an 8086, the ESC instructions simply do nothing because the co-processor isn’t there to read or write any data. However, the WAIT instruction designed for synchronization will (in a typical 8088/8086 PC design) hang indefinitely because the missing co-processor acts as if it were permanently busy. For that reason, FPU detection must use the non-waiting FNINIT/FNSTSW sequence (or an equivalent) to avoid hangs on 8086-class machines.
Additional information about what things look like from the 8087’s perspective has recently been published.
As an aside, it should be noted that the IBM PC had a mechanism to report FPU presence (or absence) in the BIOS equipment word (INT 11H). However, this detection relied entirely on a user-settable DIP switch on the PC and PC/XT motherboard. An article in PC Tech Journal in June 1985 (Machine Specifics, by Ted Forgeron) notes that IBM’s own manual gave users incorrect instructions, telling them that the DIP switch needed to be in the ON position to signal FPU presence. In reality, the DIP switch setting to report FPU presence was OFF. As a consequence, the BIOS FPU presence bit could not be trusted in PC and XT systems (on the PC/AT, BIOS detected FPU presence during POST) and software needed to explicitly check for FPU presence to be certain.
By the time the 80286 was rolled out in 1982, Intel had effectively given up on a generic co-processor interface (no co-processor other than the 80287 is known). Although Intel’s 286 documentation mentions the ESC instruction here and there, it is not listed in the instruction reference at all (unlike WAIT). ESC is only indirectly documented in the 80287 programming reference. The situation was the same with the 80386/80387 documentation; no ESC in the 386, only FPU instructions in the 387, according to Intel.
Unlike the 8086, the 286 and later offered a convenient mechanism to simplify floating-point instruction emulation (a non-trivial topic on the 8086). The EM bit in the Machine Status Word (MSW, later the low 16 bits of the CR0 register), when set, causes ESC opcodes to trigger a Coprocessor Not Available fault (exception 7).
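The interaction of EM with the related MP and TS bits can be summarized as a small C model. This is a sketch of the documented rules (ESC faults when EM or TS is set; WAIT faults only when both MP and TS are set), not a description of the silicon, and the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the 286 Machine Status Word (also the low bits of CR0). */
#define MSW_MP 0x0002u /* Monitor Processor Extension */
#define MSW_EM 0x0004u /* Emulate Processor Extension */
#define MSW_TS 0x0008u /* Task Switched */

/* An ESC opcode (D8h-DFh) raises exception 7 if EM or TS is set. */
static bool esc_raises_nm(uint16_t msw)
{
    return (msw & (MSW_EM | MSW_TS)) != 0;
}

/* WAIT raises exception 7 only if both MP and TS are set; EM is ignored. */
static bool wait_raises_nm(uint16_t msw)
{
    return (msw & (MSW_MP | MSW_TS)) == (MSW_MP | MSW_TS);
}
```

With EM set and TS clear, as Intel intends for FPU-less systems, every ESC opcode traps to the emulator while WAIT simply falls through.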
Intel’s idea clearly was that on systems with no FPU, the EM bit should be set. Which is all well and good, except that the firmware or operating system still needed to figure out how to set the EM bit based on FPU presence or absence. The hardware itself offered no aid to detect an FPU; it had to be done in software. And to detect whether an FPU was present, Intel suggested executing an FNINIT/FNSTSW or similar sequence.
So… based purely on Intel’s programming documentation, in order to detect the presence of an FPU, one had to execute FPU instructions—defined for a chip that might or might not be present. How could that possibly work?!
286/287 Interface
Intel’s documentation was a lie, as usual. Although the 80286/80287 interface (and the very similar 80386/80387 interface) was quite different from the original 8086/8087 interface in detail, it was conceptually not that different.
The original 80287 was very closely related to the 8087 and used the same execution unit (NEU, or Numeric Execution Unit) as the 8087. However, the bus interface (BIU, or Bus Interface Unit) was significantly different.
Rather than snooping on the CPU’s bus and looking for ESC opcodes on its own, the 287 responded to I/O cycles on reserved ports 00F8h, 00FAh, and 00FCh. As with the 8087, the 287 was still connected directly to the 286’s data bus in order to exchange data, but all memory accesses were performed by the 286, not directly by the 287 (as was the case with the 8087). The 8087 could become a bus master and directly access memory; the 287 could only communicate with the CPU and had to ask the processor to perform memory accesses on its behalf.
It is obvious that the 286 (not 287) still had to decode ESC opcodes, regardless of what Intel’s programming documentation says (or rather doesn’t say). When no FPU was present, the I/O cycles generated by the CPU had no effect, and the FPU never asked for any data transfers… except see below.
There was one other significant change brought by the 286. On the 8086/8087, users had to code WAIT instructions before every floating-point instruction (most assemblers did that automatically). That was because the FPU couldn’t respond to the next floating-point instruction while it was still busy with a previous one.
The 286/287 no longer required these explicit WAIT instructions. As Intel put it (page B-2 of the Intel 80286 and 80287 Programmer’s Reference Manual, 1987): the 80286 automatically tests the BUSY line from the 80287 to ensure that the 80287 has completed its previous instruction before executing the next ESC instruction.
That is an interesting fact, because it requires the 286 (not 287!) to understand which instructions require the BUSY line testing and which ones don’t.
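To illustrate what that means, here is a hedged C sketch of the classification the 286 has to make. It uses the standard x87 encodings of the control instructions that have no-wait (FN) forms; the assumption that the 286 exempts exactly these from the BUSY test is mine, and `is_nowait_control` is a hypothetical name, not anything from Intel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for FNINIT, FNCLEX, FNSTSW, FNSTCW, FNSTENV and FNSAVE.
   The 8087-only FNENI/FNDISI and the 287's FSETPM are left out. */
static bool is_nowait_control(uint8_t opcode, uint8_t modrm)
{
    unsigned mod = modrm >> 6;
    unsigned reg = (modrm >> 3) & 7u;

    if (mod != 3) {                      /* memory operand forms */
        if (opcode == 0xD9 && reg >= 6)  /* D9 /6 FNSTENV, D9 /7 FNSTCW */
            return true;
        if (opcode == 0xDD && reg >= 6)  /* DD /6 FNSAVE, DD /7 FNSTSW m16 */
            return true;
        return false;
    }
    if (opcode == 0xDB && (modrm == 0xE2 || modrm == 0xE3))
        return true;                     /* FNCLEX, FNINIT */
    if (opcode == 0xDF && modrm == 0xE0)
        return true;                     /* FNSTSW AX (287 and later) */
    return false;
}
```

Everything else, from FADD to FLDCW, would go through the BUSY check first in this model.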
Parallelism
The 8086 was designed to allow the CPU to execute in parallel with a co-processor, using the WAIT instruction for synchronization.
First of all, please note that there is some confusion about the WAIT instruction, sometimes also called FWAIT. (F)WAIT is sometimes classified as an FPU instruction; it is not really one, as it is executed purely by the CPU. Unlike FPU instructions, (F)WAIT does not communicate with the FPU at all; it only observes the BUSY signal input to the CPU. Of course, this line has been blurred since the 486, when the FPU was added to the same chip as the CPU.
Why are there two mnemonics for the same instruction? As always, there is a reason. The WAIT instruction is opcode 9Bh and that’s that. FWAIT, however, may be assembled as opcode 9Bh or as the sequence 90h, 9Bh (that is, NOP / WAIT). The two-byte sequence is emitted when producing floating-point code that can be emulated on 8086/8088 systems. Since those systems have no built-in facility for FPU emulation, floating-point instructions as well as WAIT need to be replaced with software interrupts. And because a software interrupt needs at least two bytes, the extra NOP is necessary to leave enough space. (With an FPU emulator, there is no parallelism and no waiting is needed; however, WAIT still must not hang the system!)
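The byte patching can be sketched in C. This assumes the INT 34h-3Dh convention used by the Microsoft and Borland 8087 emulators (INT 34h-3Bh for ESC opcodes D8h-DFh, INT 3Dh for a standalone FWAIT) and that the location of each instruction is already known, as it would be from a linker fixup. Segment-override forms are left out, and `patch_for_emulator` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrites one emulatable x87 instruction in place at a known offset.
   Returns 0 on success, -1 if the bytes at `off` are not recognized. */
static int patch_for_emulator(uint8_t *code, size_t len, size_t off)
{
    if (off + 1 >= len)
        return -1;                                /* need two bytes */
    uint8_t b0 = code[off], b1 = code[off + 1];

    if (b0 == 0x9B && b1 >= 0xD8 && b1 <= 0xDF) { /* WAIT + ESC n */
        code[off] = 0xCD;                         /* INT opcode */
        code[off + 1] = (uint8_t)(0x34 + (b1 - 0xD8)); /* INT 34h-3Bh */
        return 0;                                 /* ModR/M etc. untouched */
    }
    if (b0 == 0x90 && b1 == 0x9B) {               /* NOP + FWAIT */
        code[off] = 0xCD;
        code[off + 1] = 0x3D;                     /* INT 3Dh */
        return 0;
    }
    return -1;
}
```

Because the patch is applied at a known offset rather than found by scanning, operand bytes that merely look like 9Bh are never touched, and the two-byte NOP/WAIT sequence leaves exactly enough room for the INT instruction.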
One might say that the 8087 was well suited to parallel execution because it was slow. Simple floating-point addition or subtraction took around 100 clock cycles. Division took more than 200. The FYL2X and FYL2XP1 instructions could take around 1,000 cycles.
How useful this parallelism was in practice is another question. Most FPU instructions were closely followed by another FPU instruction and the CPU could not do a whole lot in between. When executing a lengthy FPU instruction, the CPU almost certainly needed the actual result and couldn’t just forge ahead. That said, the CPU was able to handle things like hardware interrupts while the FPU was busy. In a multi-tasking system, the CPU might be able to switch to a different task, as long as that task didn’t use the FPU as well.
On the FPU itself, there were two classes of instructions: math and control (or administrative). The NEU took care of the slow (or very slow) math instructions (FADD, FMUL, FSQRT etc.). The BIU executed the control instructions like FINIT, FLDCW, FSTSW, or FSAVE/FRSTOR.
There was also parallelism between the BIU and NEU within the FPU. For example, the FNSTSW instruction could be executed, at least on the 8087 and presumably on the 80287, while the NEU was busy, which was reflected in the BSY bit of the FPU Status Word (FSW).
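For reference, the BSY bit is bit 15 of the FSW. The definitions below sketch the 287/387 status word layout in C (bit 7 was called IR on the 8087); the helper names are illustrative only. Note that a word of FFFFh decodes as a busy FPU with TOP = 7 and every exception flag set.

```c
#include <assert.h>
#include <stdint.h>

/* x87 FPU Status Word (FSW) fields. */
#define FSW_B   0x8000u /* busy: the NEU is still executing */
#define FSW_C3  0x4000u /* condition code 3 */
#define FSW_TOP 0x3800u /* top-of-stack pointer, bits 13-11 */
#define FSW_C2  0x0400u /* condition code 2 */
#define FSW_C1  0x0200u /* condition code 1 */
#define FSW_C0  0x0100u /* condition code 0 */
#define FSW_ES  0x0080u /* error summary (IR on the 8087) */
#define FSW_EXC 0x003Fu /* exception flags PE UE OE ZE DE IE */

static int fsw_busy(uint16_t fsw)
{
    return (fsw & FSW_B) != 0;
}

static unsigned fsw_top(uint16_t fsw)
{
    return (fsw & FSW_TOP) >> 11;
}

static unsigned fsw_exceptions(uint16_t fsw)
{
    return fsw & FSW_EXC;
}
```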
In general, the programmer had to explicitly synchronize the CPU and FPU execution by using the (F)WAIT instruction or the waiting forms of FPU instructions. However, certain control instructions required no explicit synchronization because they already did the work internally. This is how Intel described it (80287 Numeric Processor Extension (NPX), 1987, page 2-49):
There are several NPX control instructions where automatic data synchronization is provided; however, the FSTSW/FNSTSW, FSTCW/FNSTCW, FLDCW, FRSTOR, and FLDENV instructions are all guaranteed to finish their execution before the CPU can read or alter the referenced memory locations.
The 80287 provides data synchronization for these instructions by making a request on the Processor Extension Data Channel before the CPU executes its next instruction. Since the NPX data transfers occur before the CPU regains control of the local bus, the CPU cannot change a memory value before the NPX has had a chance to reference it. In the case of the FSTSW AX instruction, the 80286 AX register is explicitly updated before the CPU continues execution of the next instruction.
In other words, for some FPU control instructions, the FPU effectively held the CPU busy during the ESC opcode execution. This ensured that the CPU couldn’t modify any operands the FPU might still read, and at the same time the CPU couldn’t access memory written by the FPU before the FPU was done.
If one thinks about the 8086/8087 architecture, it is obvious that the 8087’s BIU had to execute in lockstep with the CPU. As a consequence, control instructions could be executed without any waiting: if the CPU was ready to execute the next ESC opcode, the BIU had to be done with any previous ESC opcodes, even though the NEU could still be busy.
This is also why assemblers supported both waiting and non-waiting forms of these instructions. For example, FNSTSW (the non-waiting form) could start and finish executing while the NEU was busy. While that may have been useful in some cases, if the programmer wanted to read the FPU Status Word (FSW) as it was after the previous FPU calculation completed, FSTSW (the waiting form) had to be used.
Some control instructions internally synchronized the BIU and NEU. For example, the FNSTENV and FNSAVE instructions could be executed even if the FPU was busy; however, the state would not be saved until the FPU was done (i.e. until the NEU was no longer busy).
The FNINIT instruction performs an FPU reset. For that reason, FNINIT could also be executed without waiting and might abort any NEU operation still in progress.
What If There’s No FPU?
Here’s an example of FPU detection logic from Intel’s 287 documentation:
FND_287:
FNINIT ; initialize numeric processor.
FSTSTW STAT ; Store status word into location
MOV AX,STAT ; STAT.
OR AL,AL ; Zero Flag reflects result of OR.
JZ GOT_287 ; Zero in AL means 80287 is present.
; No 80287 Present
SMSW AX
OR AX,0004H ; Set EM bit in Machine Status Word
LMSW AX ; to enable software emulation of
JMP CONTINUE ; 287.
; 80287 is present in system
GOT_287:
SMSW AX
OR AX,0002H ; Set MP bit in Machine Status Word
LMSW AX ; to permit normal 80287 operation
CONTINUE: ; and off we go
In principle, the FSTSW instruction in the example ought to have been FNSTSW, otherwise the code would likely hang on an 8086/8088 system with no FPU. Then again, the code is obviously written for a 286 (using LMSW/SMSW instructions), so running it on an 8086 wasn’t a concern.
The example also clearly shows how software is responsible for setting the MSW. The hardware can’t do it; software must detect the FPU presence or absence and act accordingly.
The manual includes a curious note about the sample: It assumes that the system hardware causes the data bus to be high if no 80287 is present to drive the data lines during the FSTSW (Store 80287 Status Word) instruction. More about that later.
Intel’s documentation is pretty clear on what happens when an FPU is present: FNINIT resets the FPU, and FSTSW stores the status word, whose low 8 bits will always be zero.
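The decision made by the sample code reduces to a single test on the low byte, which a short C sketch makes explicit. The parameter stands for whatever value ends up in STAT, whether it was driven by an FPU or not; the function name is hypothetical. A stored 0000h (a freshly initialized FPU) reports an FPU and FFFFh (no FPU, pulled-up bus) reports none, but a value such as FF00h, from an undriven low byte that happens to read as zero, would wrongly report an 80287, which is exactly the case the note about the data bus guards against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The FND_287 test: OR AL,AL / JZ GOT_287, i.e. "low byte of STAT is zero". */
static bool intel_sample_says_fpu(uint16_t stored)
{
    return (stored & 0x00FFu) == 0;
}
```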
If there’s no FPU however… things get interesting. If one takes Intel’s 286/287 documentation literally, the detection can’t ever work because with no FPU, there are no valid instructions to execute (remember, ESC is not documented as a valid 286 instruction).
Obviously that’s not how it works in reality. The 286 is not entirely different from the 8086 and ESC is still a CPU instruction. The CPU can execute ESC instructions just fine, but if there’s no FPU, ESC is a no-op… but only mostly.
That’s why there’s that note about the data bus having to be driven high. If there’s no FPU to execute F(N)STSW, who would write to memory? On an 8086/8087 system, it is clear that the 8087 handles all writes. No 8087, no memory writes by ESC opcodes. But the 286/287 is different. Unlike the 8087, the 287 does not become a bus master in order to access memory. All memory accesses are performed by the 286 on behalf of the 287. This is obviously required for memory protection to work.
I don’t have a 286 on hand at the moment, but I do have a 386 system (Intel 80386DX-33) with no math co-processor plugged into the socket on the board. I can confirm that the FNSTSW m16 instruction does write to memory even if there is no FPU. On my system, it writes FFFFh. I cannot tell whether the CPU writes that value because there is no FPU, or whether (much more likely) it is the usual “unused” bus value that typically results from reading nonexistent memory or I/O ports.
Clearly, ESC opcodes are not just NOPs. The 80386 knows that FNSTSW m16 writes one word to memory, and writes it on behalf of the FPU. If the FPU is not there, the CPU still writes to memory.
Co-processor Segment Overrun
Let’s take a detour to examine one odd aspect of the x86 architecture which evolved with every early CPU generation.
The 286 Case
The 286/287 needed to solve a new problem that didn’t exist on the 8086, namely memory protection. The FPU must not be allowed to access memory past segment limits, just like the CPU is not allowed to (otherwise memory protection would go out of the window).
For every ESC instruction which accesses memory, the 286 knows where the access starts, but clearly not where it ends. Because the 286 does not know how big FPU instruction operands are, it needs the Processor Extension Segment Overrun interrupt, also known as Co-processor Segment Overrun interrupt (number 9). If the starting address is outside of segment limits, the 286 immediately triggers a General Protection Fault (interrupt number 13). But if the memory access is only partially outside of segment boundaries, the 286 won’t find out immediately.
I do not know exactly how it is implemented, but I suspect that the 286 keeps track of the segment base and limit that the most recent FPU instruction was accessing, and it also knows the starting address of the memory access. As the FPU accesses subsequent words of the memory operand, the 286 keeps checking if the access is within segment limits. If it is not, the dreaded Processor Extension Segment Overrun (Interrupt 9) occurs.
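The suspected mechanism can be modeled in a few lines of C. This is only a sketch of the hypothesis above, not documented 286 behavior: the first word of the operand is checked before the 287 starts, and each later word is checked as it is transferred. Offsets are computed in 32 bits so that running past FFFFh counts as exceeding the limit; the names are made up.

```c
#include <assert.h>
#include <stdint.h>

enum fpu_fault { FAULT_NONE = 0, FAULT_SEG_OVERRUN = 9, FAULT_GP = 13 };

/* `limit` is the last valid offset; `words` is the operand size in words. */
static enum fpu_fault access_fpu_operand(uint16_t start, unsigned words,
                                          uint16_t limit)
{
    /* First word: checked up front, so a violation is an ordinary #GP. */
    if ((uint32_t)start + 1 > limit)
        return FAULT_GP;

    /* Later words: checked only as the 287 requests them, by which time
       the ESC instruction is long gone, hence Interrupt 9. */
    for (unsigned i = 1; i < words; i++) {
        uint32_t off = (uint32_t)start + 2u * i;
        if (off + 1 > limit)
            return FAULT_SEG_OVERRUN;
    }
    return FAULT_NONE;
}
```

For example, an 8-byte operand starting at offset 00FAh in a segment whose limit is 00FFh passes the initial check but overruns on its fourth word, which in this model yields Interrupt 9 rather than a General Protection Fault.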
Why dreaded? Because Interrupt 9 is one of the very few non-restartable exceptions. The 286 manual warns that the only FPU instruction which can be safely executed when Interrupt 9 occurs is FNINIT, which implies that the FPU state is lost. Because Interrupt 9 occurs asynchronously, it may even be triggered after a task switch, in the context of a task different from the one that initiated the faulting FPU instruction.
In any case, if Interrupt 9 occurs on a 286, the process which triggered it is effectively beyond salvation.
The 386 Case
On the 386, the Coprocessor Segment Overrun (no longer called Processor Extension Segment Overrun) still exists, but it takes real work to trigger. It only occurs “if the 80386 detects a page or segment violation while transferring the middle portion of a coprocessor operand to the NPX”. Emphasis on “middle”. In other words, the 386 knows exactly how long FPU instruction operands are, but there are edge cases it does not handle.
It is clear that the 386 validates the start and end of an FPU operand (remember, it can be up to 108 bytes long in the case of FSAVE!). There are pathological cases where the operand wraps around the addressing limit such that the starting and ending addresses are both valid, but one or more of the middle addresses is not. This can happen if the segment limit is slightly smaller than the wrap-around limit (e.g. addressing limit is FFFFH and segment limit is FFFDH), or pages are misaligned with respect to the segments such that there is a small “gap” at the start or end of the addressing limit which falls into an invalid page.
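The gap between checking only the endpoints and checking every byte can be shown with the FFFDh example. The C sketch below assumes 16-bit offsets that wrap modulo 64K and uses the 94-byte FSAVE image of 16-bit mode; the function names are illustrative, and that the 386 checks exactly the first and last byte is an inference, not documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16-bit effective addresses wrap modulo 64K; `limit` is the last valid offset. */
static bool offset_ok(uint32_t off, uint16_t limit)
{
    return (off & 0xFFFFu) <= limit;
}

/* Check only the first and last byte of the operand. */
static bool endpoints_ok(uint16_t start, unsigned size, uint16_t limit)
{
    return offset_ok(start, limit) &&
           offset_ok((uint32_t)start + size - 1, limit);
}

/* Check every byte, which would also catch a violation in the middle. */
static bool all_bytes_ok(uint16_t start, unsigned size, uint16_t limit)
{
    for (unsigned i = 0; i < size; i++)
        if (!offset_ok((uint32_t)start + i, limit))
            return false;
    return true;
}
```

With the limit at FFFDh, an FSAVE at offset FFF0h ends at 004Dh after wrapping, so both endpoints pass `endpoints_ok`, while `all_bytes_ok` rejects the operand at offsets FFFEh and FFFFh.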
On the 80386, Interrupt 9 is similarly non-restartable and generally very bad news. However, an operating system can entirely avoid Interrupt 9 caused by page faults, and minimize the likelihood of triggering it by going past segment limits. In addition, because it requires addressing wrap-around, Interrupt 9 will never be triggered on a 386 by normal, reasonably written software.
The 486 Case
In the 80486, Intel simplified the Processor Extension Segment Overrun quite a lot: it no longer exists at all. This implies that the 486 must be capable of fully verifying a memory access before an FPU instruction starts performing its operation. Any protection violations trigger a General Protection Fault or a Page Fault, just like non-FPU instructions.
Clearly, the 486 must understand FPU instructions quite well. Then again, since the FPU is either built-in or entirely absent, that’s not too surprising.
What Does the 386 Know?
It is clear that the 386 knows much more than Intel lets on about 387 instructions.
On a 386 with no 387, the following instructions write to memory: FSTSW, FSTCW, FSTENV, FSAVE.
On a 386 with no 387, the following instructions do not write to memory: FIST, FST.
It is rather interesting what the FSTENV and FSAVE instructions do. The FSW/FCW/FTW as well as (in the case of FSAVE) the FP registers are stored as all ones; clearly that is data which would come from the FPU, if it were there.
But even without an FPU, FSTENV and FSAVE store the FP instruction and data pointers! In other words, the 386, not the 387, tracks this information. Which, in retrospect, is how it has to be, for two reasons.
One reason is that FSTENV/FSAVE can store the pointers in four different formats: all combinations of 16-bit/32-bit and real/protected mode. While the 287 had the FSETPM instruction, on the 387 it’s a no-op. Yet the 386/387 knows which format to store the information in. If the 386 is in charge, that simplifies things quite a bit.
The other reason is that the 80386 needed to be able to work with the 80287, a stopgap measure necessitated by the fact that the 387 wasn’t available for about two years after the 386 was released. If the 386 tracked the instruction and data pointers, it could work with a 287 which had no clue about 32-bit addressing.
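To see why it helps if the 386 does the bookkeeping, consider the two 16-bit pointer formats. In protected mode the selector:offset pairs are stored as-is; in real mode the 20-bit linear addresses are stored instead, with the top four bits of the instruction pointer sharing a word with the 11-bit opcode. The C sketch below builds words 3 through 6 of a 16-bit FSTENV image following that layout as given in Intel’s 387 documentation; the struct and function names are hypothetical, and the 32-bit formats are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Words 3-6 of a 14-byte (16-bit) FSTENV image. */
struct env16_ptrs {
    uint16_t w[4];
};

static struct env16_ptrs env16_pointers(int real_mode,
                                        uint16_t cs, uint16_t ip,
                                        uint16_t ds, uint16_t off,
                                        uint16_t opcode11)
{
    struct env16_ptrs p;
    if (real_mode) {
        uint32_t ilin = ((uint32_t)cs << 4) + ip;  /* 20-bit linear IP */
        uint32_t dlin = ((uint32_t)ds << 4) + off; /* 20-bit linear operand */
        p.w[0] = (uint16_t)ilin;                   /* IP bits 15-0 */
        p.w[1] = (uint16_t)((((ilin >> 16) & 0xFu) << 12) |
                            (opcode11 & 0x7FFu));  /* IP 19-16, opcode */
        p.w[2] = (uint16_t)dlin;                   /* operand bits 15-0 */
        p.w[3] = (uint16_t)(((dlin >> 16) & 0xFu) << 12); /* operand 19-16 */
    } else {
        p.w[0] = ip;  /* instruction offset */
        p.w[1] = cs;  /* instruction selector */
        p.w[2] = off; /* operand offset */
        p.w[3] = ds;  /* operand selector */
    }
    return p;
}
```

For example, in real mode with CS:IP = 1234h:0010h the linear instruction address is 12350h, so word 3 holds 2350h and the top nibble of word 4 holds 1h. A 287 that knows nothing about the current mode could not produce either format on its own.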
It is clear that what started as a generic co-processor interface on the 8086 turned into a single-purpose FPU interface on the 80386, and to a lesser extent it must have been that way on the 80286 already.
Unsurprisingly, the 386 does even more. For example, attempting to execute an FLD instruction on an invalid address will fault in protected mode, even if no 387 is present. However, executing FST does not fault, presumably because the write never happens.
On the other hand, FNSTSW can trigger faults even with no FPU. That is unsurprising; as shown above, FNSTSW writes to memory regardless of whether an FPU is present or not.
It is clear that the 386 took over some of the responsibilities of the original 8087 BIU. The 386 has significant knowledge of FPU instructions. FPU control instructions are to some extent implemented by the 386, although the 387 still needs to supply or accept numeric data.
What Does a 486SX Know?
The Intel 486SX is a rather odd case for two reasons. It is the last mainstream processor without a built-in FPU, and unlike earlier CPU generations, it cannot have an FPU added (that is not the case with Cyrix 486S, which can work with an external add-on FPU).
Examining an AMD Am486SX-66 (not known to be distinguishable from Intel parts in software), and later confirming with a genuine Intel 486SX, it is apparent that the 486SX’s behavior is not very different from that of a 386. Even though it cannot be equipped with an FPU, the CPU still does a lot of FPU-related work.
Like the 386, the 486SX tracks FP instruction/data pointers and validates memory operands. Like the 386, the 486SX writes to memory when FSTSW, FSTCW, FSTENV, or FSAVE is executed. It is very likely that the microcode is not vastly different between the 386 and 486.
Unlike a 386, the 486SX also reports protection faults on the FST instruction. This may be related to the fact that the 486 no longer generates Coprocessor Segment Overrun, which implies that memory accesses must be pre-checked and validation is not postponed until the FPU actually starts accessing memory.
Also unlike a 386, the FIST and FST instructions do write to memory on a 486SX.
One behavioral difference I found between an AMD Am486SX2-66 and an Intel i486SX (S-spec SX683) is that the former writes FP instruction/data pointers in FSTENV/FSAVE and the latter does not (it writes only FFFFh words). Such differences are not surprising when one wades deep into undocumented behavior.
Other Vendors
My one system with an IBM 486BL2 processor behaves slightly differently. The behavior is generally similar to a 386, but the values written to memory do not have all bits set. On my test system, the high byte of each word was FFh, but the low byte was inconsistent, though never zero. Therefore, one cannot rely on e.g. FNSTSW to always write FFFFh to memory on systems with no FPU.
On the other hand, a Cyrix Cx486S seems to behave much like an AMD Am486SX2-66.
Safe FPU Detection
How to properly detect an FPU then, without running into problems on systems that don’t have one? Here’s one possible approach (16-bit, able to deal with 8086/8088):
check87 proc near
push bp
mov bp,sp ; establish stack frame
xor ax,ax ; initialize with known value
push ax
fninit ; reset FPU
fnstcw word ptr [bp-2] ; save FPU control word
pop ax ; move FCW into AX
mov al,0 ; assume no FPU
cmp ah,3 ; 00h or FFh if no FPU
jnz nox87
mov al,1 ; indicate FPU present
nox87: mov ah,0 ; clear AH
mov sp,bp ; clean up stack
pop bp
ret
check87 endp
The key points are:
- FNINIT (not FINIT) must be used because the FPU may be in an unknown state and a WAIT instruction may hang
- Storage for the FPU control word must be initialized with a known value
- FNSTCW must be used instead of FSTCW
- After FNSTCW, no WAIT is needed for synchronization
On an FPU-less 8088/8086 system, FNSTCW will not write anything to memory, which is why the value on the stack must be initialized. On a 286 and later with no FPU, the FNSTCW instruction (usually) writes FFFFh to memory. If a real FPU is present, the actual FCW is stored and the high byte will be 03h after FNINIT.
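The decision check87 makes can be modeled as a pure function of the word that ends up at [bp-2]. In this C sketch, 037Fh is taken as the FNINIT default on the 387 and later and 03FFh as the value generally cited for the 8087/287 (an assumption worth checking against the sandpile notes on initial FCW values); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "cmp ah,3 / jnz nox87": the word at [bp-2] starts out as 0000h,
   and an FPU is assumed present only if its high byte reads 03h afterwards. */
static bool check87_says_fpu(uint16_t stored)
{
    return (stored >> 8) == 0x03;
}
```

The pre-initialized 0000h (8086/8088 with no FPU, nothing written), FFFFh (286 and later with no FPU) and a value like FF5Ah (an inconsistent low byte, as on the IBM 486BL2) all fail the test, while both FCW defaults pass. Testing the high byte for an exact 03h is what makes the check independent of both the initial value and the idle bus pattern.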
Summary
While detecting the presence of an FPU is well understood, detecting its absence is much less obvious. It relies on CPU behavior which is effectively undocumented on the 80286 and later processors. While the 8086 had a generic co-processor interface, the 286 and later have significant knowledge of x87 FPU instructions. That includes the 486SX, which cannot be equipped with an FPU. Even when there is no FPU present, FP instructions on the 80286 and later are far from no-ops and may behave in surprising ways.
The 287 detection code has a typo beyond missing the “N” part:
FSTSTW STAT ; Store status word into location
See the repeated “ST”?
> Intel’s documentation is pretty clear on what happens when an FPU is present. FNINIT resets the CPU, FSTSW stores the status word which will always have a zero value in the low 8 bits.
This should read “resets the FPU”.
> This is in obviously required for memory protection to work.
Superfluous “in”.
> Because Interrupt 9 is one of the very few no-restartable exceptions.
“Non-restartable”.
Fixed the typos, thanks. The “FSTSTW” comes straight from the Intel manual so I’m inclined to leave it, as weird as it is. Maybe an original Intel typo?
Ludloff recently helped me answer a related question concerning the FCW and its different reset values across generations. After an evening of manual diving, the notes on the FCW can be found at https://sandpile.org/x86/initial.htm.
The 386 can work with both a 287 and a 387. The way these were distinguished was for the 387 to reset to an FCW value which forced #ERROR to be asserted, and for this to be sampled to become CR0.ET which was writeable at the time.
“Because Interrupt 9 occurs asynchronously, it may be even triggered after a task switch, in the context of a task different from the one that initiated the faulting FPU instruction.”
I would be very curious to know how multi-tasking systems handled this; for example, what did Xenix do? That seems like a serious showstopper. How did the OS know which process was actually faulting?
The 386 microcode does some FPU detection at line 9B9 which is during CPU startup (i.e. before jumping to the first instruction of the BIOS). It sets bit 4 of CR0 (the ET or “Extension Type” bit) based on one of the inputs from the FPU. The documentation states “The 80387 holds its ERROR# output low after reset, whereas the 80287 holds its ERROR# output high.” So it would be natural to assume that it’s the ERROR# line that it’s checking at startup. However, my microcode disassembly currently has “JNBUSY” on line 9B9 because in the implementation of the other FPU opcodes, “JNBUSY” is used to break out of the loop at the start of the instruction and continue to the code that actually sends data to or receives data from the FPU. So it seems like what it’s actually testing at startup is the BUSY# line rather than the ERROR# line. It would be interesting to know what real hardware does here! Though it might require some hardware hacking to set these lines independently.
There are 3 microcode conditional jump instructions specific to the FPU, currently disassembled as JNBUSY (0x42), JPEREQ (0x4e) and JBUSY# (0x4f). It would make most sense if these corresponded in some permutation to the 3 inputs to the CPU from the FPU (BUSY#, ERROR# and PEREQ). The code is a little convoluted but the JBUSY# and JPEREQ instructions are used to determine if the FPU-busy loop ends with an exception, an interrupt (in conjunction with the JNOINT instruction) or goes back to waiting. I need to think about this some more to figure out if there is a permutation that makes sense, or if random logic in the CPU is doing something with these inputs before they reach the microcode conditional jump engine.
With the way lazy FPU context switching works, the OS can or rather must keep track of which task “owns” the FPU (the interaction of MSW.TS bit with FPU instructions). So the OS would know. The same problem actually applies to regular math exceptions, they may be detected when a different process is trying to use the FPU for the first time. But the OS knows who the exception belongs to and can deliver it safely. With Interrupt 9 the real problem is that recovery is likely impossible because execution cannot be restarted.
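A rough C model of that bookkeeping follows. It is a hypothetical sketch, not Xenix or any other real kernel: `fpu_owner` records whose state is live in the FPU, TS is reduced to a flag, and FSAVE/FRSTOR are represented by updating each task’s `state_in_fpu` field. The point is that an asynchronous FPU error is charged to the owner, not to whichever task happens to be running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task record: state_in_fpu is 1 while the task's FPU
   registers are live in the chip rather than saved in memory. */
struct task {
    int id;
    int state_in_fpu;
};

static struct task *fpu_owner = NULL; /* whose state the FPU holds */
static int ts_flag = 0;               /* models MSW/CR0.TS */

/* Task switch: set TS and leave the FPU state where it is. */
static void on_task_switch(void)
{
    ts_flag = 1;
}

/* Exception 7 with TS set: first FPU use by `current` since the switch. */
static void on_device_not_available(struct task *current)
{
    ts_flag = 0;                     /* CLTS */
    if (fpu_owner == current)
        return;                      /* its state is still in the FPU */
    if (fpu_owner != NULL)
        fpu_owner->state_in_fpu = 0; /* FSAVE into the old owner */
    current->state_in_fpu = 1;       /* FRSTOR the current task */
    fpu_owner = current;
}

/* Math errors (and, in this model, Interrupt 9) go to the FPU owner. */
static struct task *fpu_error_target(void)
{
    return fpu_owner;
}
```

If task A uses the FPU, the system switches to task B, and a fault from A’s instruction arrives while B is running, `fpu_error_target` still returns A.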
I know the 386 can distinguish between the 287 and 387 in hardware through the ET bit, that part is documented well enough. But does the 386 “know” that there is or isn’t an FPU attached? Or is that not clear at this point?
I’d hope that if the CPU knew, Intel would expose it in some bit, and also that they wouldn’t have to tell people, hey, make sure some data bus bits are tied high, because (by implication) we’ll write to memory whatever happens to be on the data bus. Although I know it’s not always so straightforward in practice.
It appears the 386 doesn’t actually know. It would have obviously been a lot easier if there had simply been a “coprocessor present” pin that got pulled high if a coprocessor were plugged in.
It looks like it does not know. According to the 80386 Hardware Reference Manual (1986 edition), it is up to the hardware designer to sneak a pull-up in somewhere to make the detection work:
The 80386 samples its ERROR# input during initialization (after RESET goes low and before execution of the first instruction) to determine the type of coprocessor present. The 80387 holds its ERROR# output low after reset, whereas the 80287 holds its ERROR# output high. Therefore, if the 80386 samples ERROR# low, it assumes that an 80387 is present. If it samples ERROR# high, it assumes that *either an 80287 is present or that a coprocessor is not used.*
If the 80386 determines that either an 80287 is present or a coprocessor is not used, it must then execute a routine to determine the presence of an 80287 in order to set its internal status. Figure 5-4 shows an example of a recognition routine. In order to use this routine, *the designer must connect a pullup resistor* to at least one of the lower eight bits of the data bus if a coprocessor is not used.
It looks like some old BIOSes do:
FNINIT
OUTPORT($F0 ,0) reset BUSY manually ???
*MEMP = 0x5A5A
FNSTSW MEMP
IF *MEMP != 0 THEN NO_FPU
FNSTCW MEMP
IF (*MEMP & 0x103F) == 0x3f THEN 80287 is there
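The BIOS logic above can be restated in C as a function of the two words read back. The pseudocode leaves the remaining cases open, so treating them as “no FPU” is an assumption here, and the function name is made up. Note that with the commonly cited FNINIT defaults (03FFh for the 287, 037Fh for the 387), the masked test yields 003Fh for both, so it identifies an FPU rather than specifically an 80287.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* `fsw`: word after FNSTSW into a buffer preset to 5A5Ah.
   `fcw`: word after the subsequent FNSTCW (only meaningful if fsw == 0). */
static bool bios_says_fpu(uint16_t fsw, uint16_t fcw)
{
    if (fsw != 0)                      /* 5A5Ah untouched, FFFFh, ... */
        return false;
    /* Exception masks (bits 5-0) set, infinity control (bit 12) clear. */
    return (fcw & 0x103Fu) == 0x003Fu;
}
```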
The tiny detail that’s missing in the post about the i486SX:
There was an i487SX that could be plugged into a board with the i486SX as a “coprocessor upgrade”. BUT the i487SX is NOT a coprocessor like the 8087, 80287, and 80387. It is a full i486DX with a slight variation in the pinout to completely disable the existing i486SX.
Thinking about it some more, I think “JNBUSY x” (ALU/jump operation 0x42) actually means “if (!BUSY# && !ERROR#) goto x;” – they’ve combined those two signals in random logic in order to maximise the speed of the 387 instructions in the common case (where the FPU is not busy and there is no error). Then “JBUSY# x” (ALU/jump operation 0x4f) probably means “if (ERROR#) goto x;”. But then why use JNBUSY instead of JBUSY# for the startup-time 287/387 test? Probably just because the microcode is shorter if the jump happens in the !ERROR# case. I’m not sure why there is a check that PEREQ is not asserted before NM# (interrupt 0x10) happens, though. It doesn’t seem like the combination of ERROR# and PEREQ should ever happen because that would cause the instruction to stall (though the CPU would still respond to IRQs). But perhaps it can happen transiently. I still need to look through the microcode of all the FPU instructions to see if any of them do anything differently which might shed some more light on these signals.
Ah, I thought everyone knew that 🙂 Yes, for the purposes of this discussion, the 487SX is a 486DX with a built-in FPU.
Might PEREQ + ERROR# happen in case of a coprocessor segment overrun? Based on the Intel documentation, that essentially leaves the FPU in an unusable state, which is why it has to be reset. Other than possibly that, I would also expect ERROR# to be triggered either before or after accessing memory, so PEREQ should be inactive.
In the Intel manuals there are also some notes about the 287 being somewhat timing sensitive, warning that the CPU has to deal with PEREQ (I think it was) within some number of clock cycles or things can go wrong, but unfortunately I don’t recall exactly where I saw that.
What’s interesting is that Intel’s 286 literature still talks about a generic co-processor interface, but it sounds very much like the ESC opcode was already hardcoded for the 287. But it is plausible that the decision was made relatively late in the design cycle and the 286 literature wasn’t updated to reflect that.
Note that PC/AT clones do not tie the ERROR and BUSY lines directly from the 386 to the 287/387, meaning the CR0.ET bit would have to be set another way.
It seems a lot of things in the 286 (and 386) design could have been far simpler with a few constraints:
– Not tolerating address wraparound of code/data/stack segments
– Not tolerating a (word) unaligned stack
– Require segment start addresses to be word aligned (or, even better, dword aligned)
– Require segment sizes to be multiples of 2 or 4 bytes
– When paging is enabled, require segments to start on a page boundary
– When paging is enabled, generally require segments to be multiples of 4 bytes and the start address to be dword aligned
There really isn’t much of a use case for not having these things. Restricting this would have meant it only needed to be supported in Virtual 8086 mode, and things in V86 mode are quite a bit simpler (for example, no segmentation, just paging).
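To show how compact such a rule would be, the hypothetical function below checks a segment's base and limit against the alignment and size constraints listed above. It is a sketch under stated assumptions: the name is invented, it covers expand-up segments only, the limit follows the x86 convention of being the last valid offset (so the size is limit + 1), and the wraparound rule is not modeled because it concerns individual accesses rather than descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_PAGE_SIZE 4096u

/* Hypothetical check of the proposed descriptor constraints for an
   expand-up segment. 'limit' is the last valid offset, so the size is
   limit + 1, computed in 64 bits to avoid overflow for a 4 GiB segment. */
static bool segment_meets_proposed_rules(uint32_t base, uint32_t limit, bool paging)
{
    uint64_t size = (uint64_t)limit + 1;

    if (paging) {
        /* Page-aligned start (which implies dword alignment) and a
           size that is a multiple of 4 bytes. */
        return base % X86_PAGE_SIZE == 0 && size % 4 == 0;
    }
    /* Without paging: word-aligned start and an even size. */
    return base % 2 == 0 && size % 2 == 0;
}
```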
On the other hand, the whole coprocessor architecture is needlessly complicated. Intel was clearly planning for something fancy with lots of different types of coprocessors that never materialised. The 486 finally simplified this quite a bit, but I often wonder how much die space was burned up on the 386 to support (a) the edge cases listed above and (b) all the quirks of supporting both 287 and 387 coprocessors.
As far as actual users of the NPX… they were quite rare when 286s and 386s were contemporary, outside of specialised users like people running AutoCAD, and a lot of people who wanted more floating-point computation power were using a different platform back then.
“When paging is enabled, require segments to start on a page boundary”
Doesn’t the 386 already have a granularity bit for the segment limit?
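On the 386, the G bit scales only the 20-bit limit field, not the base, which remains a byte-granular 32-bit address. A minimal sketch of how the effective limit is derived (the function name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective limit of a 386 segment descriptor. The descriptor holds a
   20-bit raw limit; with G=1 it counts 4 KiB units, so the low 12 bits
   of the effective limit are filled with ones. The base is not affected
   by G and can still be any byte address. */
static uint32_t effective_limit(uint32_t raw_limit20, bool g_bit)
{
    raw_limit20 &= 0xFFFFFu; /* only 20 bits exist in the descriptor */
    if (g_bit)
        return (raw_limit20 << 12) | 0xFFFu;
    return raw_limit20;
}
```

So G=1 gives page-granular segment sizes, but nothing forces a segment to start on a page boundary, which is what the proposed rule would add.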
All these “unnecessary” things are THE reason for people to ditch M68k or ARM and pick Intel. If you have 10 apps and only one of them doesn’t work, you won’t upgrade until all of them are supported. Only much later did software (and hardware!) become powerful and flexible enough to paper over differences in hardware.
Most of the big architectural shifts involved large increases in accessible memory, so new replacement programs that did a lot more removed the need to support older programs.
It takes a long time to build up a useful program library. Designing a 286 or 386 that broke 8086 code would be a market disaster.
Victor,
286 protected mode was already wildly incompatible with 8086 code. It would have not made any difference to have some slight restrictions on it.
In retrospect, having a Virtual 8086 mode on the 80286 would have been a great decision, but they didn’t do that.
The 80386 had a bit more concern for running 8086 code unmodified whilst in protected mode, and it did need to run 80286 code unmodified, but again these problems could have been avoided with a more sensible approach to designing the 80286. Things like 80286 segments that can start on a non-word-aligned address make no sense, and neither do segments with a non-word-aligned segment limit. Tolerating address wraparound on an unaligned boundary in protected mode makes no sense either.
In fact, any reasonable OS or language runtime must maintain word alignment to avoid a performance hit on the 286 and later (even on the 8086, though that was not a common concern on the PC). So segments must start on even addresses and the stack must be word-aligned. Enforcing this at the hardware level would make sense, at least for protected mode. However, the FPU can operate on much larger structures in memory, and these can cross a page boundary on the 386 and later.
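Both points reduce to simple address arithmetic. The hypothetical helpers below show that a segment with an odd base turns even offsets into odd linear addresses, and test whether an in-memory FPU operand straddles a 4 KiB page. As an example operand size, the FSAVE image is 94 bytes in 16-bit mode and 108 bytes in 32-bit mode; the helper names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_PAGE_SIZE 4096u

/* Linear address of a segment:offset pair in protected mode. */
static uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* True if a word access at this linear address is naturally aligned. */
static bool word_aligned(uint32_t linear)
{
    return (linear & 1u) == 0;
}

/* True if an operand of 'size' bytes (size >= 1) starting at 'linear'
   touches more than one 4 KiB page. */
static bool crosses_page(uint32_t linear, uint32_t size)
{
    uint32_t first_page = linear / X86_PAGE_SIZE;
    uint32_t last_page = (uint32_t)(((uint64_t)linear + size - 1) / X86_PAGE_SIZE);
    return first_page != last_page;
}
```

For instance, a 94-byte FSAVE area at linear address 0x1FC0 spans two pages, while the same area at 0x1F00 fits in one.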
The problem with “do something for protected mode only” is that doing it only in protected mode doesn’t really save you much silicon (if any).
Also, remember that it was developed in parallel with the disaster known as the iAPX 432… I guess the idea of going from bit addresses to byte addresses was a compromise.
And the 80386 is much saner, because it was the first CPU explicitly developed for the PC, but most of the warts we are discussing here were already implemented in silicon by that time.
>The problem with “do something for protected mode only” is that doing it only in protected mode doesn’t really save you much silicon (if any).
It does not save any silicon. It saves software authors from doing something stupid and hard to diagnose (imagine a program that sometimes gets an unaligned segment and starts running twice as slowly). But I understand this was not a priority for Intel.
Intel made for easy conversion of existing code even if that code might perform poorly without further modification. That happened with the 8080 to the 8086, and it happened with the 8086 to the 80286. Given the very rapid pace of 286 development, the result was probably the best that could be obtained.
One minor wrinkle for the 286 would be the need to run 186 and 188 real-mode code, so most of the new functions worked the same in real mode as in protected mode. I doubt the 286 had the transistor budget to have completely distinct real mode and protected mode processors.
Josh Rodd:
Given the severity of the early 80386 errata (discussed on PCjs, on Raymond Chen’s blog in connection with the B1 stepping, and elsewhere), it would be right in line for some of those edge cases to be broken anyway.
Michal Necasek:
If the FPU is present but disabled (e.g., because the user chose that response to the FDIV bug), is that case any different, or is it just a tweak to the kernel’s detection routine to allow for it? It shouldn’t have broken userspace at the time, since it would be analogous to the 486SX case.
And would you have to care about DOS-compatible vs. native mode here or is that irrelevant to the detection?