Learn Something Old Every Day, Part XX: 8087 Emulation on 8086 Systems

Not too long ago I had a need and an opportunity to re-acquaint myself with the mechanism used for software emulation of the 8087 FPU on 8086/8088 machines.

As mentioned elsewhere, the 8086 CPU (1978) had a generic co-processor interface first utilized by the Intel 8089 I/O processor (1979) and later the Intel 8087 FPU (1980), initially called the Numeric Processor Extension or NPX.

The 8087 was a somewhat expensive add-on, assuming that a given system actually had a socket to plug the 8087 into (IBM PCs did, but other 8086/8088 systems did not necessarily have one). There was a largish class of software which could significantly benefit from the 8087 (e.g. spreadsheets), but in the era of shrink-wrapped software, there was a significant incentive to ship software which could use an 8087 when present, yet would still run on a bare 8086/8088 machine with no FPU.

There was also a desire to develop and test floating-point software without having to install an 8087 into every system. Given the initial limited availability of 8087 chips, it was in Intel’s best interest to give developers a way to write 8087 software without requiring 8087 hardware.

Intel released the E8087 software emulation package together with the 8087 chip. This is evidenced by the original Numerics Supplement to The 8086 Family User’s Manual from July 1980, Intel order no. 121586-001. The Numerics Supplement outlines how the E8087 package works. Actually there were two packages — the full E8087 library, and also a “partial” PE8087 library which implemented just enough functionality for Intel’s PL/M language tools. Intel’s PL/M compiler was the first high-level language translator capable of utilizing the 8087.

Because the 8086 had no facility for emulating an FPU (unlike the 80286 and later processors), the emulation mechanism was somewhat complex and required tight cooperation of assemblers/compilers, linkers, and run-time libraries.

Assembler/Compiler – Intel Original

The assembler or compiler generated “emulatable” 8087 code. The translator in fact produced normal 8086/8087 code, but the object modules included special fix-ups for every 8087 ESC instruction and for (F)WAIT.

Early on, Intel established the convention that the WAIT mnemonic was translated directly to the WAIT opcode, while the FWAIT mnemonic could be emulated.

A key fact is that the language translator did not directly produce 8087 emulation code. It only prepared object modules for emulation while emitting regular 8087 instructions, and the actual decision whether to emulate or not was made at link time.

Linker – Intel Original

During the linking process, the decision to emulate or not was made. The user could link with a no-emulation library (8087.LIB), in which case the linker effectively left the object code alone.

Much more interesting things happened when linking with emulation libraries (E8087.LIB or PE8087.LIB). In that case, the special fix-ups caused the linker to replace the 90/Dx (NOP/ESC) or 9B/Dx (WAIT/ESC) sequences with software INT instructions.

In Intel’s original implementation, ESC opcodes D8h-DFh were replaced with INT 18h-1Fh, as shown in the Intel ASM86 Reference Manual, order no. 121703-003 (1983).

Note that eight separate interrupt vectors were required to replace eight possible ESC opcode bytes. The emulator may (and likely does) use a single routine to handle all eight interrupt vectors, but the 8 vectors are needed to preserve the 3 bits of FPU opcode information from the ESC instruction.
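The mapping between ESC opcodes and Intel's interrupt vectors can be sketched as follows. This is an illustration, not Intel's code. It assumes that a shared handler somehow learns which vector it was entered through, for instance via a small per-vector entry stub. The function names are hypothetical.

```python
# Sketch of Intel's original mapping: ESC D8h-DFh -> INT 18h-1Fh.
# The low 3 bits of the ESC opcode carry FPU opcode information, so each
# ESC byte needs its own vector for that information to survive.
INTEL_BASE_VECTOR = 0x18

def intel_vector_for_esc(esc: int) -> int:
    """Software interrupt that replaces a given ESC opcode (hypothetical helper)."""
    if not 0xD8 <= esc <= 0xDF:
        raise ValueError(f"{esc:#04x} is not an ESC opcode")
    return INTEL_BASE_VECTOR + (esc & 0x07)

def intel_esc_for_vector(vector: int) -> int:
    """Recover the ESC opcode from the vector number (hypothetical helper)."""
    if not 0x18 <= vector <= 0x1F:
        raise ValueError(f"INT {vector:#04x} is not an emulator vector")
    return 0xD8 | (vector - INTEL_BASE_VECTOR)

for esc in range(0xD8, 0xE0):
    print(f"ESC {esc:02X}h -> INT {intel_vector_for_esc(esc):02X}h")
```

The round trip is lossless, which is exactly why eight vectors are needed rather than one.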

Microsoft’s DOS Implementation

Intel’s 8087 emulation mechanism was adopted by Microsoft and with a few changes implemented in their DOS development tools. It was also used by several other vendors of DOS development tools (Borland, Watcom, and others).

Microsoft needed to change the range of software interrupts used by the 8087 emulator, because on the IBM PC, interrupts 18h-1Fh were already taken by the BIOS (for example, INT 19h is the bootstrap loader and INT 1Ah the time-of-day service). Instead of interrupts 18h-1Fh, the DOS emulator uses vectors 34h-3Dh. Yes, that’s 10 vectors instead of 8. While Intel replaced WAIT instructions with NOP for emulation, Microsoft emulated WAIT instructions as well, and Microsoft also had a provision to emulate FPU instructions with an ES segment override.

Emulator + 8087

Microsoft added one significant improvement compared to Intel’s original emulator. If the program with a built-in emulator was run on a system with an 8087 present, the emulator detected that during startup. Whenever an emulated instruction was executed (via INT 34h-3Dh), the emulator replaced the software INT instruction with the original NOP or WAIT plus the corresponding ESC opcode, and returned to execute the real floating-point instruction.

This mechanism had a minimal performance impact (emulated instructions were replaced with real 8087 instructions the first time they were executed) and ensured that programs with the emulator ran at effectively 100% speed on systems with an 8087, yet the same binary could still run on a system with no FPU.
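The self-patching step can be illustrated with the sketch below. It assumes the handler simply subtracts the same fix-up value the linker added, which may not be how Microsoft's emulator actually does it. It uses the FIDRQQ and FIWRQQ values quoted later in this article. A real handler would also have to back the saved return address (which points past the two-byte INT) up by two before returning, so that the restored instruction executes. The INT 3Ch segment-override case is deliberately left out.

```python
# Hypothetical sketch: turn an emulator INT back into the original 8087 bytes.
FIDRQQ = 0x5C32   # WAIT/ESC  -> INT 34h-3Bh
FIWRQQ = 0xA23D   # NOP/WAIT  -> INT 3Dh

def unpatch(mem: bytearray, addr: int) -> int:
    """Undo the linker fix-up at mem[addr]; return where execution should resume."""
    if mem[addr] != 0xCD:
        raise ValueError("not an INT instruction")
    vector = mem[addr + 1]
    if 0x34 <= vector <= 0x3B:
        fixup = FIDRQQ
    elif vector == 0x3D:
        fixup = FIWRQQ
    else:
        # INT 3Ch (segment override) also needs the ESC byte repaired; omitted here.
        raise NotImplementedError(f"INT {vector:02X}h not handled in this sketch")
    word = int.from_bytes(mem[addr:addr + 2], "little")
    mem[addr:addr + 2] = ((word - fixup) & 0xFFFF).to_bytes(2, "little")
    # The INT pushed addr + 2; resuming at addr re-executes the restored bytes.
    return addr

code = bytearray.fromhex("CD 37 E3 CD 3D C3")   # linked FINIT, FWAIT, RET
unpatch(code, 0)
unpatch(code, 3)
print(code.hex(" ").upper())   # 9B DB E3 90 9B C3
```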

This was often used for binaries shipped to end users, since the program could take advantage of an 8087 but didn’t require it.

MASM Implementation

The oldest implementation of Microsoft’s FPU emulation mechanism I could find was in MASM 1.12 and 1.25 from 1983 (no, I don’t understand the version numbering, and I am not sure which is older). Note that these assemblers do not support the .8087 directive yet and do not accept 8087 instructions by default, unlike Intel’s ASM86. To assemble FPU instructions, the /R switch must be used. To generate emulation-ready code, the /E switch must be used as well.

I prepared the following miniature test module:

_TEXT   SEGMENT PUBLIC 'CODE'
ASSUME  CS:_TEXT
start:
        finit
        fwait
        fstsw   [bp]
        fstsw   ds:[bp] ; MASM 1.x/2.x gets this wrong
        fstsw   [bx]
        fstsw   es:[bx]
;       fstsw   ss:[bx] ; MASM 1.x/2.x can't do this
;       fstsw   cs:[bx] ; MASM 1.x/2.x can't do this
        wait
        ret

_TEXT   ENDS
END     start

Then I assembled the module using MASM 1.25 with the help of the EMU2 emulator (setting EMU2_LOWMEM=1 so that early MASM versions would not hang):

c:\emu2>emu2 TOOLS\MASM125.EXE /E /R emu.asm;
The Microsoft MACRO Assembler , Version 1.25
 Copyright (C) Microsoft Corp 1981,82,83

Warning Severe
Errors  Errors
0       0

Then I disassembled the result with the Watcom disassembler, showing the object code emitted by the assembler but also the fix-ups the assembler added as a result of the /E switch:

Module: A

Segment: _TEXT PARA USE16 00000017 bytes
                        ; FPU fixup FIDRQQ
0000  9B DB E3          finit
                        ; FPU fixup FIWRQQ
0003  90                nop
0004  9B                fwait
                        ; FPU fixup FIDRQQ
0005  9B DD 7E 00       fstsw           word ptr [bp]
                        ; FPU fixup FIDRQQ
0009  9B 3E DD 7E 00    fstsw           word ptr ds:[bp]
                        ; FPU fixup FIDRQQ
000E  9B DD 3F          fstsw           word ptr [bx]
                        ; FPU fixup FIERQQ
0011  9B 26 DD 3F       fstsw           word ptr es:[bx]
0015  9B                fwait
0016  C3                ret

Routine Size: 23 bytes,    Routine Base: _TEXT + 0000

No disassembly errors

The /R /E switch combination causes MASM to produce almost the same object code as /R alone. The only differences are that FWAIT is preceded by a NOP and that fix-ups are added to all FPU instructions and to FWAIT.

Attempting to assemble the commented out instructions results in the following errors:

c:\emu2>emu2 TOOLS\MASM125.EXE /E /R emu.asm;
The Microsoft MACRO Assembler , Version 1.25
 Copyright (C) Microsoft Corp 1981,82,83

 0015  9B 36: DD 3F                     fstsw   ss:[bx]do this
 E r r o r   ---        84:8087 opcode can't be emulated
 0019  9B 2E: DD 3F                     fstsw   cs:[bx]do this
 E r r o r   ---        84:8087 opcode can't be emulated

Warning Severe
Errors  Errors
0       2

Notice that the old MASM version in fact couldn’t handle the FSTSW DS:[BP] instruction either, although it did not report an error. It just effectively dropped the DS: prefix, which would cause incorrect execution, since addressing through BP uses the SS segment register by default.

The problems were clearly noticed and fixed in Microsoft MASM 3.0 (1984), which can deal with the lines commented out for the old assemblers (and no, Microsoft’s MASM 2.0 has no 8087 support, because the version numbering was a complete mess):


c:\emu2>emu2 TOOLS\MASM300.EXE /E /R emu.asm;
Microsoft MACRO Assembler  Version 3.00
(C)Copyright Microsoft Corp 1981, 1983, 1984


49722 Bytes free

Warning Severe
Errors  Errors
0       0

Disassembling the MASM 3.0 output, we now see the following:

Module: A

Segment: _TEXT PARA USE16 0000001F bytes
                        ; FPU fixup FIDRQQ
0000  9B DB E3          finit
                        ; FPU fixup FIWRQQ
0003  90                nop
0004  9B                fwait
                        ; FPU fixup FIDRQQ
0005  9B DD 7E 00       fstsw           word ptr [bp]
                        ; FPU fixup FIARQQ
0009  9B 3E DD 7E 00    fstsw           word ptr ds:[bp]
                        ; FPU fixup FIDRQQ
000E  9B DD 3F          fstsw           word ptr [bx]
                        ; FPU fixup FIERQQ
0011  9B 26 DD 3F       fstsw           word ptr es:[bx]
                        ; FPU fixup FISRQQ
0015  9B 36 DD 3F       fstsw           word ptr ss:[bx]
                        ; FPU fixup FICRQQ
0019  9B 2E DD 3F       fstsw           word ptr cs:[bx]
001D  9B                fwait
001E  C3                ret

Routine Size: 31 bytes,    Routine Base: _TEXT + 0000

No disassembly errors

We can now observe six different fix-ups:

FIDRQQ – normal FP instructions
FIWRQQ – FWAIT
FIARQQ – FP instructions with DS segment override
FICRQQ – FP instructions with CS segment override
FIERQQ – FP instructions with ES segment override
FISRQQ – FP instructions with SS segment override

The names of the fix-ups certainly look strange. Although they are normal symbol names, they are quite unlikely to be used by normal programs.

Microsoft (unlike Intel) never supplied a standalone 8087 emulator for use with MASM; only Microsoft’s high-level language libraries came with the emulator. One of the first Microsoft products with 8087 support was Microsoft Pascal version 3.04 (February 1983). At least since 1981, MS Pascal used symbol names ending with QQ for implementation internals. This followed ancient conventions from the days when symbols were limited to 6 significant characters and the double underscore was not yet used for reserved symbols. I am not sure if other Microsoft languages used the same convention, but the fix-up names certainly fit right in with Microsoft Pascal internals.

Linker and the Fix-Ups

At least in Microsoft’s initial implementation, the linker did not need any special support for floating-point emulation at all. All the magic was achieved through carefully coordinated cooperation between the language translators and run-time libraries.

How does that work? The fix-ups refer to library symbols. These are absolute symbols with carefully chosen values. For example, FIWRQQ (FWAIT fix-up) has the value 0A23Dh. Why is that?

The assembler emits FWAIT as NOP/WAIT, opcode sequence 90 9B. When interpreted as a little-endian 16-bit value, it is 9B90h.

  09B90h ; NOP/WAIT
+ 0A23Dh ; FIWRQQ value
--------
  13DCDh

The carry out of bit 15 (the leading 1 of the 17-bit sum) is discarded, and the byte sequence 90 9B in the object file is replaced with CD 3D in the final executable. And that of course is INT 3Dh.
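The linker’s job here is just a 16-bit little-endian add with the carry thrown away, as in the sum above. The sketch below shows that operation. The function name `apply_fixup` is hypothetical; real linkers apply fix-ups from OMF records rather than from a call like this.

```python
def apply_fixup(code: bytearray, offset: int, value: int) -> None:
    """Add a 16-bit fix-up value to the little-endian word at code[offset].

    The carry out of bit 15 is simply lost, as in the 17-bit sum above.
    """
    word = code[offset] | (code[offset + 1] << 8)
    word = (word + value) & 0xFFFF          # discard the carry
    code[offset] = word & 0xFF
    code[offset + 1] = word >> 8

FIWRQQ = 0xA23D
fwait = bytearray([0x90, 0x9B])            # NOP/WAIT as emitted for FWAIT
apply_fixup(fwait, 0, FIWRQQ)
print(fwait.hex(" ").upper())              # CD 3D, i.e. INT 3Dh
```

With a fix-up value of zero, the same operation leaves the bytes untouched.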

The same approach is used for normal floating-point instructions. For example, FINIT is assembled as 9B DB E3 in the object file, with a FIDRQQ fix-up. The value of FIDRQQ is 05C32h, resulting in the following math:

  0DB9Bh  ; WAIT/ESC
+ 05C32h  ; FIDRQQ value
--------
  137CDh

Of course, the CD 37 opcode is in fact INT 37h. Note that opcode sequences 9B D8-DF turn into INT 34h-3Bh.
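The mapping works for all eight ESC opcodes because of how the bytes line up. Since 9Bh + 32h = CDh with no carry, the low byte always becomes the INT opcode. The high byte is simply the ESC opcode plus 5Ch modulo 256. A short sketch reproducing the D8h-DFh to INT 34h-3Bh mapping (the helper name is hypothetical):

```python
FIDRQQ = 0x5C32

def int_for_wait_esc(esc: int) -> int:
    """Vector produced when FIDRQQ is applied to the bytes 9B <esc>."""
    word = (esc << 8) | 0x9B                 # 9B xx read as a little-endian word
    patched = (word + FIDRQQ) & 0xFFFF       # carry out of bit 15 discarded
    if patched & 0xFF != 0xCD:               # 9Bh + 32h = CDh, never carries
        raise AssertionError("low byte should always be the INT opcode")
    return patched >> 8                      # high byte = esc + 5Ch (mod 256)

for esc in range(0xD8, 0xE0):
    print(f"9B {esc:02X} -> CD {int_for_wait_esc(esc):02X}")
```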

Things get a little more complicated when segment overrides are present. For example, FSTSW CS:[BX] emits the following byte sequence: 9B 2E DD 3F. The assembler emits not one but two fix-ups: FICRQQ, applied at the first byte of the instruction, and FJCRQQ, applied at the second. The FICRQQ value is 00E32h, and FJCRQQ is 0C000h.

Applying the FICRQQ fix-up works as follows:

  02E9Bh  ; WAIT/CS override
+ 00E32h  ; FICRQQ value
--------
   3CCDh

We end up with INT 3Ch, which indicates an FPU instruction with a segment override.

For comparison, an instruction with SS segment override uses the FISRQQ fix-up (value 00632h) in combination with FJSRQQ (value 08000h). For example, FSTSW SS:[BX] is emitted as 9B 36 DD 3F and the math looks like this:

  0369Bh  ; WAIT/SS override
+ 00632h  ; FISRQQ value
--------
   3CCDh

Note that CS and SS segment overrides both result in INT 3Ch. The ES segment override (which Microsoft evidently implemented first, before the others) uses the FIERQQ fix-up with no corresponding FJxRQQ companion, and likewise produces INT 3Ch.
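Because the FIxRQQ fix-up covers bytes 0-1 and the FJxRQQ fix-up covers bytes 1-2, the two overlap on the segment-override byte. They do not interfere with each other. The FIxRQQ addition never carries out of byte 1, and the FJxRQQ value has a zero low byte, so it changes only the ESC byte. The sketch below uses only the CS and SS values quoted above, since the FIERQQ and FIARQQ values are not given in the text. Its output matches the SYMDEB listing further on.

```python
# Fix-up values quoted in the text: (FIxRQQ at offset 0, FJxRQQ at offset 1).
FIXUPS = {
    "CS": (0x0E32, 0xC000),   # FICRQQ, FJCRQQ
    "SS": (0x0632, 0x8000),   # FISRQQ, FJSRQQ
}

def add_word(code: bytearray, offset: int, value: int) -> None:
    """Little-endian 16-bit add at code[offset], carry out of bit 15 discarded."""
    word = (code[offset] | (code[offset + 1] << 8)) + value
    code[offset] = word & 0xFF
    code[offset + 1] = (word >> 8) & 0xFF

def link_override(code: bytes, seg: str) -> bytes:
    """Apply both fix-ups for a WAIT/override/ESC instruction (hypothetical helper)."""
    out = bytearray(code)
    fi, fj = FIXUPS[seg]
    add_word(out, 0, fi)      # WAIT + override prefix -> CD 3C (INT 3Ch)
    add_word(out, 1, fj)      # adds FJ's high byte to the ESC opcode
    return bytes(out)

print(link_override(bytes.fromhex("9B2EDD3F"), "CS").hex(" ").upper())  # CD 3C 9D 3F
print(link_override(bytes.fromhex("9B36DD3F"), "SS").hex(" ").upper())  # CD 3C 5D 3F
```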

How does the emulator figure out which override it was? That’s what the second fix-up is for:

FJARQQ – DS override, value 04000h
FJSRQQ – SS override, value 08000h
FJCRQQ – CS override, value 0C000h

Note that a single interrupt vector was sufficient for the ES override alone, because the fix-up replaces the WAIT instruction and the segment override prefix rather than the WAIT/ESC pair, leaving the ESC opcode in place.

When support for CS/DS/SS overrides was added, Microsoft took advantage of the fact that the byte following the INT 3Ch will always be the ESC opcode, whose top five bits are fixed (11011b). For segment overrides other than ES, the linker adds a value indicating the segment register to the ESC opcode. The emulator then extracts the information from there.

In the case of a CS segment override, the FJCRQQ fix-up adds 0C0h to the ESC opcode, such that (for example) the DD opcode becomes 9D. For an SS segment override, the DD opcode becomes 5D, and for a DS segment override, DD becomes 1D. For an ES segment override, the DD opcode remains unchanged. Thus the emulator can decode which segment override was present.
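The decoding can be sketched as follows. Adding 40h, 80h, or C0h changes only the top two bits of the ESC byte, while the low six bits (including the three bits of FPU opcode information) survive. This is a guess at how an emulator could decode the byte, not Microsoft's actual code. The table and function names are hypothetical.

```python
# Top two bits of the byte after INT 3Ch -> segment override.
# ES leaves the ESC byte unchanged (11b); DS adds 40h (00b),
# SS adds 80h (01b), CS adds C0h (10b), all modulo 256.
SEGMENT_BY_TOP_BITS = {0b11: "ES", 0b00: "DS", 0b01: "SS", 0b10: "CS"}

def decode_override(byte: int) -> tuple[str, int]:
    """Return (segment register, original ESC opcode) for the byte after INT 3Ch."""
    if (byte >> 3) & 0b111 != 0b011:
        raise ValueError("bits 5-3 should be 011b, as in every ESC opcode")
    segment = SEGMENT_BY_TOP_BITS[byte >> 6]
    esc = 0xC0 | (byte & 0x3F)      # restore the 11b in the top two bits
    return segment, esc

for b in (0xDD, 0x1D, 0x5D, 0x9D):
    seg, esc = decode_override(b)
    print(f"{b:02X} -> {seg}: ESC {esc:02X}h")
```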

A possibly helpful and illustrative explanation of how the fix-up values are derived may be found here. In simple terms, the fix-up value is the difference between the desired two-byte opcode sequence (INT xx) and the two-byte opcode sequence the language translator placed in the object file (e.g. WAIT/ESC).

Debuggers

Once the emulation mechanism was in place, it was possible to extend the support to other tools, such as debuggers. For example, Microsoft’s SYMDEB recognizes the software interrupts and provides special decoding for the emulated instructions. Here’s what it looks like:

c:\emu2>emu2 SYMDEB.EXE emu.exe
Microsoft Symbolic Debug Utility
Windows Version 3.00
(C) Copyright Microsoft Corp 1984, 1985, 1986
Processor is [80286]
-u

0BB7:0000 CD37E3         INT    37 FINIT
0BB7:0003 CD3D           INT    3D FWAIT
0BB7:0005 CD397E00       INT    39 FSTSW        [BP+00]
0BB7:0009 CD3C1D7E00     INT    3C FSTSW        DS:[BP+00]
0BB7:000E CD393F         INT    39 FSTSW        [BX]
0BB7:0011 CD3CDD3F       INT    3C FSTSW        ES:[BX]
0BB7:0015 CD3C5D3F       INT    3C FSTSW        SS:[BX]
0BB7:0019 CD3C9D3F       INT    3C FSTSW        CS:[BX]
0BB7:001D 9B             WAIT
0BB7:001E C3             RET

The example above shows how the floating-point instructions get transformed by the fix-ups applied by the linker. SYMDEB’s ability to recognize emulated floating-point instructions is extremely convenient, because the debugger does not attempt to disassemble the “junk bytes” following some of the INT 3xh instructions; these will be used by the emulator but never executed. And obviously FSTSW [BX] is far easier to understand than INT 39h followed by a byte of 3Fh.
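To skip the “junk bytes”, a debugger has to know how long each emulated instruction is. That follows from the byte layouts shown above:

* INT 3Dh is two bytes.
* INT 34h-3Bh is followed by the ModR/M byte and any displacement.
* INT 3Ch is followed by the (possibly modified) ESC byte, then the ModR/M byte and any displacement.

The sketch below assumes standard 16-bit ModR/M displacement rules. It is not SYMDEB’s code, and the function names are hypothetical. Fed the bytes from the SYMDEB listing above, it reproduces the same instruction boundaries.

```python
def displacement_size(modrm: int) -> int:
    """Displacement bytes after a 16-bit ModR/M byte."""
    mod, rm = modrm >> 6, modrm & 0x07
    if mod == 0b00:
        return 2 if rm == 0b110 else 0   # mod 00, r/m 110 is a direct [disp16]
    if mod == 0b01:
        return 1                         # disp8
    if mod == 0b10:
        return 2                         # disp16
    return 0                             # mod 11: register operand

def emulated_length(code: bytes, i: int = 0) -> int:
    """Total length of an emulated instruction starting with CD 34h-3Dh."""
    if code[i] != 0xCD or not 0x34 <= code[i + 1] <= 0x3D:
        raise ValueError("not an emulator INT")
    vector = code[i + 1]
    if vector == 0x3D:                               # FWAIT
        return 2
    if vector == 0x3C:                               # INT, ESC, ModR/M, disp
        return 4 + displacement_size(code[i + 3])
    return 3 + displacement_size(code[i + 2])        # INT, ModR/M, disp

listing = bytes.fromhex("CD37E3 CD3D CD397E00 CD3C1D7E00 CD393F"
                        " CD3CDD3F CD3C5D3F CD3C9D3F 9B C3")
pos = 0
while listing[pos] == 0xCD:
    n = emulated_length(listing, pos)
    print(f"{pos:04X}  {listing[pos:pos + n].hex().upper()}")
    pos += n
```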

Newer debuggers may invert the logic and disassemble the emulated instructions as if they were true floating-point instructions, perhaps showing the fact that they’re actually software interrupts in a comment.

No-Emulation

How does it work if object code was prepared for emulation but emulation is not desired? The no-emulation library defines the same FIDRQQ, FIWRQQ, etc. symbols, but their value is zero. When the linker applies the fix-ups, the object code remains unchanged.

This is very convenient, because libraries can be emulation-ready and can be used with or without FPU emulation.

Note that there is a slight penalty for generating emulation-ready code. For FWAIT, the language translator must emit NOP instructions to make enough room for the software interrupts. 8087 code that does not need to be emulated can dispense with those NOPs. For typical floating-point code, this likely makes very little difference, because most floating-point instructions must be preceded by WAITs anyway.

Intel Implementation

Let’s now go back to the original implementation which Microsoft copied with a delay of two or three years.

As far as I was able to find out, Intel’s first assembler with 8087 support was the 8080-based ASM86 V3.0 from 1980 (the year the 8087 was introduced). The way ASM86 V3.0 worked was in principle identical to Microsoft’s MASM versions from 1983.

Back in 1980, Intel was a nice and helpful company (unlike the soulless behemoth it turned into), and the emulation mechanism was partially documented. The documentation is actually misleading, though. Perhaps it reflects an earlier variant of the implementation in which language translators emitted zero(ish) opcode bytes and the fix-ups turned them either into FPU instructions or into software INT instructions.

The actual ASM86 V3.0 from 1980 works a lot like Microsoft’s MASM 3.0 from 1984, although the fix-up names are quite different, and the implementation isn’t identical. Here’s a table of Microsoft names with their Intel equivalents:

FIDRQQ – M:_WST
FIWRQQ – M:_WT
FIARQQ – M:_WDS
FICRQQ – M:_WCS
FIERQQ – M:_WES
FISRQQ – M:_WSS

Intel mentions that the symbol names with a colon cannot be produced by normal language translators and are therefore effectively reserved symbols, guaranteed to not clash with user-chosen symbol names.

In Intel’s implementation, M:_WST was the exact equivalent of Microsoft’s FIDRQQ, but the actual value was different since Intel used a different interrupt range. M:_WST produced INT 18h-1Fh. M:_WT turned a WAIT into a NOP and did not use a software interrupt.

Unlike Microsoft, Intel used separate interrupt vectors (14h-17h) to indicate segment overrides ES, CS, SS, and DS, in that order. Thus Intel did not need two fix-ups for instructions with a segment override, but required more interrupt vectors for emulation.
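Intel’s vector assignment can be sketched as below. It assumes, as in Microsoft’s scheme, that in the override case the INT replaces the WAIT and the segment override prefix. It ignores the no-wait fix-ups, whose vectors are not described here. The helper name is hypothetical. Note that the override prefixes are encoded as 001ss110b with ss = 0 (ES), 1 (CS), 2 (SS), 3 (DS), which is the same order as INT 14h-17h. The sketch also counts the vectors: twelve, against Microsoft’s ten.

```python
OVERRIDE_PREFIXES = (0x26, 0x2E, 0x36, 0x3E)    # ES, CS, SS, DS

def intel_vector(second_byte: int) -> int:
    """Vector for an instruction whose byte after WAIT is second_byte."""
    if 0xD8 <= second_byte <= 0xDF:              # plain ESC -> INT 18h-1Fh
        return 0x18 + (second_byte & 0x07)
    if second_byte in OVERRIDE_PREFIXES:         # override -> INT 14h-17h
        return 0x14 + ((second_byte >> 3) & 0x03)
    raise ValueError(f"unexpected byte {second_byte:#04x}")

vectors = {intel_vector(b) for b in [*range(0xD8, 0xE0), *OVERRIDE_PREFIXES]}
print(len(vectors), "vectors:", " ".join(f"{v:02X}h" for v in sorted(vectors)))
```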

Intel also implemented five additional fix-ups: M:_NST, M:_NCS, M:_NDS, M:_NES, M:_NSS. These were used for the no-wait forms of instructions, such as FNSTSW or FNINIT. Microsoft never supported these; even MASM 5.1 cannot deal with FNSTSW when assembling emulation-ready code (MASM 6.x pretends it can, but just emits plain FPU instructions with no emulation fix-ups).

Another slight difference was that Intel’s ASM86 V3.0 accepted 8087 instructions without needing to be prodded by the user, and always produced the fix-ups needed for FPU emulation.

It is fair to say that Intel’s initial 1980 implementation was more complete than Microsoft’s copy from 1983 (which could not deal with segment overrides or no-wait instructions) and even the improved 1984 Microsoft version (which could handle segment overrides but not no-wait instructions).

Summary

The 8087 emulation mechanism is a work of art. Language translators only need to emit a handful of special fix-ups when generating floating-point instructions. Simple linkers need no special support at all and only apply the fix-ups provided by libraries. The decision whether 8087 instructions should be emulated or not is postponed to link time.

Almost all of the magic is contained in the emulation library, which provides the special fix-ups, and most importantly, provides code to emulate the 8087 floating-point instructions, which is far from a trivial task.

Intel first implemented 8087 emulation in 1980. Microsoft implemented their own variant circa 1982-1983. Microsoft’s implementation was clearly based on the Intel original, although it is not identical and has a different set of deficiencies and advantages.