PSA: Actual hardware behavior takes precedence over vendor documentation. Or, as they say… trust but verify.
A reader recently complained how Intel and AMD do not implement the SGDT and SIDT instructions the same way. AMD documentation states that these instructions ignore any operand size prefixes and always store full 32 bits of base address. Intel documentation on the other hand states that with 16-bit operand size, SGDT/SIDT stores 24 bits of the base address and the 4th byte is zeroed, and while using 32-bit operand size, all 32 bits of the base address are stored.
What a mess, right? How is a poor developer supposed to write code that works on all CPUs, and why the heck is AMD inventing its own things? Yet the reality is a bit different…
The truth is much simpler and more logical: AMD did not invent incompatible behavior, AMD processors merely behave the same way as Intel CPUs. Does that mean AMD’s documentation is incorrect? Of course not—it is Intel’s documentation which is an elaborate fiction and AMD’s straightforward documentation describes what AMD and Intel processors do and always have done since 1985.
Hard to believe? Perhaps. But Intel’s documentation on this point has been going from contradictory to flat out wrong for several decades now. As far as the OS/2 Museum could ascertain, Intel’s 32-bit CPUs always ignored the operand size prefix and stored the full 32 bits of GDT/IDT base when executing SGDT/SIDT instructions; the CPU behavior never changed, although the documentation did.
Testing a small sample of random processors is perhaps not a convincing proof, but there is at least one fairly widespread piece of software which relies on always storing the full 32 bits and won’t work otherwise.
Microsoft’s own Win32s (tested version 1.30c) contains code which executes SGDT in a 16-bit code segment which resides within WIN32S16.DLL (the name is a clear hint about the bitness). But the CPU runs with paging enabled and the GDT is mapped at a virtual address beyond 2GB, i.e. with the top bits always set. If the SGDT instruction clears the top byte, Win32s will crash:
To trigger the crash, it may be necessary to run Windows 3.11 for Workgroups with either 32-bit disk access or 32-bit file access enabled (when both are turned off, the crash does not appear to happen; turning on either of the two provokes the crash).
When SGDT emulation follows Intel documentation, Win32s is known to break as shown above. And of course the fix is to emulate what Intel processors do, not what Intel documentation says.
Windows 3.11 for Workgroups + Win32s 1.30c + Freecell is not a wildly exotic combination. It worked on Intel CPUs back in the 1990s, and it still works on Intel processors today when hardware virtualization is used and instructions like SGDT are executed directly by the CPU (not emulated).
So where did the fictional SGDT/SIDT documentation come from? It would be reasonable to assume that perhaps the documentation was written before the implementation was finalized, but that may not be the case. The Intel 386 reference manual from 1986, as well as Intel’s 1992 i486 PRM, are simply schizophrenic.
The 386 manual states the following in the LGDT/LIDT instruction documentation:
The LGDT and LIDT instructions load a linear base address and limit value from a six-byte data operand in memory into the GDTR or IDTR, respectively. If a 16-bit operand is used with LGDT or LIDT, the register is loaded with a 16-bit limit and a 24-bit base, and the high-order eight bits of the six-byte data operand are not used. If a 32-bit operand is used, a 16-bit limit and a 32-bit base is loaded; the high-order eight bits of the six-byte operand are used as high-order base address bits.
The SGDT and SIDT instructions always store into all 48 bits of the six-byte data operand. With the 80286, the upper eight bits are undefined after SGDT or SIDT is executed. With the 80386, the upper eight bits are written with the high-order eight address bits, for both a 16-bit operand and a 32-bit operand. If LGDT or LIDT is used with a 16-bit operand to load the register stored by SGDT or SIDT, the upper eight bits are stored as zeros.
The above concisely describes the actual behavior of 32-bit Intel processors. But wait, that’s not all. The SGDT/SIDT instruction documentation says this:
SGDT/SIDT copies the contents of the descriptor table register to the six bytes of memory indicated by the operand. The LIMIT field of the register is assigned to the first word at the effective address. If the operand-size attribute is 32 bits, the next three bytes are assigned the BASE field of the register, and the fourth byte is written with zero. The last byte is undefined. Otherwise, if the operand-size attribute is 16 bits, the next four bytes are assigned the 32-bit BASE field of the register.
There’s even a compatibility note in the SGDT/SIDT documentation:
The 16-bit forms of the SGDT/SIDT instructions are compatible with the 80286, if the value in the upper eight bits is not referenced. The 80286 stores 1’s in these upper bits, whereas the 80386 stores 0’s if the operand-size attribute is 16 bits. These bits were specified as undefined by the SGDT/SIDT instructions in the iAPX 286 Programmer’s
A keen reader has already noticed that the SGDT/SIDT documentation makes no sense whatsoever—it claims that a 16-bit instruction stores 32-bit base and a 32-bit instruction stores 24-bit base. Perhaps a simple typo, perhaps a hint that the whole thing is suspect.
The bigger problem of course is that the LxDT and SxDT documentation flatly contradict each other. SGDT documentation says that operand size matters, but LGDT documentation says that (for SGDT) it does not matter. At least one must be wrong.
The 486 PRM from 1992 cleaned up the SGDT/SIDT documentation:
If the operand-size attribute is 16 bits, the next three bytes are assigned the BASE field of the register, and the fourth byte is undefined. Otherwise, if the operand-size attribute is 32 bits, the next four bytes are assigned the 32-bit BASE field of the register.
Now it’s merely incorrect but no longer obvious nonsense. There is still a contradiction in that the SGDT/SIDT documentation claims in the main body that 16-bit operand size leaves the fourth BASE byte undefined, while the compatibility note says it’s written with zeros. And the 486 PRM still has LGDT/LIDT and SGDT/SIDT instruction documentation contradict each other. To improve programmer confusion, the 486 PRM also presents the following pseudocode for SGDT/SIDT:
DEST ← 48-bit BASE/LIMIT register contents;
That’s right, the pseudocode is correct, and the text which immediately follows it says something else! What to believe, that is the question… (Answer: The CPU itself!)
The Pentium documentation (order no. 241430-004, 1995) presents the same self-contradictory information as the 486 PRM.
The Pentium Pro documentation (order no. 232691-001, December 1995) improved things, kind of. The LGDT/LIDT instruction documentation no longer describes SGDT/SIDT behavior. And the pseudocode in SGDT/SIDT documentation now matches the text. Curiously, the pseudocode starts with “IF instruction is IDTR” when it should have been “IF instruction is SIDT”. Again, let’s chalk that up to a simple typo.
Anyway, the Pentium Pro SGDT/SIDT documentation is consistent with itself! Yay! But it’s consistently wrong… oops. Then again, one out of two ain’t so bad, right?
The latest Intel SDM as of this writing (325383-062, March 2017) still presents the same incorrect information. Along the way, there was a split such that SGDT and SIDT are documented separately. Probably as a result of that, the pseudocode for SGDT now starts with a nonsensical “IF instruction is SGDT” statement—well duh, what other instruction did you think SGDT was? (There’s analogous redundant pseudocode for SIDT.) Information about 64-bit support was added years ago, and a few revisions back, the new CR4.UMIP bit was documented. When it comes to 16-bit operand size, the description and pseudocode is just as wrong as it’s been since 1995.
The whole thing looks like a misunderstanding caused by asymmetry between SxDT and LxDT instructions. Intel and AMD both agree that for LGDT and LIDT, operand size does make a difference, and if 16-bit operand size is used, only 24 bits of the base address are loaded (with the top bits being zeroed). If that is taken into account, then SGDT/SIDT doesn’t need to do anything special.
Pure 16-bit software will always run with the top 8 bits of GDT/IDT base as zero because LGDT/LIDT can’t do anything else, and therefore SGDT/SIDT simply needs to store the actual 32-bit base as it is, regardless of operand size. In other words, for pure 16-bit code, it does not matter if SGDT stores 24 bits of base address plus a byte of zeros or 32 bits of base address, the result will be identical.
But if 16-bit code runs in a 32-bit environment, SGDT/SIDT will store the full 32-bit base. That seems logical and useful because storing 24 bits of a 32-bit address would corrupt the data (see Win32s).
It is very interesting to see how the documentation got more wrong over the years. It started out as partially correct but inconsistent and the inconsistencies were ironed out, unfortunately removing the correct information and only leaving the erroneous documentation in place. Sometimes Intel gives a distinct impression that processor architects and documentation writers don’t talk to each other much.