SGDT/SIDT Fiction and Reality

Posted on May 4, 2017 by Michal Necasek

PSA: Actual hardware behavior takes precedence over vendor documentation. Or, as they say… trust but verify.

A reader recently complained how Intel and AMD do not implement the SGDT and SIDT instructions the same way. AMD documentation states that these instructions ignore any operand size prefixes and always store full 32 bits of base address. Intel documentation on the other hand states that with 16-bit operand size, SGDT/SIDT stores 24 bits of the base address and the 4th byte is zeroed, and while using 32-bit operand size, all 32 bits of the base address are stored.

What a mess, right? How is a poor developer supposed to write code that works on all CPUs, and why the heck is AMD inventing its own things? Yet the reality is a bit different…

The truth is much simpler and more logical: AMD did not invent incompatible behavior; AMD processors merely behave the same way as Intel CPUs. Does that mean AMD’s documentation is incorrect? Of course not. It is Intel’s documentation which is an elaborate fiction, while AMD’s straightforward documentation describes what AMD and Intel processors do and always have done since 1985.

Hard to believe? Perhaps. But Intel’s documentation on this point has been going from contradictory to flat out wrong for several decades now. As far as the OS/2 Museum could ascertain, Intel’s 32-bit CPUs always ignored the operand size prefix and stored the full 32 bits of GDT/IDT base when executing SGDT/SIDT instructions; the CPU behavior never changed, although the documentation did.

Testing a small sample of random processors is perhaps not a convincing proof, but there is at least one fairly widespread piece of software which relies on always storing the full 32 bits and won’t work otherwise.

Win32s

Microsoft’s own Win32s (tested version 1.30c) contains code which executes SGDT in a 16-bit code segment which resides within WIN32S16.DLL (the name is a clear hint about the bitness). But the CPU runs with paging enabled and the GDT is mapped at a virtual address beyond 2GB, i.e. with the top bits always set. If the SGDT instruction clears the top byte, Win32s will crash:

Win32s crashing due to incorrect SGDT implementation

To trigger the crash, it may be necessary to run Windows 3.11 for Workgroups with either 32-bit disk access or 32-bit file access enabled (when both are turned off, the crash does not appear to happen; turning on either of the two provokes the crash).

When SGDT emulation follows Intel documentation, Win32s is known to break as shown above. And of course the fix is to emulate what Intel processors do, not what Intel documentation says.
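For emulator writers, the difference can be sketched in a few lines of Python. This is only an illustrative model, not code from any real emulator: the class and function names are made up, and the GDT base value is a hypothetical address above 2GB of the kind Win32s would see.

```python
import struct
from dataclasses import dataclass


@dataclass
class DescriptorTableRegister:
    """Hypothetical model of GDTR/IDTR: 16-bit limit plus 32-bit base."""
    limit: int
    base: int


def sxdt_documented(reg, operand_size):
    """SGDT/SIDT as Intel's documentation describes it.

    With 16-bit operand size only 24 bits of the base are stored and
    the fourth base byte is written as zero.
    """
    base = (reg.base & 0x00FFFFFF) if operand_size == 16 else reg.base
    # Six-byte operand: little-endian 16-bit limit, then 32-bit base.
    return struct.pack("<HI", reg.limit, base)


def sxdt_actual(reg, operand_size):
    """SGDT/SIDT as the processors behave: operand size is ignored."""
    return struct.pack("<HI", reg.limit, reg.base)


def stored_base(operand):
    """Read the base back out of a six-byte SGDT/SIDT operand."""
    _limit, base = struct.unpack("<HI", operand)
    return base


if __name__ == "__main__":
    # Made-up values; the only important property is base >= 0x80000000.
    gdtr = DescriptorTableRegister(limit=0x010F, base=0x80123000)
    for name, store in (("documented", sxdt_documented),
                        ("actual", sxdt_actual)):
        print(name, hex(stored_base(store(gdtr, 16))))
```

With a 16-bit operand size, the documented variant hands back 0x00123000 instead of 0x80123000, which is exactly the kind of corrupted pointer that 16-bit code running in a paged 32-bit environment cannot survive.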

Windows 3.11 for Workgroups + Win32s 1.30c + Freecell is not a wildly exotic combination. It worked on Intel CPUs back in the 1990s, and it still works on Intel processors today when hardware virtualization is used and instructions like SGDT are executed directly by the CPU (not emulated).

Documentation History

So where did the fictional SGDT/SIDT documentation come from? It would be reasonable to assume that perhaps the documentation was written before the implementation was finalized, but that may not be the case. The Intel 386 reference manual from 1986, as well as Intel’s 1992 i486 PRM, are simply schizophrenic.

The 386 manual states the following in the LGDT/LIDT instruction documentation:

The LGDT and LIDT instructions load a linear base address and limit value from a six-byte data operand in memory into the GDTR or IDTR, respectively. If a 16-bit operand is used with LGDT or LIDT, the register is loaded with a 16-bit limit and a 24-bit base, and the high-order eight bits of the six-byte data operand are not used. If a 32-bit operand is used, a 16-bit limit and a 32-bit base is loaded; the high-order eight bits of the six-byte operand are used as high-order base address bits.

The SGDT and SIDT instructions always store into all 48 bits of the six-byte data operand. With the 80286, the upper eight bits are undefined after SGDT or SIDT is executed. With the 80386, the upper eight bits are written with the high-order eight address bits, for both a 16-bit operand and a 32-bit operand. If LGDT or LIDT is used with a 16-bit operand to load the register stored by SGDT or SIDT, the upper eight bits are stored as zeros.

The above concisely describes the actual behavior of 32-bit Intel processors. But wait, that’s not all. The SGDT/SIDT instruction documentation says this:

SGDT/SIDT copies the contents of the descriptor table register to the six bytes of memory indicated by the operand. The LIMIT field of the register is assigned to the first word at the effective address. If the operand-size attribute is 32 bits, the next three bytes are assigned the BASE field of the register, and the fourth byte is written with zero. The last byte is undefined. Otherwise, if the operand-size attribute is 16 bits, the next four bytes are assigned the 32-bit BASE field of the register.

There’s even a compatibility note in the SGDT/SIDT documentation:

The 16-bit forms of the SGDT/SIDT instructions are compatible with the 80286, if the value in the upper eight bits is not referenced. The 80286 stores 1’s in these upper bits, whereas the 80386 stores 0’s if the operand-size attribute is 16 bits. These bits were specified as undefined by the SGDT/SIDT instructions in the iAPX 286 Programmer’s Reference Manual.

A keen reader has already noticed that the SGDT/SIDT documentation makes no sense whatsoever—it claims that a 16-bit instruction stores 32-bit base and a 32-bit instruction stores 24-bit base. Perhaps a simple typo, perhaps a hint that the whole thing is suspect.

The bigger problem of course is that the LxDT and SxDT documentation flatly contradict each other. SGDT documentation says that operand size matters, but LGDT documentation says that (for SGDT) it does not matter. At least one must be wrong.

The 486 PRM from 1992 cleaned up the SGDT/SIDT documentation:

If the operand-size attribute is 16 bits, the next three bytes are assigned the BASE field of the register, and the fourth byte is undefined. Otherwise, if the operand-size attribute is 32 bits, the next four bytes are assigned the 32-bit BASE field of the register.

Now it’s merely incorrect but no longer obvious nonsense. There is still a contradiction in that the SGDT/SIDT documentation claims in the main body that 16-bit operand size leaves the fourth BASE byte undefined, while the compatibility note says it’s written with zeros. And the 486 PRM still has LGDT/LIDT and SGDT/SIDT instruction documentation contradict each other. To improve programmer confusion, the 486 PRM also presents the following pseudocode for SGDT/SIDT:

DEST ← 48-bit BASE/LIMIT register contents;

That’s right, the pseudocode is correct, and the text which immediately follows it says something else! What to believe, that is the question… (Answer: The CPU itself!)

The Pentium documentation (order no. 241430-004, 1995) presents the same self-contradictory information as the 486 PRM.

The Pentium Pro documentation (order no. 232691-001, December 1995) improved things, kind of. The LGDT/LIDT instruction documentation no longer describes SGDT/SIDT behavior. And the pseudocode in SGDT/SIDT documentation now matches the text. Curiously, the pseudocode starts with “IF instruction is IDTR” when it should have been “IF instruction is SIDT”. Again, let’s chalk that up to a simple typo.

Anyway, the Pentium Pro SGDT/SIDT documentation is consistent with itself! Yay! But it’s consistently wrong… oops. Then again, one out of two ain’t so bad, right?

The latest Intel SDM as of this writing (325383-062, March 2017) still presents the same incorrect information. Along the way, there was a split such that SGDT and SIDT are documented separately. Probably as a result of that, the pseudocode for SGDT now starts with a nonsensical “IF instruction is SGDT” statement—well duh, what other instruction did you think SGDT was? (There’s analogous redundant pseudocode for SIDT.) Information about 64-bit support was added years ago, and a few revisions back, the new CR4.UMIP bit was documented. When it comes to 16-bit operand size, the description and pseudocode are just as wrong as they have been since 1995.

SxDT/LxDT Asymmetry

The whole thing looks like a misunderstanding caused by asymmetry between SxDT and LxDT instructions. Intel and AMD both agree that for LGDT and LIDT, operand size does make a difference, and if 16-bit operand size is used, only 24 bits of the base address are loaded (with the top bits being zeroed). If that is taken into account, then SGDT/SIDT doesn’t need to do anything special.

Pure 16-bit software will always run with the top 8 bits of GDT/IDT base as zero because LGDT/LIDT can’t do anything else, and therefore SGDT/SIDT simply needs to store the actual 32-bit base as it is, regardless of operand size. In other words, for pure 16-bit code, it does not matter if SGDT stores 24 bits of base address plus a byte of zeros or 32 bits of base address, the result will be identical.

But if 16-bit code runs in a 32-bit environment, SGDT/SIDT will store the full 32-bit base. That seems logical and useful because storing 24 bits of a 32-bit address would corrupt the data (see Win32s).
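The asymmetry is easy to model. The following Python sketch is a toy under stated assumptions, not a description of any actual implementation: the function names are invented, only the base address is modeled (the limit is left out), and the example addresses are arbitrary.

```python
BASE_24_MASK = 0x00FFFFFF


def lxdt(operand_base, operand_size):
    """Base loaded into GDTR/IDTR by LGDT/LIDT.

    With 16-bit operand size only 24 bits are loaded and the top
    byte of the register is zeroed, as Intel and AMD both document.
    """
    if operand_size == 16:
        return operand_base & BASE_24_MASK
    return operand_base


def sxdt_full(register_base, operand_size):
    """SGDT/SIDT storing the full 32-bit base, ignoring operand size."""
    return register_base


def sxdt_truncating(register_base, operand_size):
    """SGDT/SIDT as documented by Intel: 24 bits for 16-bit operands."""
    if operand_size == 16:
        return register_base & BASE_24_MASK
    return register_base


if __name__ == "__main__":
    # Pure 16-bit software: the register can never hold a nonzero
    # top byte, so both SGDT models store the same value.
    loaded = lxdt(0xFF123000, 16)  # trash in the top byte is dropped
    print(hex(sxdt_full(loaded, 16)), hex(sxdt_truncating(loaded, 16)))

    # 16-bit code in a 32-bit environment: the base was loaded by
    # 32-bit code, and only the full store preserves it.
    loaded = lxdt(0x80123000, 32)
    print(hex(sxdt_full(loaded, 16)), hex(sxdt_truncating(loaded, 16)))
```

In the first case the two models are indistinguishable, so storing the full base breaks nothing for 286-era code. Only the second case tells them apart, and there the truncating model is the one that loses information.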

It is very interesting to see how the documentation got more wrong over the years. It started out as partially correct but inconsistent and the inconsistencies were ironed out, unfortunately removing the correct information and only leaving the erroneous documentation in place. Sometimes Intel gives a distinct impression that processor architects and documentation writers don’t talk to each other much.


39 Responses to SGDT/SIDT Fiction and Reality

zeurkous says:

May 4, 2017 at 7:53 pm

Or that they produced a working processor merely by accident!
Rich says:

May 4, 2017 at 8:02 pm

The documentation team I believe moved to Microsoft and wrote the Win32 documentation in MSDN
Paranoid Survivor says:

May 4, 2017 at 8:51 pm

Another possibility is that these documentation errors are intentional. Imagine you had produced a clone chip by the docs, passed all your validations and test software, and then Win32s came out. BOOM.

Perhaps this is an ignorant question, but why was Appendix H kept secret? To slow down other x86 chip makers, or to slow down VMWare, or something else?
Michal Necasek says:

May 4, 2017 at 9:44 pm

The cloners know that. They realize that they have to examine how Intel CPUs work, not how Intel says they work. Not least because the documentation isn’t complete enough to produce a clone.

I believe Appendix H was kept under NDAs in order to slow down other x86 chip makers (AMD, Cyrix). It certainly didn’t help Intel because it was harder for ISVs to take advantage of Intel CPU features. Which is I think why they gave up on that practice.

Oh, and VMware didn’t even exist yet. Plus VMware is Intel host only, so why should Intel care. Something like SoftPC or Virtual PC might have been more of a concern, but I doubt it.
Yuhong Bao says:

May 4, 2017 at 10:18 pm

I wonder why they did this for LGDT/LIDT on the 80386 in the first place.
Yuhong Bao says:

May 4, 2017 at 10:20 pm

Don’t forget the long nops (0F 18 to 0F 1F) as well (Pentium Pro). I have an old email thread with H Peter Anvin on this one.
Michal Necasek says:

May 4, 2017 at 11:12 pm

Why they did it for LGDT/LIDT… because they either knew of some software or suspected there might be 16-bit software that leaves trash in the last byte of the base when loading it, and they didn’t want to break it. Why they did SGDT/SIDT the way they did is a more interesting question but I think the behavior makes good sense and does not break backwards compatibility.
Michal Necasek says:

May 4, 2017 at 11:12 pm

I’m sure it’s interesting, but how is it relevant? It’s just not apparent to me at the moment.
Yuhong Bao says:

May 5, 2017 at 12:20 am

Because it is another good example.
RetroCPU says:

May 5, 2017 at 12:23 am

Well, normally I use the Intel 80386 Programmer’s Reference Manual for my own 386 emulator, but I agree that the Intel manual sometimes has a strange way of describing certain features of the 386.

For example, the documentation for the SGDT/SLDT/SIDT and LGDT/LLDT/LIDT instructions doesn’t make it clear at all to the user how to set the various “bits” (or flags) in the descriptor tables themselves.

Another issue is with paging. The documentation describes the function of paging itself but doesn’t actually guide the user through allocating (or removing) any of the pages themselves. In many ways, the documentation implies to the user that it’s simply a matter of setting the paging bit in the CR0 (formerly MSW) register and forgetting about everything else afterwards, even when the reality is that nothing could be farther from the truth.

And finally, there was the case of adding support for the 386 PUSHFD and POPFD instructions. I had to go through the Internet to find reliable information on how those instructions actually operated in order for my 386 emulator to identify as a 386 instead of as a 286, because otherwise, making the necessary changes resulted in the CPU being identified as an 80186, which of course is even worse than having it identified as an 80286. Only after many hours did I finally get it to identify as a 386 or higher.

Still, I continue to use the Intel 80386 Programmer’s Reference Manual, since it’s easily available online in HTML format (whereas virtually every other CPU-related document is still only available as a PDF file), meaning that I can easily move between different sections by clicking on links or using the “Back” and “Forward” buttons in my browser, something which would not at all be possible in a PDF file. Also, most Intel CPU-related websites available online are for newer CPUs.
Michal Necasek says:

May 5, 2017 at 4:10 pm

To be clear, writing good documentation is hard, and Intel sometimes succeeds and sometimes not. I have some experience reading Intel x86 processor documentation and among the common shortcomings are: 1) Missing or incomplete information (“undefined behavior” is really not helpful); 2) A lot of detail but no higher-level overview (why would one want to use a given feature); 3) Documentation so verbose that it is very difficult to discern the meaning (sometimes, 10 words is better than 100); 4) Inaccurate or erroneous information (see SGDT/SIDT); 5) Deliberately undocumented instructions/features (SETALC or ICEBP, anyone?). The last one is frankly ridiculous.

PDFs are a lot more usable these days, that is to say the PDF viewers got a lot better. Searching the gigantic Intel SDMs in Acrobat is not nearly as slow as you might think.

I can highly recommend getting a real 386 and checking how it truly works. There’s an awful lot the documentation simply does not say, or says it in such an obtuse manner that you’ll only find the documentation once you know what it should say. Something like task switches is barely documented, and even very basic things like what happens with a 32-bit push or pop of a segment register are documented poorly or not at all. There is also an unpleasantly large number of areas where the behavior of various Intel CPU generations differs, and only some of those differences are clearly documented.

BTW if you can figure out exactly how the 386 “POPAD bug” works, I’m interested 🙂 AFAIK it’s present on every Intel/AMD 386.
zeurkous says:

May 5, 2017 at 6:10 pm

Writing good documentation IME isn’t hard, yet it requires a good
system and a lot of discipline.

An example of the first is mdoc (although the quality of especially
lunix manual pages tends to be dreadful), for the latter you need the
right personality.

That leaves me to say that I find plain text (or something easily and
unambiguously converted to plain text) the most useful as a medium,
especially for “pills” like the documentation of an entire processor.

One suspects some conservatism in engineering circles. How unexpected 🙂
zeurkous says:

May 5, 2017 at 6:11 pm

Michal, why are smilies cut, anyway? At least they don’t appear here…
Michal Necasek says:

May 5, 2017 at 7:23 pm

I see a smiley here. Browser?
Michal Necasek says:

May 5, 2017 at 7:24 pm

Problem: It is very difficult to understand something you don’t understand. And the x86 architecture is extremely difficult to understand. (On the one hand that’s a problem, on the other it’s a fact of life.)
zeurkous says:

May 5, 2017 at 8:29 pm

Oh, you don’t log that? Lynx.
zeurkous says:

May 5, 2017 at 8:31 pm

Yeah well, ideally, the subject and documentation should be developed
together. Though that won’t help much if you don’t understand your own
architecture!
zeurkous says:

May 5, 2017 at 8:47 pm

About the smiley problem: none of them appear even in the HTML code.
Does wordpress now resort to generating diff HTML based on the
luser-agent header, or wtf else is going on?
Yuhong Bao says:

May 5, 2017 at 10:39 pm

The unicode codepoint for the alt text for example seems to be U+1F642.
zeurkous says:

May 5, 2017 at 10:59 pm

A hex dump reveals that a wide character is there. But this is a text
terminal (and certainly not an xterm!) that unfortunately doesn’t
support Unicode.

I’d say that wordpress is being excessively clever by turning my
perfectly normal smiley into a special character. Can that please be
fixed? Thanks in advance.
crazyc says:

May 6, 2017 at 5:44 am

I love how in the 386 ref description of IRET STACK-RETURN-TO-V86 it says that descriptor privilege checks are done on the return segment register values.
Paranoid Survivor says:

May 6, 2017 at 6:49 am

Does anyone know when VMX was implemented by products of the Pentium era? Win 3.x, 95, OS/2, DOS extenders, DESQview, DR-DOS, QEMM, etc? That might show who was out of the loop.

Could it have been Microsoft who wanted it under wraps while they were in the thick of their platform war?
Richard Wells says:

May 6, 2017 at 9:20 am

Virtual Machine Extensions are listed as added to QEMM 7 (1993) and OS/2 2.1 (May 1993 beta). I also recall a Usenet thread about Virtual Mode Extensions in regards to OS/2 with a recommendation to use VME=no (turns it off) because of problems scheduled to be fixed in Warp Fixpack 14. It was not working reliably for OS/2 in late 1995.

Conversely, I think Windows 3.x and 95 did not support VME.
Michal Necasek says:

May 6, 2017 at 10:13 pm

That sounds like some copy-and-paste fiction…
Michal Necasek says:

May 6, 2017 at 10:26 pm

VMX was introduced in late 2005, under the name VT-x. Oh wait, you must be talking about VME 🙂

It wasn’t Microsoft, Intel had no trouble giving the information (under NDA) to IBM, Quarterdeck, and others. Windows 3.1 definitely didn’t support VME (too old), I don’t know about Win9x off hand (but that OTOH was relatively late anyway).

At least one ISV had information about VME no later than Fall 1990, but the hardware (Pentium) was not available until 1993 and I think software vendors were typically wary of trying to write code that only ran on non-existent CPUs. Intel was clearly aware of V86 mode shortcomings, but it took so long to make the VME-capable hardware widely available that it barely mattered.
Yuhong Bao says:

May 6, 2017 at 10:37 pm

Though Intel did put it into later 486s too.
Yuhong Bao says:

May 8, 2017 at 10:37 am

Worth mentioning: https://blogs.msdn.microsoft.com/oldnewthing/20041215-00/?p=37003
Michal Necasek says:

May 8, 2017 at 3:16 pm

That story makes Microsoft look a bit silly. Why would they ask for faster #UD processing when what they really wanted was faster “VM escapes” from V86 mode…
Yuhong Bao says:

May 8, 2017 at 6:12 pm

The point is that it is likely how the discussions that led to VME were started.
Michal Necasek says:

May 8, 2017 at 6:17 pm

What does VME do to speed up the “VM escape” which uses #UD (I believe Windows 3.x/9x, OS/2, and NT all use this technique)? VME reduces the overhead of handling dispatch and return for interrupts that are handled within the VM, but that never used invalid opcodes.
Yuhong Bao says:

May 8, 2017 at 6:19 pm

It didn’t. The point is that it probably led to further discussions about other V86 mode problems.
Michal Necasek says:

May 9, 2017 at 10:36 am

Actually I think the problems that VME addresses were known pretty much as soon as anyone started trying to use V86 mode with DOS (certainly Compaq back in ’86, maybe Microsoft even earlier?). Every DOS/BIOS INT function call traps to the VMM, and so does every IRET, which is not helpful to say the least. It’s really too bad that VME didn’t make it into the initial 486, it could have had much more of an impact.

But VME is one of those nice examples where a new CPU (Pentium) was a lot faster than the existing offerings in general, and in specific cases it was much much faster thanks to new features.
Joshua says:

May 16, 2017 at 2:36 am

Thanks for this very interesting post and everyone’s comments!

I’m curious, does the bug also affect OS/2 (native or virtual and which version) ?
According to this link, OS/2 supports a parameter to disable VME.

http://www.markcrocker.com/rexxtipsntricks/rxtt28.2.0796.html
Michal Necasek says:

May 16, 2017 at 10:08 am

Not verified but OS/2 should be affected too, VME support (initially buggy) was added in OS/2 2.1: http://ps-2.kev009.com/eprmhtml/eprmx/h12304.htm I can confirm that at least OS/2 Warp and later enables VME by default.

Interestingly, what does not enable VME is Windows 98 SE.
Michal Necasek says:

May 16, 2017 at 10:44 am

Verified, OS/2 is affected too, not at all surprisingly. BTW I think you meant to post the comment on a different article… but no problem 🙂
Den says:

May 27, 2017 at 10:58 pm

Intel listens to emails with fixes to docs. I tried once and it worked.
Michal Necasek says:

May 27, 2017 at 11:01 pm

I tried it once and it didn’t. But I certainly believe that it worked for you.
Sean McDonough says:

December 19, 2017 at 8:52 pm

Why Intel can’t be arsed to write accurate documentation…