Better Late Than Never

Better late than never, although in this instance, it’s really really late—about thirty years late. In the world of computing, that is eternity.

The talk is about the new CR4.UMIP control bit documented in the latest (revision 58) Intel SDM, and the corresponding CPUID feature bit. When set, the CR4.UMIP (User Mode Instruction Prevention) bit prevents the SGDT, SIDT, SLDT, SMSW, and STR instructions from being executed outside of the highest-privileged code (ring 0).
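
To make the rule concrete, here is a minimal sketch of the feature bit and the fault condition. The bit positions (CPUID leaf 7, sub-leaf 0, ECX bit 2, and CR4 bit 11) are the ones given in the SDM; the function names are made up for illustration, and the model covers only the UMIP check in protected mode, not the other reasons these instructions can fault.

```python
# Bit positions as documented in the Intel SDM.
CPUID_LEAF7_ECX_UMIP = 1 << 2   # CPUID.(EAX=07H,ECX=0):ECX[2]
CR4_UMIP = 1 << 11              # CR4[11]

# The five instructions that UMIP restricts.
UMIP_INSTRUCTIONS = {"SGDT", "SIDT", "SLDT", "SMSW", "STR"}


def cpu_supports_umip(leaf7_ecx: int) -> bool:
    """Decode the UMIP feature bit from a raw CPUID leaf 7 ECX value."""
    return bool(leaf7_ecx & CPUID_LEAF7_ECX_UMIP)


def faults_with_gp(instruction: str, cpl: int, cr4: int) -> bool:
    """Return True if UMIP makes the instruction raise #GP(0).

    Hypothetical helper: it models only the UMIP check itself.
    """
    if instruction.upper() not in UMIP_INSTRUCTIONS:
        return False
    # The fault needs both conditions: CR4.UMIP set and CPL above 0.
    return bool(cr4 & CR4_UMIP) and cpl > 0
```

With CR4.UMIP set, `faults_with_gp("SGDT", 3, CR4_UMIP)` is true while the same call with `cpl=0` is false. Instructions outside the list, such as LGDT, are unaffected by the bit because they were privileged all along.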

The question is of course why it was ever possible to execute these instructions from unprivileged code. Setting the critical registers (GDTR, IDTR, LDTR, MSW, TR) was never possible from user code, but they could be freely read. The excuse Intel had back in 1982 (when these instructions became part of the 80286) was that they didn’t know what they were doing. That was much less of an excuse with the 386, and by the early 1990s, it was well known to be a problem.

It’s a problem for at least two reasons. The first is that it exposes critical information to unprivileged code for no good reason. That is implicitly a security issue, because such information needs to be protected.

The other reason is virtualization. These instructions caused headaches to anyone attempting to implement efficient x86-on-x86 virtualization. A guest OS invariably needs critical system structures like the GDT “faked”. Writing the system registers is no problem because the relevant instructions can be trapped and emulated. But reading does not trap, and that’s bad—the virtualized OS might see what it’s not supposed to see. All code needs to be carefully scanned and patched to prevent the guest OS from executing such instructions directly.
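
The asymmetry can be shown with a toy model. This is only a sketch of the trap-and-emulate idea, not how any real monitor is written: the `ToyVmm` class, its fields, and the GDTR values are all invented for illustration, and a GDTR is reduced to a (base, limit) pair.

```python
class ToyVmm:
    """Toy model of a pre-VT-x monitor running the guest kernel deprivileged."""

    def __init__(self, real_gdtr, umip=False):
        self.real_gdtr = real_gdtr   # value actually held in the hardware GDTR
        self.guest_gdtr = None       # value the guest believes it loaded
        self.umip = umip             # models CR4.UMIP
        self.traps = []              # log of instructions that faulted

    def lgdt(self, value):
        # Writes always fault outside ring 0, so the monitor can emulate them.
        self.traps.append("LGDT")
        self.guest_gdtr = value

    def sgdt(self):
        if self.umip:
            # With UMIP set the read faults too and can be emulated.
            self.traps.append("SGDT")
            return self.guest_gdtr
        # Without UMIP the read runs directly and returns the real value.
        return self.real_gdtr


REAL = (0xFFFF8000, 0x7F)     # made-up (base, limit) owned by the monitor
GUEST = (0x00010000, 0x3FF)   # made-up (base, limit) the guest loads

old = ToyVmm(REAL)
old.lgdt(GUEST)
leaked = old.sgdt()           # no trap: the guest sees the real GDTR

new = ToyVmm(REAL, umip=True)
new.lgdt(GUEST)
faked = new.sgdt()            # trapped: the guest sees its own value
```

In the first case only LGDT shows up in the trap log and the guest reads back a value it never loaded. In the second case both instructions trap, so the monitor gets the chance to hand back the faked value.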

There are other instructions that pose problems, such as CPUID or flag manipulation. There’s no doubt that Intel’s addition of CR4.UMIP was motivated by security since for virtualization, VT-x solves all these problems and more. But really, if they had thought of this decades ago, they could have saved the industry quite a bit of effort and money. It’s one of those inexplicable “well, duh” things.


13 Responses to Better Late Than Never

  1. Yuhong Bao says:

    Thinking about it, it is probably not difficult to create a x86 subset that is user mode compatible with most modern programs but lacks things like segmentation and real mode.

  2. Richard Wells says:

    The iAPX-432 had all the security features mentioned and it was slow which is why there is very little 432 software and almost no call for 432 virtualization. The 432 had fast context switching relative to the 386. Placing such features on the 386 would have cut performance similarly. True, 386 virtualization would be easier to write but no one would want one.

    The world was a different place when databases would be designed to run solely in ring zero just to avoid context switches. Without the ability to read registers for user mode, there would have been almost no chance of convincing anyone to write any code that was not running in ring zero. Even hiding the registers won’t help that much since anyone could write a ring zero helper in just a few minutes. Same lack of security but at least 600 cycles slower per call.

  3. Michal Necasek says:

    That’s all true… but how is it relevant? Yeah, checking the CPL in instructions like SGDT or SIDT might cost a cycle or two, but with the possible exception of SMSW, these aren’t frequently executed instructions.

    I also don’t think the 286/386 was insecure (and I hope I didn’t say that!). I’m not aware of any known ways of defeating the 286/386 ring protection. Buggy software, sure, but not inherent hardware design flaws. Now, things like leaking information through SGDT/SIDT/etc. may make it easier to attack the host OS, but it is not an exploitable security flaw in itself as long as the OS is properly designed.

  4. zeurkous says:

    It’s sure sloppy, though.

  5. Richard Wells says:

    Consider instead one of the backdoors to the Machine Specific Registers: RDTSC. Proper secured design would have required accessing time-stamp counter through an OS ring zero process instead of directly. In some cases, that round trip (including privilege level changes) could take milliseconds. That has potentially rather limiting effects on the number of IOPS a database can produce.

    A CPU designer cannot know how an OS or application will best use new instructions 5 or 10 years in the future. Intel tried the properly siloed design. It didn’t work. Sloppy but fast wins out every time even if it makes security theorists apoplectic.

  6. zeurkous says:

    The thing that humans tend to not understand is that margin of error is inversely proportional to the power (speed or otherwise) of a mechanism, and that this rule also applies to computers.

    I’ll agree that with general-purpose machines, such protection should be optional, not mandatory. But it should still be included.

  7. Michal Necasek says:

    Screw security theorists, the danger is that if the design is flawed and it has serious and well publicized consequences, it can lead to very expensive recalls, brand name damage, and other disasters. Not like that never happened to Intel either. I agree that sloppy and fast wins every time, but it’s not necessarily a recipe for long-term success.

    Anyway, RDTSC is a very bad example… because that was done the right way from the beginning. Together with RDTSC, the Pentium added the CR4.TSD bit which, when set, makes RDTSC a privileged instruction. The OS can decide whether applications can use RDTSC directly, and it can control that on a per-task basis.

  8. Michal Necasek says:

    Exactly. See RDTSC example 🙂 It’s not like Intel didn’t learn anything.

  9. Sean McDonough says:

    Curious here, why don’t reads trap?

  10. Michal Necasek says:

    You’d have to ask Intel — it really doesn’t make sense.

  11. Xeno says:

    one Meltdown and one Spectre later…

  12. Matt Taylor says:

    There was never a reason for SGDT, SIDT, SLDT, or STR being accessible below ring 0, and it’s certainly not complexity. (Also, why does UMIP restrict these instructions to ring 0 instead of restricting them to CPL <= IOPL as was done everywhere else?)

    The proximate cause of the failure of the iAPX-432 wasn't having security. It failed because the processor was so massive that it required multiple chips, and there is a massive speed penalty when moving between chips. It may be argued that security was a causal factor for the size of the design.

    https://en.wikipedia.org/wiki/Intel_iAPX_432

    "One problem was that the two-chip implementation of the GDP limited it to the speed of the motherboard's electrical wiring. A larger issue was the capability architecture needed large associative caches to run efficiently, but the chips had no room left for that."

  13. Michal Necasek says:

    Checking the CPL does add some tiny but non-zero complexity to the microcode. I have no idea what the real reason was for making the “read” instructions available regardless of privilege level. In retrospect it just makes no sense.

    The UMIP implementation is logical, LGDT is not controlled by IOPL so why should SGDT be? These instructions have nothing to do with external devices, like port I/O or the interrupt flag do.

    I’m sure you’re right that security wasn’t the ultimate reason behind the iAPX-432’s downfall, the design’s extreme complexity (for the time) was. Security was just part of that complexity.
