How the PC Industry Screws Things Up

I was recently involved in investigating a problem that turns out to be a complete SNAFU which nicely illustrates the chaos that is the PC platform. It’s about the NX/XD bit. Let’s start with a bit of history.

It turns out to be useful, and in fact required for improved security, to separate code and data. On the 8086, no such separation existed. On the 80286, protected-mode segmentation clearly distinguished between code and data segments. Writing to a code segment required creating a data-segment alias. The same segmentation rules carried over to 32-bit protected mode on the 80386, but lost any meaning in flat-model operating systems (OS/2 2.0, Windows NT, 32-bit UNIX). Because the data (DS) and code (CS) segments mapped the entire memory, any data page could be written to and executed. In the age of the Internet, that turned out to be a terrible idea because any buffer overflow on the stack could be immediately exploited to inject and execute code in a vulnerable process.

In 2003, AMD introduced the NX (No-eXecute) bit as part of the AMD64 architecture. When enabled in the EFER MSR, 64-bit and PAE page tables support an additional bit which makes a page non-executable. As a result, it is again possible to distinguish between code and data, and make data pages non-executable. That makes exploits harder (far from impossible!) because there is no longer a ready supply of writable and executable pages and most importantly, the stack is no longer executable.
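To make the mechanism concrete, here is a minimal sketch of how the NX bit in a single 8-byte page table entry interacts with the enable bit in EFER. The constants are the architectural ones (EFER is MSR C0000080h, NXE is bit 11, NX is bit 63 of the entry); the function name `is_executable` is made up for illustration, and the model deliberately ignores the upper paging levels and every other protection feature.

```python
# Architectural constants from the AMD64 / Intel 64 manuals.
MSR_EFER = 0xC0000080    # Extended Feature Enable Register
EFER_NXE = 1 << 11       # No-Execute Enable
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_NX = 1 << 63         # only exists in 8-byte (PAE / 64-bit) entries


def is_executable(pte: int, efer: int) -> bool:
    """Simplified check: may code be fetched through this one entry?

    Hypothetical helper; a real page walk also consults every upper-level
    entry, and a set NX bit at any level makes the page non-executable.
    """
    if not pte & PTE_PRESENT:
        return False
    if not efer & EFER_NXE:
        if pte & PTE_NX:
            # With NXE clear, bit 63 is reserved; setting it causes a fault.
            raise ValueError("reserved bit 63 set while EFER.NXE is clear")
        return True  # no NX protection at all: every present page executes
    return not pte & PTE_NX


stack_page = PTE_PRESENT | PTE_WRITABLE | PTE_NX
print(is_executable(stack_page, EFER_NXE))  # False
```

Note that with NXE clear, the very same page tables offer no protection, which is why an OS that does not know about NX is unaffected by the CPU supporting it.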

Microsoft saw NX as a Good Thing and added support for it in Windows XP SP2 (2004), calling the feature DEP, or Data Execution Prevention. This was initially an opt-in feature because certain classes of legitimate software did not bother distinguishing between data and code and crashed with DEP enabled (typically software which generated code on the fly in one manner or another).

Microsoft presumably made it known to Intel that NX was a desirable feature. Intel had to support NX in all AMD64-compatible CPUs, but also added NX support to late-model 32-bit Prescott Pentium 4 CPUs, and even the later Pentium M processors.

Intel of course couldn’t leave well enough alone and called the bit XD (eXecute Disable), and more importantly, also added a way to completely disable NX through the IA32_MISC_ENABLE (1A0h) MSR. That’s where things started going south.

Intel didn’t quite think things through. The way to detect whether NX can be disabled is checking the NX/XD bit in CPUID. After CPU reset, if the NX bit is set, NX can be disabled. Once NX is disabled, the CPUID bit disappears. At that point, there is no longer any bulletproof way to tell if NX can be enabled again. Oops.
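The ambiguity is easy to show in code. The sketch below decodes the NX/XD feature flag, which lives in bit 20 of EDX for CPUID leaf 80000001h. It takes raw register values as arguments rather than executing CPUID, and the function name and the returned strings are purely illustrative.

```python
CPUID_EXT_FEATURES = 0x80000001  # extended feature leaf
EDX_NX = 1 << 20                 # NX/XD flag in EDX of that leaf


def nx_status(max_ext_leaf: int, edx_80000001: int) -> str:
    """Interpret CPUID output (hypothetical helper, takes raw register values).

    max_ext_leaf is EAX from CPUID leaf 80000000h; edx_80000001 is EDX from
    leaf 80000001h, or 0 if that leaf does not exist.
    """
    if max_ext_leaf >= CPUID_EXT_FEATURES and edx_80000001 & EDX_NX:
        return "available"
    # A clear bit is ambiguous: the CPU may lack NX entirely, or the
    # firmware may have hidden it through IA32_MISC_ENABLE.
    return "absent or disabled"


print(nx_status(0x80000008, EDX_NX))  # available
print(nx_status(0x80000008, 0))       # absent or disabled
```

The second result is the whole problem: CPUID alone cannot distinguish a CPU without NX from one where NX was switched off.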

Now enter BIOS vendors and OEMs. Firmware for Intel systems typically has an option to turn NX off (through the IA32_MISC_ENABLE MSR). To add insult to injury, some OEM systems ship with NX disabled by default, probably because making systems more secure by default is considered a bad idea. The crowning achievement is the Insyde BIOS (used by certain laptop models by HP and others) which in some cases disables NX and provides no way to turn it on.

Enter operating system vendors. Microsoft and Linux both consider NX a valuable feature which should be enabled and used. Microsoft feels strongly enough about it that Windows 8.1 and Windows 10 require NX, in both 64-bit and 32-bit versions.

On the one hand we have firmware vendors and OEMs who consider NX a harmful feature which should be disabled by default, if not completely (or at least they used to, before Microsoft explained the error of their ways to them). On the other hand we have OS vendors who consider NX a highly desirable feature (Linux) or even an essential requirement (Microsoft).

So what do OS vendors do? That’s right, they override the BIOS setting! It would be one thing if Microsoft could just tell users to enable NX in their firmware, but in some cases that’s simply not an option. Microsoft probably felt that its hand was forced by the OEMs who disabled NX with no user-accessible option to enable it.

The end result is that if one disables NX on a system in the firmware, Windows 10 will boot up anyway, and any recent Linux will show that NX is enabled. It’s as if the BIOS option wasn’t there. The Linux code which does that can be seen here. On certain Intel CPU models, Linux checks if the MSR_IA32_MISC_ENABLE_XD_DISABLE bit is set in the IA32_MISC_ENABLE MSR and if so, clears it (which enables NX). Please do not try to unpack the control bit macro name and think about what “miscellaneous enable execute disable disable” could possibly signify, besides brain damage. At any rate, by booting some other OS, such as DOS, it is possible to verify that the BIOS really works and really does disable NX when asked to.
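The override itself amounts to clearing one bit. The following is a sketch of that bit manipulation only, not a copy of the Linux code: the XD Bit Disable flag is bit 34 of IA32_MISC_ENABLE (MSR 1A0h), while the function name and the idea of passing the MSR value in as a plain integer are assumptions made so the example runs anywhere.

```python
from typing import Tuple

MSR_IA32_MISC_ENABLE = 0x1A0
MISC_ENABLE_XD_DISABLE = 1 << 34  # "XD Bit Disable" flag


def clear_xd_disable(misc_enable: int) -> Tuple[int, bool]:
    """Return (new MSR value, whether anything changed).

    Hypothetical helper: real code would read the value with RDMSR and
    write the result back with WRMSR, both of which need ring 0.
    """
    if misc_enable & MISC_ENABLE_XD_DISABLE:
        return misc_enable & ~MISC_ENABLE_XD_DISABLE, True
    return misc_enable, False


# Firmware left XD disabled (bit 34 set) alongside some unrelated bit 0.
new_value, changed = clear_xd_disable((1 << 34) | 1)
print(hex(new_value), changed)  # 0x1 True
```

All other bits in the MSR are left alone, since they control unrelated features.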

Windows 10 presumably does something rather similar to Linux, with the added twist that Windows doesn’t check if NX was truly enabled and just goes ahead and tries to turn it on. That can cause crashes in VMs where NX truly cannot be turned on (for example because it’s disabled on the host CPU). Again, the OS is in a difficult position because when NX is disabled, there is no architecturally defined way of determining whether it can be enabled. Then again, Linux checks the CPUID after attempting to enable NX in order to discover the true state, so Microsoft could, too.
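The "try, then look again" approach can be sketched with a toy model. Everything below is hypothetical: `FakeCpu` is not real hardware access, and it assumes that on a system without usable NX the MSR write is simply accepted but has no effect, which may not match how every CPU or hypervisor behaves.

```python
XD_DISABLE = 1 << 34  # bit 34 of IA32_MISC_ENABLE
EDX_NX = 1 << 20      # NX flag in EDX of CPUID leaf 80000001h


class FakeCpu:
    """Toy model of the CPUID/MSR interaction described above."""

    def __init__(self, has_nx_hardware: bool, misc_enable: int):
        self.has_nx_hardware = has_nx_hardware
        self.misc_enable = misc_enable

    def cpuid_ext_edx(self) -> int:
        # The CPUID bit is visible only if NX exists and is not disabled.
        visible = self.has_nx_hardware and not self.misc_enable & XD_DISABLE
        return EDX_NX if visible else 0


def try_enable_nx(cpu: FakeCpu) -> bool:
    """Attempt to enable NX, then trust only what CPUID reports afterwards."""
    if cpu.cpuid_ext_edx() & EDX_NX:
        return True
    cpu.misc_enable &= ~XD_DISABLE           # attempt the override
    return bool(cpu.cpuid_ext_edx() & EDX_NX)  # re-check the true state


print(try_enable_nx(FakeCpu(True, XD_DISABLE)))   # True
print(try_enable_nx(FakeCpu(False, XD_DISABLE)))  # False
```

The second case is the one that matters: an OS which skips the final check would go on to set NX bits in page tables on a CPU that treats them as reserved.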

The big question is why Intel added a way to disable NX at all. AMD saw no need to do it because NX is disabled by default and must be explicitly enabled by an OS when turning on PAE or 64-bit paging. Was it a solution to some actual problem? If so, what problem? Was it added just in case, because there might be problems?

HP has a useful document which lists Intel Pentium 4 processors with NX/XD support, and also provides a fairly long list of applications incompatible with NX. What the document does not explain is why it’s necessary to have a BIOS option to disable NX when the OS already has sufficient means of disabling it (and when the OS default was opt-in, anyway). The document also mentions that some older HP desktops (based on i915 chipset) had NX disabled by default, while newer ones (i945 chipset) had NX on by default.

BIOS vendors clearly felt it was important enough to go through the effort of a) writing extra code to disable NX, and b) adding a new option to change the behavior. The same question applies: What problem did it solve? We know that disabling NX in the BIOS causes problems, but we don’t know what problems, if any, it solves.

That question has of course been already asked, but not answered. There are some interesting answers which are so militantly wrong that it’s almost amusing, like this one: “If the operating system does not recognize this particular exception, as will be the case for operating systems written before the processor capability existed, the operating system will break if the processor capability is turned on […]”. Well, duh—an OS which knows nothing of NX will not turn it on, and it will never get such unexpected exceptions! Older 32-bit OSes don’t know anything about the EFER MSR anyway, so they can’t enable NX even by accident. Newer ones know better than to write random bits into the EFER. And all 64-bit OSes must know about NX because it’s part of the AMD64 architecture.

Presumably Intel knew all that, and must have had some other reason to allow disabling NX. But what was it? Does anyone know? There are really two separate questions there: why OEMs sometimes disable NX by default (but not always!), and why Intel made it possible at all. For OEMs it adds another way to control Windows DEP, but it is not obvious why multiple independent methods (firmware and OS) are beneficial. What is obvious is only that defaults which perhaps made sense in 2005 were harmful five years later.

As an aside, this particular class of problem is one that Apple doesn’t have. Their firmware people talk to the OS people, and if a new OS needs firmware changes, it can actually require/force a firmware update. In the PC world that is just not practical, and users suffer for it.

Update: I verified that on a Pentium 4 540 (older LGA775 Prescott), 32-bit Windows 10 AU really just crashes (triple faults/reboots). This is a CPU which has PAE and SSE2 but not NX/XD. Windows 10 can correctly handle a processor without PAE (displays a clear error message) but not the relatively unusual combination of PAE + SSE2 but not NX. Only older Pentium 4 CPUs (Willamette, Northwood, old Prescott) fall into that category; other CPUs (Pentium III, most Pentium M) either have no SSE2 or no PAE.

The oldest CPUs capable of running 32-bit Windows 10 (but not 64-bit!) are then most likely the original Sledgehammer Opterons from 2003.
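For reference, the three features involved can all be read from CPUID, so a check for the problematic combination is short. In the sketch below the bit positions are architectural (PAE is bit 6 and SSE2 is bit 26 of EDX in leaf 1, NX is bit 20 of EDX in leaf 80000001h); the function name is invented for illustration and the code says nothing about how Windows actually performs its check.

```python
from typing import List

EDX1_PAE = 1 << 6     # CPUID leaf 1, EDX
EDX1_SSE2 = 1 << 26   # CPUID leaf 1, EDX
EXT_EDX_NX = 1 << 20  # CPUID leaf 80000001h, EDX


def missing_features(edx_leaf1: int, edx_ext: int) -> List[str]:
    """List which of PAE, SSE2 and NX a CPU fails to report (hypothetical helper)."""
    missing = []
    if not edx_leaf1 & EDX1_PAE:
        missing.append("PAE")
    if not edx_leaf1 & EDX1_SSE2:
        missing.append("SSE2")
    if not edx_ext & EXT_EDX_NX:
        missing.append("NX")
    return missing


# A CPU like the Pentium 4 540 above: PAE and SSE2 present, no NX.
print(missing_features(EDX1_PAE | EDX1_SSE2, 0))  # ['NX']
```

A result of exactly `['NX']` corresponds to the combination that crashed in the test above.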


37 Responses to How the PC Industry Screws Things Up

  1. zeurkous says:

    That almost sounds like a bloody game of Core Wars!

  2. Yuhong Bao says:

    I was the one who mentioned to the Linux folks about this back in 2010. Win8 was the first version of Windows to do it.

  3. Lazaro Millo says:

    “It’s about the NX/XD bit. Let’s start with a bit of history.”

    … a bit of history. I see what you did there! 🙂

  4. Michal Necasek says:

    Mentioned what to them? That the PC industry is a lair of demons? Or something else?

  5. Yuhong Bao says:

    Mentioned the MSR and the fact that it can be changed.

  6. Richard Wells says:

    I thought it was normal to include a BIOS option to disable any new function that affects system wide operation. HPET provided enough surprise interactions to make CPU designers and BIOS writers cautious about other introductions.

  7. Michal Necasek says:

    Exactly. The NX feature does absolutely nothing whatsoever besides setting a CPUID bit unless it is explicitly enabled. AMD didn’t think it was necessary to go beyond that, so why did Intel? That’s the mystery. The HPET is not transparent, it changes system behavior in incompatible ways. VT-x can be disabled and locked down because leaving it on is a potential security hazard. It makes perfect sense to make those switchable. NX is not in the same category.

    It’s possible that NX can be disabled (on Intel systems only, not on AMDs!) purely out of inertia and caution — which would be ironic because the extra caution is in fact causing problems. But maybe there really was some reason for it.

    There are various semi-mysterious BIOS options like disabling additional CPUID leaves or switching between MPS 1.1/1.4 which exist for reasons of compatibility with specific operating systems. NX doesn’t appear to be in that category… but maybe it is and no one knows about it.

  8. Michal Necasek says:

    Linux also overrides the CPUID leaf restriction which may be put in place by the BIOS for compatibility with NT 4. I’m guessing that override is older.

  9. Richard Wells says:

    Motherboard chipset drivers can’t be tested with NX until there is a working OS using NX. Having an OS supporting NX blow up because a driver isn’t NX friendly would give the system maker a bad name. The easiest way of handling this is to disable NX until testing has been done to make sure drivers work with NX and to include a BIOS option to disable NX in case there is a weird interaction that makes seemingly NX compliant code trigger an NX violation. Writing code anticipating changes 18 months out is hard.

  10. Yuhong Bao says:

    XP SP2 was already in beta by the Intel 915 chipset release in mid-2004, though maybe not the NX-capable Pentium 4s.

  11. Fernando says:

    Just for the argument:
    System Management Software.
    ACPI runtime.
    UEFI
    I think that we don’t know what these do exactly. Does anybody know if these touch the NX bit? I think these three are independent of the operating system.

  12. Fernando says:

    @Richard Wells 64-bit Linux was available before x86-64 was available to the public, but I don’t know when NX bit support was implemented in the kernel.

  13. Yuhong Bao says:

    It was added to the 32-bit Linux kernel in mid-2004:
    https://lkml.org/lkml/2004/6/2/228
    This article says that it was available in 64-bit Linux before that.

  14. Yuhong Bao says:

    One other thing, this post claims that the Surface 3 had a bug where it set this bit:

    https://lkml.org/lkml/2016/10/12/250

    Has anyone tested?

  15. ender says:

    I remember having trouble upgrading some machines with Windows 7 to 10 precisely because of this problem – XD disabled by BIOS and no option to turn it on (these weren’t HPs, but did use Insyde – might have been Acers). Windows 7 happily runs without it, if you boot Windows 10, it also works fine, but if you start Windows 10 upgrade installer from Windows 7, it complains about no XD support and refuses to continue. Luckily the workaround is to set Windows 7 DEP to AlwaysOn with bcdedit, which will cause 7 to also enable XD.

  16. Michal Necasek says:

    Yes, Acer was another vendor using Insyde BIOS. Really fun stuff. I didn’t know Windows 7 could also force NX on.

    I read somewhere that about 1% of systems had NX disabled by default, which sounds insignificant, until you realize that it’s probably hundreds of thousands of affected users.

  17. Yuhong Bao says:

    I don’t think it can. If it does not touch the MSR, it can’t.

  18. Michal Necasek says:

    That makes sense as a “just in case/CYA” approach and may explain why NX can be disabled entirely in the CPU. But it doesn’t explain why some systems ship with NX off by default and others don’t. There is (was) also a way to completely disable NX in the OS so the BIOS option is fairly redundant.

    As for driver compatibility, it sounds like PAE (a pre-requisite for NX) was more of a problem than NX itself. Yet there is no way to disable PAE in the CPU/BIOS and no one seems to miss it.

  19. Yuhong Bao says:

    There seems to be a bug in 32-bit Win10’s code. In 32-bit Win8.1, nt!KiTryForceEnableNx had the RDMSR/WRMSR calls to change the MSR. In 32-bit Win10 1607, the actual calls were removed but the code that sets nt!KiNxForceEnable, for example, still remains.

  20. random lurker says:

    “VT-x can be disabled and locked down because leaving it on is a potential security hazard.”

    Leaving it off is a much bigger potential security hazard IMO. My desktop security depends on virtualizing internet browsing etc, but my desktop BIOS defaults it to off (ASUS P8H67-M). Fortunately it’s only an issue after the BIOS settings get cleared for some reason or other, but it is annoying when it happens.

    And yes, I do realize that malware in the hypervisor is a possibility but frankly you’re just as screwed if you’re infected with a rootkit and only have access to ordinary detection tools. Disabling virtualization by default is a stupid solution to a problem that doesn’t really exist.

    Besides, we have real world examples of malware in the SMM (ring -2) but none really in the hypervisor (ring -1) besides some proof-of-concept type of things. But you don’t see vendors disabling SMM by default, do you?

  21. Pingback: Lazy Reading for 2017/04/09 – DragonFly BSD Digest

  22. ender says:

    @Yuhong Bao: setting nx to AlwaysOn in 7’s bcdedit seemed to work well enough for 10’s installer to let the upgrade continue – without that it complained about missing DEP support.

  23. Chris M. says:

    Here is another weird NX bit situation. Some folks have attempted to install Windows 8/10 on industrial motherboards that come with the i865 chipset, but with Socket 775 (ASRock still sells these with Core2 support, but the industrial boards have ISA slots). The board officially supports NX bit equipped Pentium D CPUs, but the NX bit doesn’t seem to ever want to enable on these weirdo boards. The BIOS completely lacks any NX bit enable/disable option.

  24. Yuhong Bao says:

    Fun read from http://download.intel.com/design/mobile/datashts/31690801.pdf :
    “Intel will validate this [Execute Disable] feature only on Intel 945GU Express chipset family based platforms and recommends customers implement BIOS changes related to this feature, only on Intel 945GU Express chipset family based platforms.”

  25. Pingback: Links 11/4/2017: Black Lab Linux 8.2, Slackel 7.0 Live Openbox Beta | Techrights

  26. Yuhong Bao says:

    The call is immediately followed by a call to nt!KiIsNXSupported, which has code to always return 1 if the processor is an Intel processor that supports SSE2.

  27. Michal Necasek says:

    I don’t know what they were smoking when they wrote such code but it would certainly explain the behavior of Windows 10 on CPUs with no NX support.

  28. Yuhong Bao says:

    Both functions are called from nt!KiInitializeNxSupportDiscard BTW.

  29. Michal Necasek says:

    Ah yes. And I see that the casing isn’t terribly consistent, KiTryForceEnableNx vs. KiIsNXSupported.

    I see what you were talking about, if the NX bit isn’t found in CPUID, Windows 10 build 10240 (and presumably others) checks the SSE2 capability instead. Do you have any idea why? It really doesn’t make much sense to me.

  30. Michal Necasek says:

    Actually… the logic would almost make sense — for AMD: “The AMD Opteron processors report the same standard features as the AMD Athlon XP processor, with the following additions [CLFLUSH, SSE2]”. From April 2003 “AMD Processor Recognition Application Note”. Only it doesn’t really make sense because Opterons will report NX in the extended CPUID leaves, and of course for Intel it doesn’t make sense because SSE2 does not imply NX, and never did. Nice mess there. I think KiIsNXSupported is just plain wrong, if there’s no NX bit in CPUID leaf 80000001h then there’s no NX, period.

  31. Yuhong Bao says:

    Actually, I just realized that KiGetCpuVendor returns 1 for an Intel processor, and KiIsNXSupported tests for a value of 2. This would make much more sense as pretty much all AMD CPUs that support SSE2 support NX (even Semprons). KiTryForceEnableNx tests for a value of 1 of course.

  32. Michal Necasek says:

    The question is, which CPUs support NX but do not report it in CPUID leaf 80000001h? AMD has had the extended leaves since well before NX showed up. Did Intel release any CPUs that had NX but not the extended CPUID leaves? It would have to be something 32-bit. Then again the SSE2 check makes sense only for AMD, not for Intel.

    Any 64-bit CPU has to support the extended CPUID leaves (which is why KiIsNXSupported is very simple in 64-bit Windows), so that leaves 32-bit only CPUs with NX. Which both Intel and AMD made.

  33. Yuhong Bao says:

    “Intel had to support NX in all AMD64-compatible CPUs”
    Almost all. I believe the first D0 stepping Noconas don’t support it.

  34. @random lurker: Hear, hear! Why some manufacturers don’t even include a BIOS setting to turn on hardware-assisted virtualisation (I’m looking at you, Acer)…

  35. Yuhong Bao says:

    Fun trivia: I just found out about nxquery.sys, which is a very small driver that only reads the 0x1A0 MSR. It is shipped with the updates that include CompatTelRunner and Appraiser.

  36. Yuhong Bao says:

    It is called by Appraiser!Windows::Compat::Appraiser::SystemInventory::InstallAndRunNxDriver and of course targeted at Windows 7.

  37. Yuhong Bao says:

    It would be nice if this support was added to the Windows 7 ntoskrnl instead, but remember that CompatTelRunner and Appraiser were originally shipped as separate updates.
