I was recently involved in investigating a problem that turned out to be a complete SNAFU, one which nicely illustrates the chaos that is the PC platform. It’s about the NX/XD bit. Let’s start with a bit of history.
It turns out to be useful, and in fact required for improved security, to separate code from data. On the 8086, no such separation existed. On the 80286, protected-mode segmentation clearly distinguished between code and data segments; writing to a code segment required creating a data-segment alias. The same segmentation rules carried over to 32-bit protected mode on the 80386, but lost any meaning in flat-model operating systems (OS/2 2.0, Windows NT, 32-bit UNIX). Because the code (CS) and data (DS) segments covered the entire address space, and 386-style page tables had no execute permission bit (only present, read/write, and user/supervisor), any data page could be both written to and executed. In the Internet age, that turned out to be a terrible idea, because any stack buffer overflow could be immediately exploited to inject and execute code in a vulnerable process.
In 2003, AMD introduced the NX (No-eXecute) bit as part of the AMD64 architecture. When enabled through the NXE bit in the EFER MSR, 64-bit and PAE page tables support an additional bit (bit 63 of a page table entry) which makes a page non-executable. As a result, it is again possible to separate code from data and make data pages non-executable. That makes exploits harder (far from impossible!), because there is no longer a ready supply of pages that are both writable and executable and, most importantly, the stack is no longer executable.
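To make the mechanism concrete, here is a simplified Python model (an illustration only, not how any real CPU or OS is written) of how EFER.NXE (bit 11 of MSR C000_0080h) and bit 63 of a PAE or 64-bit page table entry together decide whether code on a page may be executed. It looks only at the final page table entry; real paging also honors NX in the upper-level entries. The names fetch_allowed and PageFault are made up for the example.

```python
# Simplified model of how EFER.NXE and the page-table NX bit gate
# instruction fetches. Real CPUs check every paging level, not just the
# final page table entry; this sketch ignores that.

EFER_MSR = 0xC000_0080   # MSR address of EFER (informational only)
EFER_NXE = 1 << 11       # EFER.NXE: No-Execute Enable

PTE_PRESENT = 1 << 0     # page is mapped
PTE_WRITABLE = 1 << 1    # page is writable
PTE_NX = 1 << 63         # PAE/64-bit entry: instruction fetch not allowed


class PageFault(Exception):
    """Stands in for the #PF exception the processor would raise."""


def fetch_allowed(pte: int, efer: int) -> bool:
    """Return True if an instruction fetch from the mapped page is permitted."""
    if not pte & PTE_PRESENT:
        raise PageFault("page not present")
    if not efer & EFER_NXE:
        # With NXE clear, bit 63 is a reserved bit: setting it is itself an
        # error, and every present page is executable.
        if pte & PTE_NX:
            raise PageFault("reserved bit set")
        return True
    return not pte & PTE_NX


# A writable stack page, with and without the OS opting in to NX.
stack_page = PTE_PRESENT | PTE_WRITABLE | PTE_NX
print(fetch_allowed(stack_page, EFER_NXE))                 # False
print(fetch_allowed(PTE_PRESENT | PTE_WRITABLE, efer=0))   # True
```

The second call is the pre-NX situation in a nutshell: any present page, including a writable stack page, can be executed.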
Microsoft saw NX as a Good Thing and added support for it in Windows XP SP2 (2004), calling the feature DEP, or Data Execution Prevention. DEP was initially an opt-in feature because certain classes of legitimate software (typically software which generated code on the fly in one manner or another) did not bother to distinguish between data and code and crashed with DEP enabled.
Microsoft presumably made it known to Intel that NX was a desirable feature. Intel had to support NX in all of its AMD64-compatible CPUs, but it also added NX support to late-model 32-bit Prescott Pentium 4 CPUs, and even to the later Pentium M processors.
Intel of course couldn’t leave well enough alone and called the bit XD (eXecute Disable). More importantly, Intel also added a way to disable NX entirely through the IA32_MISC_ENABLE MSR (1A0h). That’s where things started going south.
Intel didn’t quite think things through. The way to detect whether NX can be disabled is to check the NX/XD bit in CPUID (function 80000001h, EDX bit 20). After CPU reset, if the NX bit is set, NX can be disabled. But once NX is disabled, the CPUID bit disappears. At that point, there is no longer any bulletproof way to tell whether NX can be enabled again. Oops.
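The detection trap can be captured in a few lines. This is a hypothetical Python model, not real hardware access: IntelCpu and its has_xd_hardware field are invented for illustration, and only the bit positions (IA32_MISC_ENABLE bit 34, CPUID 8000_0001h EDX bit 20) come from the processor documentation. The point is that a CPU without XD and a CPU whose XD was disabled by firmware look identical through CPUID.

```python
from dataclasses import dataclass

# Bit positions from the processor documentation; the CPU model itself is
# hypothetical and only captures what software can and cannot observe.
MISC_ENABLE_XD_DISABLE = 1 << 34   # IA32_MISC_ENABLE (MSR 1A0h), XD Bit Disable
CPUID_EXT_EDX_NX = 1 << 20         # CPUID function 8000_0001h, EDX bit 20


@dataclass
class IntelCpu:
    has_xd_hardware: bool   # hidden from software; no register reports it
    misc_enable: int = 0    # XD disable bit is clear after reset

    def cpuid_ext_edx(self) -> int:
        """EDX of CPUID 8000_0001h, reduced to the NX bit."""
        xd_disabled = bool(self.misc_enable & MISC_ENABLE_XD_DISABLE)
        return CPUID_EXT_EDX_NX if self.has_xd_hardware and not xd_disabled else 0


def reports_nx(cpu: IntelCpu) -> bool:
    return bool(cpu.cpuid_ext_edx() & CPUID_EXT_EDX_NX)


old_p4 = IntelCpu(has_xd_hardware=False)                 # e.g. a Northwood
crippled = IntelCpu(has_xd_hardware=True,
                    misc_enable=MISC_ENABLE_XD_DISABLE)  # firmware disabled XD
# Both answer "no NX", so CPUID alone cannot tell them apart.
print(reports_nx(old_p4), reports_nx(crippled))          # False False
```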
Now enter BIOS vendors and OEMs. Firmware for Intel systems typically has an option to turn NX off (through the IA32_MISC_ENABLE MSR). To add insult to injury, some OEM systems ship with NX disabled by default, probably because making systems more secure by default is considered a bad idea. The crowning achievement is the Insyde BIOS (used in certain HP laptop models, among others), which in some cases disables NX and provides no way to turn it back on.
Enter operating system vendors. Microsoft and Linux both consider NX a valuable feature which should be enabled and used. Microsoft feels strongly enough about it that Windows 8.1 and Windows 10 require NX, in both their 64-bit and 32-bit versions.
On the one hand, we have firmware vendors and OEMs who consider NX a harmful feature which should be disabled by default, if not completely (or at least they used to, before Microsoft explained the error of their ways to them). On the other hand, we have OS vendors who consider NX a highly desirable feature (Linux) or even an essential requirement (Microsoft).
So what do OS vendors do? That’s right, they override the BIOS setting! It would be one thing if Microsoft could just tell users to enable NX in their firmware, but in some cases that’s simply not an option. Microsoft probably felt that its hand was forced by the OEMs who disabled NX with no user-accessible option to enable it.
The end result is that if one disables NX in the system firmware, Windows 10 will boot up anyway, and any recent Linux will show that NX is enabled. It’s as if the BIOS option wasn’t there. The Linux code which does that can be seen here. On certain Intel CPU models, Linux checks whether the MSR_IA32_MISC_ENABLE_XD_DISABLE bit is set in the IA32_MISC_ENABLE MSR and, if so, clears it (which enables NX). Please do not try to unpack the control bit macro name and think about what “miscellaneous enable execute disable disable” could possibly signify, besides brain damage. At any rate, by booting some other OS, such as DOS, it is possible to verify that the BIOS really works and really does disable NX when asked to.
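For what it’s worth, the bit behind that macro name is bit 34 of IA32_MISC_ENABLE (Intel calls it “XD Bit Disable”). Because RDMSR and WRMSR transfer the 64-bit MSR value in the EDX:EAX register pair, low-level code sees it as bit 2 of EDX. The sketch below is plain Python arithmetic modeling the “if set, clear it” step; split_msr and clear_xd_disable are hypothetical helpers, not Linux functions.

```python
# Bit 34 of IA32_MISC_ENABLE (MSR 1A0h) is "XD Bit Disable". RDMSR/WRMSR
# move the 64-bit value through EDX:EAX, so bit 34 is bit 2 of EDX.
XD_DISABLE_BIT = 34
XD_DISABLE = 1 << XD_DISABLE_BIT


def split_msr(value: int) -> tuple[int, int]:
    """Return (edx, eax) as RDMSR would deliver them."""
    return (value >> 32) & 0xFFFF_FFFF, value & 0xFFFF_FFFF


def clear_xd_disable(misc_enable: int) -> tuple[int, bool]:
    """If XD_DISABLE is set, clear it (which enables NX); report whether it changed."""
    if misc_enable & XD_DISABLE:
        return misc_enable & ~XD_DISABLE, True
    return misc_enable, False


value = XD_DISABLE | 1           # bit 0 (fast strings) set as well
edx, eax = split_msr(value)
print(edx, eax)                  # 4 1
print(clear_xd_disable(value))   # (1, True)
```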
Windows 10 presumably does something rather similar to Linux, with the added twist that Windows doesn’t check whether NX actually became available and just goes ahead and tries to turn it on. That can cause crashes in VMs where NX truly cannot be turned on (for example, because it’s disabled on the host CPU). Again, the OS is in a difficult position because when NX is disabled, there is no architecturally defined way of determining whether it can be enabled. Then again, Linux re-checks CPUID after attempting to enable NX in order to discover the true state, so Microsoft could, too.
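The difference between the two approaches can be sketched as a small Python model. It is hypothetical throughout: Cpu, nx_possible, enable_nx_blindly and enable_nx_checked are invented names; enable_nx_checked follows only the Linux behavior as described above (clear the disable bit, then re-check CPUID), and enable_nx_blindly stands in for the presumed Windows behavior, not actual Windows code. The one hard fact it encodes is that setting EFER.NXE on a CPU which does not report NX raises #GP, because the bit is reserved there.

```python
from dataclasses import dataclass

# Hypothetical model contrasting two ways an OS might turn NX back on.
XD_DISABLE = 1 << 34      # IA32_MISC_ENABLE bit 34 (XD Bit Disable)
EFER_NXE = 1 << 11        # EFER bit 11 (No-Execute Enable)


class GeneralProtection(Exception):
    """Stands in for #GP; this early in boot it usually ends in a triple fault."""


@dataclass
class Cpu:
    nx_possible: bool                # hidden: can NX really be turned on?
    misc_enable: int = XD_DISABLE    # firmware left NX disabled
    efer: int = 0

    def clear_xd_disable(self) -> None:
        # Models a WRMSR; a hypervisor may simply ignore it.
        if self.nx_possible:
            self.misc_enable &= ~XD_DISABLE

    def cpuid_nx(self) -> bool:
        return self.nx_possible and not self.misc_enable & XD_DISABLE

    def write_efer(self, value: int) -> None:
        # Setting a reserved EFER bit faults; NXE is reserved without NX.
        if value & EFER_NXE and not self.cpuid_nx():
            raise GeneralProtection("EFER.NXE not supported")
        self.efer = value


def enable_nx_blindly(cpu: Cpu) -> None:
    """Clear the disable bit and set EFER.NXE without checking."""
    cpu.clear_xd_disable()
    cpu.write_efer(cpu.efer | EFER_NXE)


def enable_nx_checked(cpu: Cpu) -> bool:
    """Clear the disable bit, re-read CPUID, and only then set EFER.NXE."""
    cpu.clear_xd_disable()
    if not cpu.cpuid_nx():
        return False             # run without NX instead of crashing
    cpu.write_efer(cpu.efer | EFER_NXE)
    return True


real = Cpu(nx_possible=True)
vm = Cpu(nx_possible=False)      # e.g. NX disabled on the host CPU
print(enable_nx_checked(real), hex(real.efer))   # True 0x800
print(enable_nx_checked(vm), hex(vm.efer))       # False 0x0
```

In the VM case, the checked variant lets the OS carry on without NX (or print an error), while the blind variant faults at a point where there is usually nothing left to handle the exception.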
The big question is why Intel added a way to disable NX at all. AMD saw no need to do it because NX is disabled by default and must be explicitly enabled by an OS when turning on PAE or 64-bit paging. Was it a solution to some actual problem? If so, what problem? Was it added just in case, because there might be problems?
HP has a useful document which lists Intel Pentium 4 processors with NX/XD support, and also provides a fairly long list of applications incompatible with NX. What the document does not explain is why it’s necessary to have a BIOS option to disable NX when the OS already has sufficient means of disabling it (and when the OS default was opt-in anyway). The document also mentions that some older HP desktops (based on the i915 chipset) had NX disabled by default, while newer ones (i945 chipset) had NX on by default.
BIOS vendors clearly felt it was important enough to go through the effort of a) writing extra code to disable NX, and b) adding a new option to change the behavior. The same question applies: What problem did it solve? We know that disabling NX in the BIOS causes problems, but we don’t know what problems, if any, it solves.
That question has of course already been asked, but not answered. Some of the answers are so militantly wrong that they’re almost amusing, like this one: “If the operating system does not recognize this particular exception, as will be the case for operating systems written before the processor capability existed, the operating system will break if the processor capability is turned on […]”. Well, duh: an OS which knows nothing of NX will not turn it on, and it will never get such unexpected exceptions! Older 32-bit OSes don’t know anything about the EFER MSR anyway, so they can’t enable NX even by accident. Newer ones know better than to write random bits into the EFER. And all 64-bit OSes must know about NX because it’s part of the AMD64 architecture.
Presumably Intel knew all that and must have had some other reason to allow disabling NX. But what was it? Does anyone know? There are really two separate questions here: why OEMs sometimes disable NX by default (but not always!), and why Intel made it possible at all. For OEMs it adds another way to control Windows DEP, but it is not obvious why multiple independent methods (firmware and OS) are beneficial. The only obvious thing is that defaults which perhaps made sense in 2005 were harmful five years later.
As an aside, this particular class of problem is one that Apple doesn’t have. Their firmware people talk to the OS people, and if a new OS needs firmware changes, it can actually require/force a firmware update. In the PC world that is just not practical, and users suffer for it.
Update: I verified that on a Pentium 4 540 (an older LGA775 Prescott), 32-bit Windows 10 AU really does just crash (triple fault and reboot). This is a CPU which has PAE and SSE2 but not NX/XD. Windows 10 correctly handles a processor without PAE (it displays a clear error message), but not the relatively unusual combination of PAE and SSE2 without NX. Only older Pentium 4 CPUs (Willamette, Northwood, early Prescott) fall into that category; other CPUs (Pentium III, most Pentium M) lack either SSE2 or PAE.
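For comparison, a feature check that produces a clear error instead of a triple fault does not need to be complicated. The following Python sketch is hypothetical (missing_features is an invented name, the register values are illustrative, and this is not how Windows checks its requirements). Only the CPUID bit positions are taken from the processor documentation: PAE is function 1 EDX bit 6, SSE2 is function 1 EDX bit 26, and NX is function 8000_0001h EDX bit 20, which is only valid if function 8000_0000h reports a maximum of at least 8000_0001h.

```python
# Hypothetical pre-boot feature check; bit positions are the documented
# CPUID ones, but the function and its policy are illustrative only.
CPUID1_EDX_PAE = 1 << 6        # CPUID function 1, EDX bit 6
CPUID1_EDX_SSE2 = 1 << 26      # CPUID function 1, EDX bit 26
CPUID_EXT_EDX_NX = 1 << 20     # CPUID function 8000_0001h, EDX bit 20


def missing_features(leaf1_edx: int, max_ext_leaf: int, ext_edx: int) -> list[str]:
    """List the required features the CPU lacks, in a fixed order."""
    missing = []
    if not leaf1_edx & CPUID1_EDX_PAE:
        missing.append("PAE")
    if not leaf1_edx & CPUID1_EDX_SSE2:
        missing.append("SSE2")
    # Function 8000_0001h is only meaningful if 8000_0000h reports it.
    if max_ext_leaf < 0x8000_0001 or not ext_edx & CPUID_EXT_EDX_NX:
        missing.append("NX")
    return missing


# Illustrative values for a CPU with PAE and SSE2 but no NX.
need = missing_features(CPUID1_EDX_PAE | CPUID1_EDX_SSE2, 0x8000_0008, 0)
if need:
    print("This CPU is not supported; missing:", ", ".join(need))
```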
The oldest CPUs capable of running 32-bit Windows 10 (but not 64-bit!) are then most likely the original Sledgehammer Opterons from 2003.