Generic x86 multiprocessing, Summer 1994 edition. There’s not much to say:
Yep, that’s NT 3.1 running in a VM, and it sees two processors. Remember, you saw it here first!
And later, after bootup:
Sadly, more than two CPUs aren’t recognized by this regular NT 3.x setup, although Advanced Server ought to support at least four (not yet tested).
It took some lucky Googling to find the MPS 1.1 HAL for Windows NT 3.1 in a very unexpected place (sometimes, the Internet giveth… even if other times it taketh away), as well as hacking up VirtualBox until the annoyingly picky MPS 1.1 HAL was happy with what it saw.
It’s not clear whether the HAL was provided by Intel or Microsoft. It was probably Microsoft with significant help from Intel. It’s from mid-June 1994, predating the release of NT 3.5 by a few months.
At about the same time, OS/2 2.11 SMP also appeared, and likewise supported MPS 1.1 in addition to vendor-specific hardware. Solaris 2.4 was another early adopter of Intel’s MPS, née PC+MP specification.
Relatively inexpensive dual-socket Pentium systems came to market at that time, aided by the fact that second-generation P54C Pentiums included a built-in local APIC and that Intel provided all the building blocks with the 430NX (Neptune) chipset. MPS was intended to support multiprocessing on such systems (as well as larger servers) and did the job well until it was eventually, much later, replaced by ACPI.
I had no luck with NT 3.5 and 3.51 SMP in a VM. Both crash the same way during bootup for some very non-obvious reason. It’s not clear if the crashes are directly related to SMP or to some other features the MPS HAL/kernel might be supporting; likely the latter. There’s never been any trouble with NT 4.0 SMP, on the other hand. For reference, NT 3.5 supports MPS 1.1 (but not 1.4) out of the box; NT 3.51 ought to support MPS 1.4 as well. NT 3.1 only supports vendor-specific SMP hardware out of the box, for the obvious reason that MPS was only finalized about a year after NT 3.1 came out.
If I’m sufficiently bored I might try patching the NT 3.1 HAL to work with MPS 1.4 and newer I/O APICs, but that’s not a promise.
I have a beta of NT 3.51 Workstation which recognises four processors. I can’t remember whether I tested it in VirtualBox or VMware (possibly the latter) but I had no issues using its stock MPS HAL.
And of course – great achievement and thanks for the post 🙂
I have to correct myself: That NT 3.51 beta I have is Server edition and it recognised two processors in VMware Workstation 8 using the MPS 1.4 HAL the OS ships with. The final build of NT 3.51 Workstation successfully saw four processors in the same VM. Screenshots can be provided if necessary.
I’m having trouble finding official statements on this. The NT Server 3.51 Reviewer’s Guide claims support for up to 32 processors in the table near the end. Yet on page 16 the text clearly states that only up to four processors are supported out of the box.
I can say with confidence that NT 3.51 did not recognize four processors on my old Intel board. However, that was with MPS 1.1, and I need to retest with MPS 1.4 in case that changes anything. For NT 3.1 there’s no MPS 1.4 support that I know of.
Actually wait. Server recognized two CPUs and Workstation four? That’s odd.
Yes, indeed. Here’s a screenshot of Workstation showing four CPUs; they show up in Performance Monitor as well.
I don’t have the final build of 3.51 Server unfortunately; perhaps the beta I have was accidentally limited to only two CPUs.
Seems I cannot include direct links in comments; is there another way to do so?
You should be able to add URLs.
It’s not that I don’t believe what you saw. But I can’t find any official statement on NT 3.51; only for NT 4.0 did MS pretty clearly state (http://www-pc.uni-regensburg.de/systemsw/nt40/doc/4diffws.htm) that only two processors are supported by Workstation. It seems highly unlikely that NT 3.51 would have supported more.
3.51 Workstation also does not recognize all four CPUs on my server board, not even after I changed the BIOS to use MPS 1.4. Maybe there is some other reason for it but maybe two is the limit. Will have to try Server.
Both Server and Workstation share the same kernel; processor limits are controlled by switches in the registry, and they varied between releases based on the whims of marketing. I don’t remember if MPS ever supported more than 2 processors, but I do know we ran NT 3.1 on 8-processor systems (NCR behemoths).
Inside Windows NT [4.0], 2e, pp. 36-37
“Windows NT was architecturally designed to run on up to 32 processors. The number of licensed processors is stored in the registry at HKLM\System\CurrentControlSet\Control\Session Manager\Licensed Processors. (Tampering with that data is a violation of the software license; and besides, modifying Windows NT to use more processors is more complicated than just changing this value.) The default value depends on the edition of Windows NT, as you can see in Table 2-1.
Edition                          Licensed Processors
Windows NT Server, Enterprise    8
Windows NT Server                4
Windows NT Workstation           2
“System manufacturers that sell Windows NT Server systems that support more than eight processors must ship their own remastered Windows NT CD-ROM with a registry set to enable a higher number of processors. They might also need to provide their own HAL.”
Where have you found the HAL? I’ve searched for it for years without any result (http://www.vogons.org/viewtopic.php?p=178763#p178763). The closest thing I’ve found was Intergraph’s HAL for their Dual Pentium on 430NX workstation, but I haven’t tried it with a VM.
As for the CPU numbers. Dale Smoker is correct, NT (at least 4.0) supports up to 32 CPUs. It’s just a matter of customizing (like an OEM could do) SETUPHIV to remove the 2 CPU limit from the Workstation. SGI did this for their Visual Workstation 540. If you install Windows from their NT media you will get NT Workstation with 4 CPUs. You could also hack the LicensedProcessors value, but you have to do it through a registry copy/restore process since these values are actively protected by Windows.
I don’t think there’s a CPU limit imposed by the MPS HAL. But by the rest of the OS, sure.
What I was talking about were the actual limits on standard NT releases, not architectural limits or customized OEM editions.
Well, funny story. Try Googling for “installing the mp specification hal” (in quotes) and you might come across a file called PKUNZIP.EXE that is in fact a ZIP archive containing a few bits of an EISA configuration utility and the MPS 1.1 HAL for NT 3.1.
I think I mentioned it already but the HAL works nicely on real hardware, too.
Thanks for looking it up for me, I was too lazy to pull the book off the shelf 🙂 Now the question is, what NT version does that apply to? For example in the NT 3.1 days, there was Workstation and Advanced Server, no “Server” or “Enterprise Server”. Does it actually apply to all NT 3.x versions as well as 4.0?
VMware Workstation’s BIOS supports an MPS 1.1 option. I previously assumed it would facilitate SMP on NT 3.1+. Although the NT 3.1 HCL¹ lists 37 MP systems by twelve OEMs, I’m now guessing they all shipped with custom HALs tightly coupled to their MPS 1.1 implementations (or whatever).
I wish the HAL development kits were in the wild.
You didn’t read the article 🙂 There simply was no MPS spec (and hence no MPS hardware) in 1993. NT 3.1 supported SMP but it was all vendor-specific hardware. The MPS 1.1 spec came out in April 1994. The pre-release version (PC+MP) came out in late 1993.
A second attempt at including the link to the screenshot of NT 3.51 Workstation showing four CPUs: image-share.de/images/53df74475a71ef52ef921f84af7da20f.png.
That worked. But… is this really unmodified NT 3.51 Workstation from MSDN or a generic installation CD? My 3.51 (from a generic installation CD) definitely does not go past 2 CPUs. But, funnily enough, a checked build of 3.51 from MSDN sees up to 4 CPUs (not more). A total mess?
You may be on to something – what I’ve been using is an OEM copy. This screenshot shows a part of the \i386\registry.inf file from two different copies: The upper file is from my (German) NT 3.51 Workstation OEM CD, the lower file has been extracted from a (Japanese) MSDN copy. Intriguingly, “RegisteredProcessors” is not equal to 2 in both instances. The aforementioned OEM copy has been tested to see four CPUs here, the MSDN copy has not been tested yet. I’ll have a look at my other copies of NT 3.51 (MSDN, checked and non-checked) to see if there’s any pattern.
Link didn’t make it again, hopefully this works.
For NT 4.0 and highly likely at least some earlier versions, checked builds default to four processors while free builds default to two. Don’t ask me why. RegisteredProcessors is what changes the default on NT 3.x. On NT 4.0 it’s a bit more complicated.
The bottom one is standard. It ups the default two CPUs to four if the product is not Workstation. The top one is quite unusual IMO and sets the number to 32 if the product is Workstation. With the top INF, you should get up to 32 CPUs, although the MPS HAL probably won’t support as many.
There was Intel MPS 1.0 in late 1993. But processor support mostly depended on OEM drivers (and not only for Windows NT).
There’s actually no reason for a lower limit than 32 CPUs on the MPS HAL. It seems this limit was associated with 32-bitness of NT: https://books.google.pl/books?id=tyzGX00gcBoC&pg=PA65&lpg=PA65&dq=%22windows+nt%22+%2232+processors%22&source=bl&ots=zxCKQLoSyF&sig=AVj-TuiPU-GGG1tefotIOaTdn1o&hl=en&sa=X&redir_esc=y#v=onepage&q=%22windows%20nt%22%20%2232%20processors%22&f=false
You must be referring to this: Pre-release Version 1.0. Formerly called “PC+MP Specification”; 10/27/93.
I don’t think that’s any more relevant than the work-in-progress PC+MP spec.
The 32 CPU limit had nothing to do with MPS, correct. It was an architectural NT limit — it was very convenient to represent a CPU mask with a DWORD. But you’re kind of wrong because with the flat MPS architecture, you were (back then) limited to 4-bit APIC IDs, and with 0xF reserved you could at best have 15 CPUs. That was the case with all Pentium and P6 class CPUs, inherent in the serial APIC bus protocol.
Obviously not a practical limitation back then. I suspect that without NUMA, at some point adding more CPUs just slowed things down because the memory absolutely could not cope with the demand. But yeah, that’s why the Xeons had to have those big L2 caches.
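The two limits mentioned above can be checked with a little arithmetic. This is only an illustrative sketch: the function and constant names are made up here and do not come from NT or from the MPS documents. It assumes one mask bit per CPU in a 32-bit DWORD, and a 4-bit APIC ID field with 0xF reserved, as described in the comment.

```python
# Illustrative arithmetic only; all names below are invented for this sketch.

MASK_BITS = 32           # one bit per CPU in a 32-bit DWORD
APIC_ID_BITS = 4         # APIC ID width on Pentium and P6 class CPUs
RESERVED_APIC_ID = 0xF   # reserved ID, per the comment above


def cpu_mask(cpu_numbers):
    """Build a DWORD-sized mask with one bit set per CPU number."""
    mask = 0
    for n in cpu_numbers:
        if not 0 <= n < MASK_BITS:
            raise ValueError(f"CPU {n} does not fit in a {MASK_BITS}-bit mask")
        mask |= 1 << n
    return mask


def usable_apic_ids():
    """Every ID that fits in the APIC ID field, minus the reserved one."""
    return [i for i in range(1 << APIC_ID_BITS) if i != RESERVED_APIC_ID]


four_way = cpu_mask(range(4))
all_cpus = cpu_mask(range(MASK_BITS))
print(hex(four_way))           # 0xf
print(hex(all_cpus))           # 0xffffffff
print(len(usable_apic_ids()))  # 15
```

So the software side tops out at 32 CPUs because the mask runs out of bits, while the flat APIC addressing of that era runs out of IDs at 15.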
Ah, yes, I was only ever thinking about Windows and ignoring the hardware.
As for the practical limits for UMA SMP – you could get 36 CPUs with the SGI Challenge, though the bus would obviously be the limit for workloads requiring large memory bandwidth. On the other hand I don’t know about any MPS compatible systems using more than 6 CPUs (the ALR Revolution 6×6 hack) without NUMA. Even Intel’s early 8 CPU chipset was NUMA (the Profusion).
As far as I am aware, the biggest Pentium MPS system that Intel (indirectly) offered was the Xtended Xpress aka Medusa with up to four CPUs. The Alder Pentium Pro platform also went up to four CPUs. Even the 450NX chipset supported 4-way SMP. Did Intel actually sell Profusion-based boards?
There were always vendors offering more but as far as I can quickly ascertain, before Pentium 4 Xeons there was always some custom logic required for going past 4-way SMP. Well, I don’t know if the Xtended Xpress itself counts as custom or not, it was a weird thing.
I know from experience that as late as 2008 (just before Nehalem), benchmarks published for certain systems with lots of sockets and six-core Xeons noted that the numbers were achieved with a good number of the cores being disabled. That way there was more LLC space per core and the memory bandwidth was not quite as much of a bottleneck.
BTW, this claims that Profusion was not NUMA.
“BTW, this claims that Profusion was not NUMA.”
They even sold being non-NUMA as an architectural feature :-P. Also, I don’t think that Profusion was even Intel’s, but a third-party technology that they acquired from Corollary. They would use it over the following years, almost unmodified, until they came up with their own high-end SMP technology.
I wonder if there was true x86 NUMA before Opteron entered the CPU market. It is clearly specified that all Intel chipset offerings for P3 and P4 Xeons were UMA, and this also includes offerings from third-party chipset makers like ServerWorks. There were probably custom architectures and designs that I don’t know about, and of course I don’t count them, as they probably wouldn’t run NT Server (NT4, W2K, W2K3, etc.) anyway.
PC+MP must have been before MPS 1.0. In MP operating systems of the early 1990s only MPS1.0/1.1 was mentioned. In fact 1.1 is only relevant for Microchannel systems.
True x86 NUMA (oversimplified) was only available from Sequent with Intel processors – long before Opterons.
Instead of patching an NT 3.1 HAL to work with MPS 1.4 and newer I/O APICs, would it be possible to create a source version instead by starting with ReactOS code? If so, the before and after could be diff’d to clarify the changes. It would be interesting to study, or even use as a guide of sorts for creating 3.5x HALs.
I’m sure it would be possible, but patching is a question of a few minutes. Building from source, debugging and testing takes orders of magnitude more time… which I currently do not have.
Not being NUMA was a feature… because no mass market PC OS at the time knew what to do with it. Heck, look at Apple. They defined the problem away by getting rid of multi-socket systems.
If you look at the PDF I linked to, Profusion was designed by Compaq together with Corollary/Intel. I guess it was 3rd party but at the same time owned by Intel.
Yes, I confused Profusion with the 870 for the Itanium. It’s been a while since I analyzed these old chipsets.
Actually, IBM continued with x86 where Sequent left off, since they purchased them. The first was the XA-32 aka Summit chipset, which debuted with the first Xeon MP (Foster) in the EXA systems. This technology evolved into the current X6 systems, but the last time I checked (eX5) they used it mostly to attach more memory to the system https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM+Technology+Made+Simple/page/eX5+Technology , though an 8-socket configuration was also possible with the EXA http://publib.boulder.ibm.com/infocenter/systemx/documentation/index.jsp?topic=/com.ibm.sysx.7145.doc/bb1pw_t_creatingexamultinodes.html .
Actually, which operating systems exactly mention support for MPS 1.0? NT does not. OS/2 does not.
PC+MP is what turned into MPS. Ever wondered why the MPS configuration table has a “PCMP” signature?
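For anyone who wants to see that signature for themselves, here is a rough sketch of reading the base MP configuration table header. The field layout follows my reading of the MPS 1.4 document (44-byte little-endian header, checksum over the base table, revision byte 1 for MPS 1.1 and 4 for MPS 1.4). The sample table is fabricated for the demonstration, and the function names are made up.

```python
import struct

# Base MP configuration table header: 44 bytes, little-endian, no padding.
# signature, base length, spec rev, checksum, OEM ID, product ID,
# OEM table pointer, OEM table size, entry count, local APIC address,
# extended table length, extended table checksum, reserved
HEADER = struct.Struct("<4sHBB8s12sIHHIHBB")
SPEC_REVISIONS = {1: "1.1", 4: "1.4"}


def parse_mp_header(table):
    """Validate the signature and checksum, then return a few header fields."""
    (sig, base_len, spec_rev, _checksum, oem_id, product_id,
     _oem_ptr, _oem_size, entry_count, lapic_addr,
     _ext_len, _ext_sum, _reserved) = HEADER.unpack_from(table)
    if sig != b"PCMP":
        raise ValueError("missing PCMP signature")
    # All bytes of the base table, checksum byte included, must sum to zero.
    if sum(table[:base_len]) & 0xFF != 0:
        raise ValueError("bad base table checksum")
    return {
        "spec": SPEC_REVISIONS.get(spec_rev, "unknown"),
        "oem_id": oem_id.decode("ascii").strip(),
        "product_id": product_id.decode("ascii").strip(),
        "entry_count": entry_count,
        "lapic_addr": lapic_addr,
    }


def build_sample_header():
    """Fabricate a header-only table (no entries) purely for demonstration."""
    raw = bytearray(HEADER.pack(b"PCMP", HEADER.size, 1, 0,
                                b"EXAMPLE ", b"DEMO BOARD  ",
                                0, 0, 0, 0xFEE00000, 0, 0, 0))
    raw[7] = (-sum(raw)) & 0xFF  # checksum byte sits at offset 7
    return bytes(raw)


info = parse_mp_header(build_sample_header())
print(info["spec"], hex(info["lapic_addr"]))
```

On real hardware the table would first have to be located through the MP floating pointer structure, which this sketch does not attempt.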
I was actually surprised that the MPS HAL in NT 4.0 SP6 works with Core2Duo machines. I guess there is some backwards compatibility with ACPI and MPS after all. Throw in a USB stack, the VBEMP video driver, and UniATA and it works well on such modern hardware.
In the 1993/94 timeframe, NT and OS/2 (2.11 SMP was released in late 1994) were not relevant in multiprocessing solutions. I have only seen NetWare SFT-III and MPX with real applications. OK, theoretically also UnixWare. Of course, multiprocessor Alpha AXP machines with VMS were more typical, but Digital was also offering MP Intel machines (for instance with customized MPX).
Wow wish I’d known about this back in the day. I just remember Compaq 386 with dual proc’s running NT 3.1, but I never could get anything SMP & Pentium running. But I’ve always found NT 3.1 far more stable doing SMB stuff than the later versions of NT that ‘optimized’ the stacks and made things more prone to crashing.
I used a NT 3.1 box as a MS mail postoffice for a bunch of SQL servers, and it ran for over 5 years until I left there.
But thanks for this, add in the NT 3.51 ntloader, and you can get gigabytes of RAM and now SMP!
The NT 3.1 loader isn’t limited to 64MB, that’s just a (persistent) myth… see an upcoming blog entry.
Oh? It’s not some old ISA 386/486 based thing?
Also I don’t know why, but I tried KVM, and it’s MPS 1.4 only. Ouch. I’ll have to load it on VMware. I have NT 3.1 Advanced Server so I’m hoping for 4 processors, and 2GB of RAM. Then I can load SQL 4.21, and maybe do something silly. Otherwise I don’t have much that’d take advantage of an NT 3.1 machine that big. Even QuakeWorld is single threaded.
You could say it’s an ISA (EISA?) thing I suppose.
On closer look, MPS 1.4 is not the problem. They don’t actually check the MP table version. The problem is that the old HAL (including the one that ships with NT 3.5) does not like newer I/O APICs. It will refuse to work with the 82093AA I/O APIC and newer, but works with the I/O APIC built into 82379AB aka SIO.A. In other words, it will work with boards available in 1994, but not circa 1996 and later.
I wonder if the problem was that the newer APICs have 24 IRQ lines instead of only 16.
Forgive me for “hijacking”, but which USB drivers are you using that work with NT 4.0’s SMP kernel? The ones I’ve been using only work with the uniprocessor kernel.
I wonder when NT added support for the extended MP configuration table in MPS 1.4.
Well, I just tried VMware ESXi 5.5. I created a type 4 machine and edited the file so the hard disk was an IDE type. On ‘powering on’ the VM, I set the MPS mode to 1.1 and ran through the setup. The MPS 1.1 HAL bombs out saying that the hardware isn’t 1.1 compatible.
More of a FYI…
That is expected. The (emulated) hardware actually is MPS 1.1 compatible, but the HAL assumes that any I/O APIC has exactly 16 vectors. It’s just badly coded. That is the case of the MPS 1.1 HAL add-on for NT 3.1 as well as the MPS HAL shipped with NT 3.5.
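To illustrate the kind of check that trips up newer hardware, here is a small sketch that decodes an I/O APIC version register and applies a "must be exactly 16 entries" test. The register layout (version in bits 0 to 7, highest redirection entry index in bits 16 to 23) and the 0x00170011 value are taken from the 82093AA datasheet as I remember it. The 16-entry register value is made up, and picky_hal_accepts is a hypothetical stand-in, not the HAL's actual code.

```python
def decode_ioapic_version(reg):
    """Split a version register value into (version, redirection entry count)."""
    version = reg & 0xFF
    max_entry_index = (reg >> 16) & 0xFF
    return version, max_entry_index + 1


def picky_hal_accepts(reg, expected_entries=16):
    """Hypothetical stand-in for a check that insists on exactly 16 entries."""
    _, entries = decode_ioapic_version(reg)
    return entries == expected_entries


REG_82093AA = 0x00170011   # version 0x11, highest entry index 0x17
REG_16_ENTRY = 0x000F0001  # fabricated value describing a 16-entry I/O APIC

for name, reg in (("82093AA", REG_82093AA), ("16-entry part", REG_16_ENTRY)):
    version, entries = decode_ioapic_version(reg)
    print(name, hex(version), entries, picky_hal_accepts(reg))
```

A more tolerant check would accept any count of at least 16 and simply leave the extra redirection entries masked.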
@Christian They are the IONetworks drivers. The installer complains about not being compatible with SMP, but the drivers seem to work anyway. That being said, when did Microsoft start requiring SMP compatibility to get Windows logo certification?
Could you write some words about hacking virtualbox for nt3.1 smp?
If I remember correctly, the issue was with the number of redirection vectors in the I/O APIC. The MPS 1.1 HAL basically insists on seeing early Pentium-era SMP hardware.
Thanks for the fast reply.
OK, if you have some spare time in the future, you could write a small article about that work. I think there are a lot of people who want to run NT 3.1 SMP in VirtualBox.
BTW. Your blog is very interesting.