An old idea: x86 hardware virtualization

It is well known that virtualization of the x86 architecture is an old idea. The Intel 386 processor (1985) introduced the “Virtual 8086” (V86) mode, enabling users to run a real-mode operating system as a task within a 32-bit protected-mode operating system.

A more complete virtualization of the x86 architecture, one which includes 16-bit and 32-bit protected mode, is likewise relatively old. One of the better known products which provided full x86 virtualization on x86 systems, VMware Workstation, dates back to 1999. Emulation on other architectures is even older; consider Virtual PC for the PowerPC Mac (1997).

On x86 architecture hosts, virtualization had to contend with numerous “holes” in the 32-bit x86 instruction set which made virtualization difficult and/or slow. Among the better known issues are the POPF instruction, which may quietly fail to update the interrupt flag, and the SMSW instruction, which lets the guest operating system see the true state of control bits without giving the hypervisor a chance to trap the access. To overcome these and other issues, Intel designed the VT-x (also known as VMX) extension to the x86 architecture, and AMD developed its own AMD-V hardware virtualization support. Specifications for VT-x and AMD-V were only published in 2005 and 2006, respectively; it took several more years for x86/x64 CPUs with hardware virtualization support to become mainstream. Yet the idea of complete x86 hardware virtualization is much, much older—by more than 15 years!
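
To make the POPF hole concrete, here is a minimal user-mode sketch, assuming GCC-style inline assembly on a 32-bit or 64-bit x86 Linux host. It shows that an unprivileged attempt to change the interrupt flag is silently ignored rather than faulting, which is exactly the behavior a trap-and-emulate hypervisor cannot live with.

```c
/* A minimal user-mode sketch (assumes GCC inline asm, x86/x86-64 Linux) of the
 * POPF hole: at CPL 3 with IOPL 0, an attempt to change the interrupt flag
 * (IF, bit 9 of EFLAGS) is silently ignored instead of faulting, so a
 * trap-and-emulate hypervisor never learns that the guest tried to disable
 * interrupts. */
#include <stdio.h>

int main(void)
{
    unsigned long before, after;

    __asm__ volatile ("pushf; pop %0" : "=r" (before));
    /* Try to flip IF; without sufficient privilege this does not fault,
     * it is simply ignored. */
    __asm__ volatile ("push %0; popf" : : "r" (before ^ (1UL << 9)) : "cc");
    __asm__ volatile ("pushf; pop %0" : "=r" (after));

    printf("IF before: %lu, IF after attempted change: %lu\n",
           (before >> 9) & 1, (after >> 9) & 1);
    return 0;
}
```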

The advantages of virtualization were obvious to anyone familiar with IBM’s VM/370 system. Older operating systems and applications could be kept running on upgraded hardware, while new operating systems could be incrementally tested in a production environment. Best of all, multiple operating systems could run on a single host system at the same time.

With the 386, Intel finally had a processor architecture comparable to mainframe CPUs. Unfortunately, hardware support for virtualization was restricted to virtualizing 8086 systems, as noted above. Virtualizing a 32-bit protected-mode system was not practical.

One of the nastier issues was caused by segmentation and GDT/LDT (Global/Local Descriptor Table) usage. A hypervisor could not let a guest operating system manage its descriptor tables directly (because the guest could overwrite the hypervisor’s memory), yet the instructions which store descriptor register values, such as SGDT and SLDT, could not be trapped. A guest OS could therefore read the true descriptor register values, quite possibly not what it had written.
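
As an illustration of the second half of the problem, the sketch below (a hedged example assuming GCC on a 64-bit x86 host where UMIP is not enabled) uses SGDT, one of the store instructions the 386 could not trap, to read the live GDTR from unprivileged code. Under a software-only hypervisor, the value seen here would reflect the hypervisor’s tables rather than the ones the guest OS believes it loaded.

```c
/* A hedged sketch (assumes GCC on 64-bit x86, UMIP not enabled) of the
 * descriptor table problem: SGDT is not privileged, so unmodified guest code
 * can read the live GDTR and see the hypervisor's table base rather than the
 * value the guest OS thinks it loaded. */
#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed)) dtr {
    uint16_t limit;
    uint64_t base;          /* 4-byte base on a 32-bit build, 8 bytes here */
};

int main(void)
{
    struct dtr gdtr = { 0, 0 };

    __asm__ volatile ("sgdt %0" : "=m" (gdtr));   /* cannot be trapped on a 386 */
    printf("GDT base = 0x%llx, limit = 0x%x\n",
           (unsigned long long)gdtr.base, gdtr.limit);
    return 0;
}
```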

These issues could be avoided by code scanning and patching, but at the cost of high complexity and a significant performance loss. The overhead would likely have made virtualization unattractive.

386 Hardware Virtualization

A solution to this problem was proposed in the May/June 1988 issue of Programmer’s Journal on page 46, in an article by Kevin Smith called simply “Virtualizing the 386”. Yes, that’s 1988—before the 486, before Windows 3.0, before even DOS 4.0.

The solution was in many ways similar to what Intel implemented in VT-x nearly 20 years later. At the same time it’s also much simpler, primarily because the 386 was a far simpler CPU than the Pentium 4 class processors which first supported VT-x.

Smith suggested “protected normal” and “protected VM” processor modes, much like the root and non-root VMX operation in VT-x. Rather than creating completely new data structures, Smith’s design simply extended the existing TSSs (Task State Segments) to store additional information.

A hypervisor would then be a “super-task” which could switch to the guest context through a far jump to a task gate (pointing to the guest’s TSS), a mechanism which had been introduced in the 286. Certain events would then cause the guest OS to switch back to the hypervisor task, again via a TSS switch. Such events included control register accesses, external interrupts, and execution of the HLT instruction.
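
For reference, this is roughly the state a 386 hardware task switch saves and restores; the standard 32-bit TSS layout below follows Intel’s documentation, while the two trailing “protected VM” fields are purely hypothetical placeholders meant to suggest the kind of extension described above—they are not taken from Smith’s article or any real CPU.

```c
/* Standard 386 32-bit TSS layout (per Intel documentation), plus two
 * hypothetical extension fields added only for illustration. */
#include <stdint.h>
#include <stdio.h>

struct tss32 {
    uint32_t prev_task_link;          /* selector of previous task (upper 16 bits reserved) */
    uint32_t esp0, ss0;               /* ring 0 stack pointer and segment */
    uint32_t esp1, ss1;               /* ring 1 stack */
    uint32_t esp2, ss2;               /* ring 2 stack */
    uint32_t cr3;                     /* page directory base */
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx;
    uint32_t esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;  /* segment selectors (upper 16 bits reserved) */
    uint32_t ldt_selector;
    uint32_t trap_iomap;              /* T (debug trap) bit, I/O permission bitmap base */
    /* Hypothetical "protected VM" extension fields, for illustration only: */
    uint32_t vm_intercepts;           /* which guest events force a switch to the hypervisor */
    uint32_t vm_exit_reason;          /* why the guest task was last suspended */
};

int main(void)
{
    /* The architectural part is 104 bytes; anything past that is extension space. */
    printf("sizeof(struct tss32) = %zu bytes\n", sizeof(struct tss32));
    return 0;
}
```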

Paging would be handled strictly on the hypervisor side through “shadow paging”, a technique commonly used by modern hypervisors in the absence of nested paging in hardware.
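
As a rough illustration of the shadow paging idea, here is a toy model in plain C; the tiny single-level tables and all the names are invented for this sketch and do not come from the article or any real hypervisor. On a shadow page fault, the hypervisor combines the guest’s virtual-to-guest-physical mapping with its own guest-physical-to-host-physical mapping and installs the result in the shadow table that the MMU actually walks.

```c
/* A toy model (not real hardware code) of shadow paging: combine the guest's
 * translation with the hypervisor's guest-physical -> host-physical map and
 * install the result in the shadow table used by the real MMU. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define ENTRIES    16                 /* tiny single-level tables for the model */

static uint64_t guest_pt[ENTRIES];    /* maintained by the guest OS */
static uint64_t gpa_to_hpa[ENTRIES];  /* maintained by the hypervisor */
static uint64_t shadow_pt[ENTRIES];   /* what the real MMU would walk */

/* Called on a "shadow page fault": rebuild one shadow entry. */
static void shadow_fill(unsigned vpn)
{
    uint64_t gpa_page = guest_pt[vpn] >> PAGE_SHIFT;   /* guest-physical page */
    uint64_t hpa_page = gpa_to_hpa[gpa_page];          /* host-physical page  */
    shadow_pt[vpn] = hpa_page << PAGE_SHIFT;           /* combined mapping    */
}

int main(void)
{
    guest_pt[3]   = 5 << PAGE_SHIFT;   /* guest maps virtual page 3 -> guest-physical page 5 */
    gpa_to_hpa[5] = 9;                 /* hypervisor placed guest-physical page 5 at host page 9 */

    shadow_fill(3);
    printf("virtual page 3 -> host physical page %llu\n",
           (unsigned long long)(shadow_pt[3] >> PAGE_SHIFT));
    return 0;
}
```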

Many of Smith’s suggestions are eerily reminiscent of VT-x and AMD-V, such as storing the length of the current (faulting) instruction when switching to the hypervisor, or optional interception of software interrupts.

One additional feature which Intel and AMD did not implement is a TRA (Translate Real Address) instruction, which would translate a virtual address to a physical address using a page directory potentially different from the current one.

What If…

It is difficult to guess what the computing landscape might look like today if Intel had implemented hardware virtualization back in 1989-1990. It might be very significantly different, or perhaps not.

Hardware virtualization would probably have had a significant impact on the OS wars of the 1990s, and one has to wonder if Microsoft would have been as successful in an environment where Windows could be easily virtualized and used alongside a different OS without the need to dual boot.

It is quite possible that hardware virtualization support would have been very beneficial to Intel. It is conceivable that Intel could have become the “owner” of the PC industry; Microsoft might have still become incredibly successful with Windows, but Intel could have replaced IBM as the steward of the PC hardware platform.

For reasons that may be lost to history, Intel did not implement hardware virtualization until much, much later. In the early 2000s, processors were fast enough and code analysis and dynamic translation technologies were advanced enough that virtualization of x86 on x86 became practical. Once the business importance of virtualization was blindingly obvious, Intel implemented a hardware virtualization technology which was in principle extremely similar to what Kevin Smith had suggested back in 1988.


14 Responses to An old idea: x86 hardware virtualization

  1. rauli says:

    In 1982, Motorola’s 68010 (pin-to-pin compatible with the 68000) added virtualization capabilities to the 68xxx family, essentially redefining one instruction to make it privileged (it was unprivileged in the 68000).
    Probably Intel could have done something similar with the 386.

  2. michaln says:

    Read the 1988 article to see what Intel would have needed to do with the 386. It was a lot more than just a single instruction. The entire four-ring privilege architecture was unsuitable for direct virtualization, so effectively a whole new “ring” had to be added (although it wasn’t called that, and isn’t with VT-x either).

  3. Yuhong Bao says:

    Yea, most of the flaws go back to the 80286, which the 80386 had to be fully compatible with.

  4. michaln says:

    “Had to”? No one was holding a gun to Intel’s head saying “make the 386 100% compatible with the 286 or else”. It was Intel’s choice to make it so.

    Sure, Intel had reasons to make the 386 upwardly compatible, but saying that it “had to” be compatible is nonsense.

  5. Richard Wells says:

    Intel learned the VAX lesson: getting sales is easy if the new chip can be used as a faster version of the old.

    But let’s say Intel tried to create a hypervisor-friendly variant of the 386. One would need to add the memory space of the hypervisor plus the memory of each guest system. Assuming the hypervisor available was as good as the hypervisors from 2005, when Intel added hardware virtualization, the guest would run at about half the speed of native hardware. So multiple times as much memory (at $100 per megabyte) for a resultant $1,000 CPU that is slower than the standard affordable 80286. If it was also incompatible with a lot of existing Intel code, the net result would be a repeat of the market success of the IAPX-432.

    The landscape would be different: Intel would no longer be in business.

  6. michaln says:

    The “half speed” thing is nonsense, that’s just not how virtualization works. Some operations have significant overhead (much more than 50% hit), others have much less (close to zero). As for the pricing, VT-x may be a good example. Were the CPUs with VT-x more expensive? Not really.

    The argument about being slower than a 286 does not hold water. By the same token, Intel should not have implemented paging because it significantly slowed things down. Except… paging was optional. So is hardware virtualization. It only slows down those who choose to use it.

    As for hypervisors, look at IGC’s VM/386. That’s an actual hypervisor from the late 1980s, of course only supporting real mode guests.

    So let’s posit a different future. With an enterprise feature like virtualization, Intel could have made life much harder for RISC CPUs. Instead of going out of business, Intel would have made much bigger inroads into the enterprise server market much sooner.

    All pure speculation of course 🙂

  7. Yuhong Bao says:

    “As for the pricing, VT-x may be a good example. Were the CPUs with VT-x more expensive? Not really.”
    Actually, Intel did disable VT-x in their lower-end CPUs like the Core 2 Duo E7000 and Q8000, but that ended in 2009.

  8. michaln says:

    Yes, but that has a lot more to do with Intel’s pricing and marketing strategies than the technology. I don’t think anyone believes that today’s Intel CPU prices have anything to do with how difficult the chips are to manufacture.

  9. Lochkartenstanzer says:

    IIRC, x86 PC emulation was also done on the Motorola 68000 at the end of the ’80s. There was a “DOS” emulator on the Amiga, Atari ST and Macs with 68k processors which could run native x86 operating systems (MS-/PC-/DR-DOS and CP/M-86). But I can’t remember its name.

    lks

  10. michaln says:

    Emulating the 8086 (as opposed to the 286/386) is relatively simple and has been done many times. One of the bigger companies which specialized in x86 emulation was Insignia Solutions. Their SoftPC was available on many UNIX workstations, and Insignia’s code even made it into Microsoft’s NTVDM on non-x86 platforms.

  11. Richard Wells says:

    I went to a recent VMware meeting where they claimed to have doubled speed over the last few years, from substantially slower than real hardware to currently almost the same as the underlying hardware. It is possible VMware misstated the case, but I expect any hypothetical hypervisor from 1987 would have to go through the same learning curve to fully exploit hardware virtualization that happened in the real world.

    Soft-PC did work early on, but it was quite slow. See http://support.apple.com/kb/TA38039?viewlocale=en_US for performance. Basically, Soft-PC turned a Mac II with its 16 MHz CPU into something equivalent to an IBM XT with a very good hard disk. The only Norton SI value I could find for a 386 rated a 25 MHz part at an SI of 40, which suggests a 16 MHz 386 would score about 25, or roughly 20 times better than an XT.

    Lastly, how many transistors are needed to support virtualization? Will it double the size of the 80386 die, quadruple it (pushing it into 80486 territory), or go even bigger to sizes that could not be made for another decade? A million extra transistors means nothing now but it is a lot compared to the 275,000 of the actual 80386.

  12. michaln says:

    I can’t comment on VMware’s figures because I don’t know what exactly they were talking about. I don’t believe for a second that there was some kind of across-the-board 100% speed improvement, simply because VMware was never that slow. If they’re talking about using hardware virtualization and features now in Intel CPUs that weren’t there a few years ago (e.g. nested paging), then sure, there are very noticeable improvements.

    SoftPC is really a very different case because it’s emulating a completely different architecture. That will always be horribly slow unless some kind of dynamic recompilation is used. To get some sense of what the speed differential might have been, the best example is V86 mode support in the 386. Yes, it was slower than plain real mode, both because paging slowed things down in general and because a VMM had some non-zero overhead. People were quite happy to get the features of EMM386 at the cost of a loss in performance, in part because even with the overhead, the 386 was faster than a 286. It would have been a similar situation if the 486 had had hardware virtualization; it would probably still have been generally faster than a 386.

    As for the transistor counts, again look at the V86 mode that was already in the 386. I don’t know how many transistors it took, but V86 mode did the same thing on a smaller scale. Allow I/O accesses to be trapped, cause page faults here and there, make certain instructions privileged. It was obviously not that hard to do.

    Note that comparing a 386 with ~2005 CPUs is not entirely fair because in 20 years, an awful lot of complexity (and crud) had been added to the platform. For example, three paging modes instead of one complicate things quite a bit, as do 64-bit mode, SMP, power management, APICs, etc.

  13. The Amiga PC emulator was ‘Transformer’… and it was slow… but then again, what would you expect on a 7 MHz 68000?

    Back even in the late 1990s/early 2000s, most people looked at PC-on-PC emulation/hypervisors as being silly, but CPUs got faster and faster, and IMHO with the P3 it was finally fast enough for real-world usage.

    There were dreams of a grand hypervisor back in the day; remember Taligent/Pink was going to run everything? And that PowerPC 615 that could not only run x86 code, but fit in an x86 socket?

    I guess the joke is that if there had been a well documented and agreed-upon way of doing things, we could have been there much earlier, but you know how vendors are… Heck, remember that early Xen had a special version of XP adapted for it to show it was possible? Needless to say, that never saw the public light of day, and it was the work of the Qemu people that brought larger system emulation to all…

    but of course, before qemu there was bochs…

  14. michaln says:

    The Amiga probably could have done an okay job if dynamic recompilation had been used, but that technology just wasn’t there yet (as far as I know).

    The biggest problem with x86 on x86 virtualization was that the architecture just wasn’t conducive to virtualization, and without hardware support there was huge overhead. It got worse over the years, too, with new unprivileged instructions that “leaked” privileged information (CPUID) or messy instructions like SYSENTER/SYSEXIT or MONITOR/MWAIT.

    I don’t think Taligent (or NT for that matter) was ever meant to be a hypervisor. It was about implementing multiple APIs or “personalities”. IBM utterly failed in that goal with Workplace OS, while Microsoft technically succeeded with NT but did not pursue the strategy very far for business reasons.

    As for qemu and bochs, they’re really not virtualizers, they’re emulators. Different story. Qemu does a pretty good job, but it just can’t hold a candle to hardware assisted virtualization when it comes to performance. Qemu may have brought system emulation to people, but it was Connectix and VMware who had a serious impact on the industry.
