A recent post explored the motivation (i.e. backward compatibility) for implementing the A20 gate in the IBM PC/AT. To recap, the problem IBM solved was that 1MB address wrap-around was an inherent feature of the Intel 8086/8088 CPUs but not of the 80286 and later models, while a number of commercial software packages intentionally or unintentionally relied on the wrap-around.
Interestingly, the address wrap-around was clearly much better known and understood in 1981 than it was in the 1990s. For example, in 1994 the usually very well informed Frank van Gilluwe wrote in Undocumented PC (page 269): “A quirk with the 8088 addressing scheme allowed a program to access the lowest 64KB area using any segment:offset pair that exceeded the 1MB limit. […] Although there is no reason for software to ever use this quirk, bugs in a few very old programs used segment:offset pairs that wrap the 1MB boundary. Since these programs seemed to work correctly, no actions were taken to correct the defects.”
Yet it is known that Tim Paterson quite intentionally used the wrap-around to implement CALL 5 CP/M compatibility in QDOS around 1980, and Microsoft Pascal intentionally used it in 1981. In both cases there were arguably very good reasons for using the wrap-around.
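The quirk itself is simple arithmetic. Here is a small Python sketch (illustrative, not from any real implementation) of how a real-mode segment:offset pair translates to a physical address on a CPU with a 20-bit address bus (8086/8088) versus a 24-bit one (80286):

```python
def phys_addr(segment, offset, address_bits=20):
    """Translate a real-mode segment:offset pair to a physical address.

    On the 8086/8088 the address bus is 20 bits wide, so any carry out
    of bit 19 is simply lost and the address wraps to the bottom of
    memory. On the 80286 the bus is 24 bits wide and no wrap occurs.
    """
    linear = (segment << 4) + offset
    return linear & ((1 << address_bits) - 1)

# FFFF:0010 is the first byte past the 1MB line.
print(hex(phys_addr(0xFFFF, 0x0010, address_bits=20)))  # 8086: wraps to 0x0
print(hex(phys_addr(0xFFFF, 0x0010, address_bits=24)))  # 80286: 0x100000
```

The same arithmetic shows why the HMA is 64K minus 16 bytes: FFFF:FFFF reaches up to 10FFEFh on a 286, while on an 8086 it wraps to FFEFh.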
Intentional or not, software relying on 8086 address wrap-around was out there and important enough that by the end of 1983, IBM had implemented the A20 gate in the upcoming PC/AT. But did they have to do that?
Let’s explore several what-if scenarios which would have preserved a high degree of compatibility with existing software without requiring the A20 gate.
The CALL 5 interface was a bit of a tough nut to crack because it was a clever (too clever?) hack to begin with. One option would have been to place an INT 3 instruction at offset 5 (INT 3 being the one documented single-byte interrupt instruction), or, with slight restrictions, to use a two-byte interrupt instruction. That would have avoided the need for wrap-around. It might not have been nice, but it would have been manageable.
The Pascal run-time was a much nastier problem. Existing variants of the start-up code might have been detected and patched, but it was reasonable to assume that modified variants were out there. It was also reasonable to assume that Microsoft Pascal was not the only piece of code guilty of such shenanigans. The bulletproof solution would have been simple but probably unpalatable—force all or some applications to load above 64K. Any remaining free memory below 64K might still have been available for dynamic allocation. If this option was considered at all, it was likely thought too steep a price to pay for the sins of the past (that is, the 1981-1983 era).
A serious and quite possibly fatal shortcoming of software workarounds was that they required modified software. In an era where bootable floppies were the norm, and even third-party software was sometimes delivered on bootable floppies, a solution which did nothing for users’ existing bootable diskettes was probably considered a non-solution.
A Solution and a Problem
The A20 gate was easy to implement in the PC/AT because there were no caches to contend with. Simply forcing the output of the CPU’s address pin A20 to logical zero was all it took to regain the address wrap-around behavior of the 8086. The switch was hooked up to the keyboard controller, which already had a few conveniently available output pins.
The implementation was clearly an afterthought; in the PC/AT days, DOS didn’t know or care about the A20 line, and neither did DOS applications. DOS extenders weren’t on the horizon yet. There was no BIOS interface to control the A20 gate, but IBM probably didn’t think there needed to be one—the INT 15h/87h interface to copy to/from extended memory took care of the A20 gate, and the INT 15h/89h interface to switch to protected mode also made sure the A20 gate was enabled. Everyone else was expected to run with the A20 gate disabled.
OEMs like HP or AT&T similarly didn’t think that A20 gate control was anything important and devised their own schemes for controlling it, incompatible with IBM’s. There was no agreement across implementations on what the A20 gate even did: some really masked the A20 address line (PC/AT), some only mapped the first megabyte to the second and left the rest of the address space alone (Compaq 386 systems), and others implemented yet other variations on the theme. The effects of the A20 gate are likewise inconsistent across implementations when paging and/or protected mode is enabled.
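The difference between the two styles mentioned above can be sketched with a simplified model (the function names and the simulation are purely illustrative, not any real interface):

```python
A20_BIT = 1 << 20

def a20_pc_at(addr, a20_enabled):
    """PC/AT-style: with the gate disabled, address bit 20 is forced to
    zero for every access, so 1MB..2MB aliases 0..1MB, 3MB..4MB aliases
    2MB..3MB, and so on across the whole address space."""
    return addr if a20_enabled else addr & ~A20_BIT

def a20_compaq_386(addr, a20_enabled):
    """Compaq 386-style: only accesses to the second megabyte are
    redirected down to the first; the rest of the address space is
    left alone."""
    if not a20_enabled and 0x100000 <= addr < 0x200000:
        return addr - 0x100000
    return addr

addr = 0x340000  # an address in the fourth megabyte
print(hex(a20_pc_at(addr, False)))       # 0x240000 -- bit 20 masked
print(hex(a20_compaq_386(addr, False)))  # 0x340000 -- untouched
```

For the second megabyte the two schemes agree, which is why software that only cared about wrap-around and the HMA could usually get away with either; anything probing higher aliases could tell them apart.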
The real trouble started around 1987-1988, with two completely unrelated developments. One was the advent of DOS extenders (from Phar Lap, Rational, and others), and the other was Microsoft’s XMS/HIMEM.SYS and its use of the HMA (High Memory Area, the first 64K above the 1MB line). In both cases, there was a requirement to turn the A20 gate on in order to access memory beyond 1MB, and to turn it off again to preserve compatibility with existing software relying on wrap-around (which, thanks to EXEPACK, had only proliferated since the PC/AT was introduced).
The big problem which implementors of DOS extenders and HIMEM.SYS faced was that there was no BIOS interface to control the A20 gate alone, and no uniform hardware interface either. A lesser problem (but still a problem) was that switching the A20 gate on or off through the keyboard controller (KBC) wasn’t very fast, so even on systems which did provide a compatible KBC interface, any faster alternative was well worth using.
The solution was to do it the hard way. Some versions of HIMEM.SYS for instance provided no fewer than a dozen A20 handlers for various systems, and provided an interface to install a custom OEM handler. Around 1990, OEMs realized that this wasn’t workable and only hurt them, and stopped inventing new A20 control schemes. Effectively all systems provided PC/AT-style A20 gate control through the KBC, and typically also PS/2-style control through I/O port 92h (much faster than the KBC).
There were complications for CPU vendors, too. Intel was forced to add the A20M# input pin to the i486—the CPU needed to know what the current state of the A20 gate was, so that it could properly handle look-ups in the internal L1 cache. This mechanism had to be implemented in newer CPUs as well.
Cyrix faced even greater difficulties with the 486DLC/SLC processors designed as 386 upgrades. These processors had a 1K internal cache (8K in the case of the Texas Instruments-made chips) and did implement the A20M# pin, but also needed to work in old 386 boards which provided no such pin. That left only unpleasant options, such as not caching the first 64K of the first and second megabyte.
The A20 gate also considerably complicated the life of system software. For example, memory managers like EMM386 needed to run the system with the A20 gate enabled (otherwise on a 2MB system, no extended memory might be available!) while emulating the behavior software expected. EMM386 had to trap and carefully track A20 gate manipulation through the KBC, as well as through port 92h. When the state of the A20 gate changed, EMM386 had to update the page tables: either create a mapping of the first 64K also at linear address 100000h (A20 gate disabled), or remove it again (A20 gate enabled).
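The page-table juggling can be sketched roughly as follows. This is a deliberately simplified model, assuming 4K pages and representing the page tables as a plain linear-frame-to-physical-frame map; EMM386’s actual data structures were of course real x86 page tables:

```python
PAGE_SIZE = 4096
HMA_BASE = 0x100000               # linear address of the first byte above 1MB
HMA_PAGES = 0x10000 // PAGE_SIZE  # 16 pages cover the 64K HMA

def update_hma_mapping(page_table, a20_enabled):
    """Sketch of what a memory manager must do when it traps a change of
    the (virtual) A20 gate: with the gate 'disabled', point the entries
    for linear 1MB..1MB+64K at physical 0..64K to emulate 8086
    wrap-around; with it 'enabled', restore the identity mapping.
    page_table maps linear page frame number -> physical frame number.
    """
    for i in range(HMA_PAGES):
        linear_frame = (HMA_BASE // PAGE_SIZE) + i
        page_table[linear_frame] = i if not a20_enabled else linear_frame

page_table = {}
update_hma_mapping(page_table, a20_enabled=False)
# Linear 100000h (i.e. FFFF:0010) now resolves to physical 0
print(hex(page_table[HMA_BASE // PAGE_SIZE] * PAGE_SIZE))  # 0x0
update_hma_mapping(page_table, a20_enabled=True)
print(hex(page_table[HMA_BASE // PAGE_SIZE] * PAGE_SIZE))  # 0x100000
```

Note that only 16 page-table entries need touching either way, which is why this emulation was cheap enough to be practical even on a 386.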
Hindsight Is 20/20
From today’s vantage point, it is obvious that IBM should have just sucked it up back in ’84: leave the A20 address line in the PC/AT alone and force software to adapt. That would have saved software developers and users so much time, effort, and money over the subsequent decades. Complexity is expensive; that is an unavoidable fact of life.
It’s just as obvious that in 1984 the equation was different, and adding the A20 gate to the PC/AT was considered the lesser evil (relative to breaking existing software). Predicting the future is a tricky business, and back then, DOS extenders or the HMA probably weren’t even a gleam in anyone’s eye. IBM likely assumed that DOS would be gone in a few years, replaced by protected-mode software with no need to mess with the A20 gate (such as IBM’s own XENIX, released in late 1984).
By the time the A20 gate started causing trouble around 1987, reliance on the address wrap-around was much more entrenched than it had been in 1984, not least thanks to EXEPACK. At that point, the only practical option was to press on. There was no longer a company which could have said “enough with this nonsense”; hardware had to support existing software, and software had to support existing hardware.
Over time, the A20 gate turned into a security liability and modern CPUs deliberately ignore the A20M# signal in various contexts (SMM, hardware virtualization).
Many years and many millions of dollars later, here we are. DOS has been sufficiently eradicated that some recent Intel processors no longer provide the A20M# pin functionality. Some.
Even the folks at Intel clearly don’t understand what it’s for, and thus a recent Intel SDM contains amusing howlers like a claim that “the A20M# pin is typically provided for compatibility with the Intel 286 processor” (page 8-32, vol. 3A, edition 065). To be fair, other sections in the same SDM correctly state that the A20M# pin provides 8086 compatibility.
It is likely that over the next few years, the A20M# functionality will be removed from CPUs. Since legacy operating systems can no longer be booted on modern machines, it is no longer necessary. In an emulator/hypervisor, the A20 gate functionality can be reasonably easily implemented without hardware support (manipulating page tables, just like EMM386 did decades ago). Goodbye, horrors of the past.