EXEPACK and the A20-Gate

In 1991, DOS 5.0 brought about what’s perhaps the most common manifestation of A20 control trouble…

Packed file is corrupt

Microsoft published a KB article about this infamous error, but its author clearly did not understand the true cause of the problem. Later, Microsoft published another KB article about the same problem, and its author still didn’t understand the real cause. Both articles correctly state that the problem occurs when the load address of a packed executable is below 64K. Both also claim that the cause is a bug in the unpacking algorithm, without asking why the same problem did not occur on old DOS 2.x/3.x systems, which took up relatively little memory and left plenty of it free below 64K. But the previously mentioned AST technical bulletin quite clearly points at the real problem: “The Gate A20 signal is in the wrong state after the system is booted.” Bingo.

(Note: The KB articles and the AST bulletin all mention a “Packed File Corrupt” message. The actual message is “Packed file is corrupt”.)

The EXEPACK unpacking algorithm, which post-dates the PC/AT, needs to solve the same basic problem as the Pascal run-time: move data higher up in memory while the destination potentially overlaps the source. Unfortunately, it also employs the same solution, using segment registers that can become “negative” and relying on address wrap-around. Was the wrap-around in EXEPACK intentional, as it was in the MS Pascal run-time? If only we could find out…
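To make the “negative” segment trick concrete, here is a minimal Python sketch of real-mode address formation. It assumes 16-bit segment and offset values and models a disabled A20 gate simply as masking off bit 20. The load segment 0080h and offset FFF0h are arbitrary illustrative values, not taken from an actual EXEPACK stub.

```python
def linear_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Physical address for segment:offset in real mode (paragraph = 16 bytes)."""
    addr = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # With A20 forced low, bit 20 is dropped and the address wraps to the
        # bottom of memory, as it always does on an 8088/8086.
        addr &= 0xFFFFF
    return addr


load_seg = 0x0080                    # image loaded at linear 800h, below 64K
seg = (load_seg - 0x1000) & 0xFFFF   # "64K below" underflows to F080h
print(hex(linear_address(seg, 0xFFF0, a20_enabled=False)))  # 0x7f0
print(hex(linear_address(seg, 0xFFF0, a20_enabled=True)))   # 0x1007f0
```

With A20 disabled, the access lands just below the load address as the code intends; with A20 enabled, the same segment:offset pair points into the first 64K above 1M instead.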

EXEPACK Address Wrap-Around

Fortunately, we can! It just needs a bit of guesswork and a search engine. It is well known that EXEPACKed binaries have an ‘RB’ signature stored just before the entry point. It is not unreasonable to assume that ‘RB’ might be the author’s initials. After a few literature and header file searches, a possible candidate turned up: Reuben Borman, who worked in the Microsoft language group in the mid- to late 1980s, specifically on the linker.

The OS/2 Museum was able to confirm that Mr. Borman was indeed the EXEPACK author. Mr. Borman also said in an e-mail conversation that he did not intentionally rely on address wrap-around when writing the unpacker stub. An analysis of the code shows that the wrap-around was a fairly straightforward, though not necessarily obvious, side effect of how the algorithm works: it always starts near the top of a 64K segment, so that the longest possible string copy or store instructions can be used, and works downwards.

It should be noted that the EXEPACK unpacking algorithm does not inherently require address wrap-around. The unpacker does not need to point the segment registers roughly 64K below the current starting address in every case; it is enough for them to point 64K below the current starting address or at zero, whichever is greater. The code never needs to access memory anywhere near the lowest memory addresses.
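The clamping rule can be sketched in a few lines of Python. This is not code from any EXEPACK stub; the helper names are hypothetical, and the sketch assumes the unpacker wants to address the last byte just below the current starting segment (and that the starting segment is not zero).

```python
PARAGRAPHS_PER_64K = 0x1000  # 64K / 16 bytes per paragraph


def base_segment(current_seg: int) -> int:
    """Segment 64K below current_seg, or 0 if that would underflow."""
    return max(current_seg - PARAGRAPHS_PER_64K, 0)


def top_offset(current_seg: int) -> int:
    """Offset, relative to base_segment(), of the byte just below current_seg:0."""
    linear = (current_seg << 4) - 1
    off = linear - (base_segment(current_seg) << 4)
    # The offset always fits in a 16-bit register, so no wrap-around is needed.
    assert 0 <= off <= 0xFFFF
    return off


print(hex(base_segment(0x2345)), hex(top_offset(0x2345)))  # 0x1345 0xffff
print(hex(base_segment(0x0080)), hex(top_offset(0x0080)))  # 0x0 0x7ff
```

For a program starting high enough, the base and offset are the same as with the unclamped approach; for a program loaded below 64K, the base is clamped to 0 and the offset simply shrinks, so the segment register never underflows.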

It’s fair to say that the address wrap-around in EXEPACK was, perhaps ironically, a victim of the successful A20 gate implementation in the PC/AT. Because the A20 gate mechanism in the PC/AT worked so well, no one even noticed that EXEPACK relied on address wrap-around until the late 1980s, when the damage had already been done.

Timeline

To show that EXEPACK played no role in the creation of the A20 gate, let’s consider the known timeline of events. The IBM PC/AT was announced on August 14, 1984. The A20 gate functionality is clearly described in the IBM PC/AT Technical Reference from March 1984. The original PC/AT BIOS is from January 1984 and includes code to enable and disable the A20 gate. Some of the BIOS source files which reference the A20 gate are dated as far back as November 1983. While there is no way to be certain that the files were really modified at that time, there’s also no reason to doubt that. Given the length of development cycles, it’s safe to assume that the A20 gate functionality existed in late 1983.

On the other hand, EXEPACK made its first public appearance in early 1985, as part of the Microsoft C 3.0 compiler (notable for being the first DOS-based C compiler developed internally by Microsoft). The MS C 3.0 package came with EXEPACK.EXE dated February 14, 1985; also included was EXEMOD.EXE dated February 2, 1985, with support for EXEPACKed binaries. The EXEPACK binary itself contains the string “C Library – (C)Copyright Microsoft Corp 1985”, which indicates that it was itself built with the MS C 3.0 compiler.

Microsoft C 3.0 also came with LINK.EXE version 3.01, dated January 17, 1985. The executable contains a matching string “Version 3.01 January 17, 1985”. LINK 3.01 does not support the /EXEPACK option. In other words, MS C 3.0 only provided the stand-alone EXEPACK utility to pack executables.

There exists a slightly newer LINK.EXE version 3.02 which contains the string “Version 3.02 January 28, 1985”. That version does support the /EXEPACK option and can pack executables as part of the linking process, but was not shipped with MS C 3.0.

To summarize, the A20 gate existed in late 1983, while EXEPACK only made its public appearance in early 1985. There is no reason to suspect that EXEPACK was used internally at Microsoft for any significant period of time prior to the public release. More than a year may have passed between the decision to implement the A20 gate and the start of work on EXEPACK.

EXEPACK Evolution

As mentioned above, the reliance on address wrap-around in the EXEPACK stub unpacker did get noticed at some point in the late 1980s. As a result, the unpacker was modified to prevent segment register underflow and thus never rely on address wrap-around. That modification happened sometime in the first half of 1988. Around the same time, the unpacker was also updated to preserve the initial AX register contents, which earlier versions of the EXEPACK stub destroyed.

Said update came just too late for the MS C 5.1 release (March 1988). Microsoft C 5.1 shipped with LINK.EXE version 5.01.21 (protected-mode capable) and LINK.EXE version 3.65 (real mode only). Both produce the same problematic EXEPACKed files.

Real-mode only LINK.EXE version 3.69 (September 1988) was one of the earliest specimens with the improved EXEPACK stub. Protected-mode capable LINK 5.05 (1989) is an early, but likely not the earliest, example of a fixed 5.x linker. The late 1980s were generally a busy time for Microsoft linker developers, driven by changes needed to support newer OS/2 versions as well as by improved language support.

Microsoft C 6.0 shipped with LINK.EXE version 5.10 from February 1990, which likewise contains the fixed EXEPACK stub.

Note that all known versions of the stand-alone EXEPACK.EXE utility (up to and including version 4.06) produce troublesome executables relying on address wrap-around. Only the variants produced by LINK /EXEPACK were corrected.

In total there were five or six variants of the EXEPACK stub unpacker. The differences were minor, the most significant being that the final version did not rely on address wrap-around.

Who Was First?

While researching the history of EXEPACK, an interesting question cropped up. Was EXEPACK the first tool of its kind, compressing executables on disk and uncompressing them in-place when loaded? The answer to that is… definitely maybe.

By 1990, executable compression was very much a thing, with tools like DIET or PKLITE becoming popular. That wasn’t something exclusive to the PC platform, either. On the C64, for example, compression was becoming the norm. The problems were the same everywhere: limited storage capacity and low transfer speeds. Compression made perfect sense.

In 1985 the problems had been the same, but so far no evidence has turned up that EXEPACK was just another executable packer. Contemporary press reviews which mention EXEPACK make no comparisons with analogous tools, but that is at best circumstantial evidence.

Mr. Borman could not recall EXEPACK being developed in response to some third-party tool; it was made to satisfy internal requirements. As a large software vendor, Microsoft obviously stood to benefit from a tool which saved space on distribution media and made executables load faster.

EXEPACK employed a fairly basic run-length encoding, nothing that could be called advanced compression. It could still make a difference. For example, a February 1986 e-mail reports that “PCKERMIT.EXE ver 2.27 was 80K when linked normally but shrunk down to 33K when linked with the /E option”, because EXEPACK was effective at eliminating uninitialized or zero-initialized storage from executable files.
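The run-length idea is easy to illustrate. The Python sketch below is not the EXEPACK packer. It assumes a simplified block format: a header byte of B0h for a run of one repeated byte or B2h for bytes copied as-is (with the low bit set on the final block), then a 16-bit little-endian length, then either the fill byte or the literal bytes. The `min_run` threshold is an arbitrary choice, and the sketch ignores the fact that real EXEPACK data is processed from the end backwards.

```python
FILL, COPY, FINAL = 0xB0, 0xB2, 0x01  # header values assumed for this sketch


def pack(data: bytes, min_run: int = 8) -> bytes:
    """Split data into fill blocks (runs of one byte value) and copy blocks."""
    blocks = []  # (header, length, payload)

    def flush_literal(start: int, end: int) -> None:
        # Copy blocks carry their bytes verbatim; split at the 16-bit length limit.
        for s in range(start, end, 0xFFFF):
            chunk = data[s:min(end, s + 0xFFFF)]
            blocks.append((COPY, len(chunk), chunk))

    literal_start = i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at i (capped at 16 bits).
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 0xFFFF:
            j += 1
        if j - i >= min_run:
            flush_literal(literal_start, i)
            blocks.append((FILL, j - i, data[i:i + 1]))
            i = literal_start = j
        else:
            i += 1
    flush_literal(literal_start, len(data))

    out = bytearray()
    for n, (header, length, payload) in enumerate(blocks):
        if n == len(blocks) - 1:
            header |= FINAL  # mark the last block
        out += bytes([header]) + length.to_bytes(2, "little") + payload
    return bytes(out)


sample = b"code" + bytes(1000) + b"data"
print(len(sample), "->", len(pack(sample)))  # 1008 -> 18
```

The 1,000 zero bytes collapse into a single 4-byte fill block, which is the kind of saving that made zero-initialized storage so cheap in EXEPACKed files.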

It remains an open question which, if any, executable compressors pre-date EXEPACK.

It Could Have Been Worse

It was in some ways sheer luck that on systems without address wrap-around, EXEPACKed executables loaded below 64K usually produced the clear (if slightly misleading) error message, “Packed file is corrupt”. Users might easily have ended up with executables that were partially corrupted during unpacking, which would lead to very non-obvious and difficult to diagnose problems.

The stub unpacker does not validate checksums or anything like that. The executable is chopped up into blocks which are either copied as-is or expanded into a sequence of repeating bytes (run-length encoding). Each such block has a header byte, either B0h or B2h (with the low bit optionally set, indicating the final block). If the unpacker does not find the header byte after processing the previous block, it will produce the error message.
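The unpacking side, including the single sanity check described above, might look like this in Python. It uses an assumed, simplified format (header byte B0h = fill or B2h = copy, low bit set on the final block, a 16-bit little-endian length, then one fill byte or the literal bytes) and reads forwards rather than backwards as the real stub does. `PackedFileCorrupt` is a hypothetical exception name.

```python
FILL, COPY, FINAL = 0xB0, 0xB2, 0x01  # header values assumed for this sketch


class PackedFileCorrupt(Exception):
    pass


def unpack(packed: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while True:
        header = packed[pos] if pos < len(packed) else None
        # Like the stub, the only check is that a valid header byte follows
        # the previous block; there are no checksums.
        if header is None or (header & 0xFE) not in (FILL, COPY):
            raise PackedFileCorrupt("Packed file is corrupt")
        length = int.from_bytes(packed[pos + 1:pos + 3], "little")
        pos += 3
        if (header & 0xFE) == FILL:
            out += packed[pos:pos + 1] * length  # expand the repeated byte
            pos += 1
        else:
            out += packed[pos:pos + length]      # copy literal bytes as-is
            pos += length
        if header & FINAL:
            return bytes(out)


print(unpack(bytes([0xB2, 2, 0]) + b"hi" + bytes([0xB1, 5, 0, 0])))
```

If the bytes where the next header should be were overwritten (for example, because an unpacker without wrap-around wrote to the wrong place), the loop is likely to hit an invalid header and stop with the error rather than silently produce garbage.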

This is not foolproof, but the probability that an EXEPACKed file would get corrupted during unpacking due to missing address wrap-around without triggering the error is quite low.

That is especially true for executables produced using the Microsoft C run-time: The first 16 bytes are zeros in order to help detect stray null pointer writes (the run-time checks on termination whether the bytes are still zero). That will produce the final EXEPACK record within the lowest 16 bytes of the packed data, greatly increasing the chance that any problems will be flagged.

DOS 5 Exepatch

DOS 5 (i.e. PC DOS 5.0/MS-DOS 5.0) added the option to load DOS high (into the first 64K above 1M, using the DOS=HIGH statement in CONFIG.SYS), which necessitated fine-grained control of the A20 gate and also exposed related problems.

Microsoft went so far as to patch several known variants of the EXEPACK stub code in memory while loading an EXE file (that is, after loading the image from disk but before running it). We will call this “exepatching”. EXEPACK was not the only target—several versions of the DOS/16M extender used by Lotus 1-2-3 R3 were also patched, as well as certain old Microsoft copy-protected executables. Microsoft was proud enough of the exepatching technique that it patented the process in U.S. Patent 5,430,878, Method for revising a program to obtain compatibility with a computer configuration.
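The general shape of exepatching (matching a known byte sequence in the freshly loaded image and overwriting part of it before control is transferred) can be sketched as follows. Everything specific here is an assumption. The signatures and replacement bytes in `KNOWN_VARIANTS` are made-up placeholders, not actual EXEPACK stub bytes or the patches DOS 5 applies. The search is limited to the program’s entry point, where an EXEPACK stub begins.

```python
# Hypothetical table: (signature of a known stub variant, offset within the
# match to patch, replacement bytes). The byte values are placeholders only.
KNOWN_VARIANTS = [
    (b"\x11\x22\x33\x44", 2, b"\x99\x99"),
    (b"\x55\x66\x77\x88", 1, b"\xAA"),
]


def exepatch(image: bytearray, entry: int) -> bool:
    """Patch a loaded image in place if a known stub variant sits at entry."""
    for signature, patch_off, replacement in KNOWN_VARIANTS:
        if image[entry:entry + len(signature)] == signature:
            start = entry + patch_off
            image[start:start + len(replacement)] = replacement
            return True
    return False  # unknown code: leave the image untouched
```

Matching exact byte sequences keeps the approach safe: code that is not a recognized variant is never modified. The downside is that any stub variant missing from the table goes unpatched.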

But wait! Weren’t we just reminded that the EDIT program shipped with DOS 5 sometimes still failed with “Packed file is corrupt”? Did the exepatching not work at all? Well… it did, but the exepatching was only activated in the DOS=HIGH case. If DOS was not loaded high, then DOS 5 didn’t even bother trying to patch the EXEPACK unpacker, assuming it wasn’t necessary. But just because DOS wasn’t loaded high didn’t mean the A20 gate was disabled, and DOS 5 was just small enough that in some configurations, it might load the first program under 64K even when DOS itself was loaded low.
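The gap can be expressed as a simplified Python model. It illustrates the reasoning above and is not DOS 5 logic. The function name is hypothetical, and it ignores details such as which stub variants the exepatching code actually recognizes.

```python
def packed_file_is_corrupt(dos_high: bool, a20_enabled: bool,
                           load_linear: int, old_stub: bool) -> bool:
    """Simplified model of when DOS 5 lets an old EXEPACK stub fail."""
    exepatched = dos_high               # DOS 5 only exepatches with DOS=HIGH
    below_64k = load_linear < 0x10000   # stub's segment arithmetic underflows
    wraps = not a20_enabled             # A20 off: 8086-style wrap-around works
    return old_stub and below_64k and not wraps and not exepatched


# DOS loaded low, A20 left enabled, program loaded at 60K:
print(packed_file_is_corrupt(dos_high=False, a20_enabled=True,
                             load_linear=60 * 1024, old_stub=True))  # True
```

Every input that leads to the error in this model has `dos_high` false, which is exactly the case DOS 5 assumed needed no patching.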

It may be worth mentioning that for reasons even Microsoft might have difficulty explaining, MS-DOS 5.0 shipped with a mix of older and newer EXEPACKed executables. For example in MS-DOS 5.0a (files dated 11/11/90), most executables (FDISK.EXE, NLSFUNC.EXE, and others) use a newer EXEPACK stub that does not rely on address wrap-around, but QBASIC.EXE uses an older EXEPACK stub that does. And that’s why MS-DOS 5.0 EDIT may fail with “Packed file is corrupt”.

For other, probably equally difficult to explain reasons, PC DOS 5.02 used the old EXEPACK stubs even for NLSFUNC.EXE and other DOS executables. That is to say, PC DOS 5.02 used older EXEPACK stubs than PC DOS 5.00 did, and thus increased the potential for “Packed file is corrupt” messages.

In other words, EXEPACK was loads of fun!

Update: An executable compression utility older than EXEPACK was Realia SpaceMaker, a $75 commercial utility probably released in late 1982 or early 1983, written by Robert B.K. Dewar. The algorithm used by SpaceMaker was probably quite similar to EXEPACK.

What is currently missing is a SpaceMaker executable (likely called SM.COM) actually pre-dating EXEPACK, i.e. from 1984 or earlier.

This entry was posted in Bugs, Microsoft, PC history.

14 Responses to EXEPACK and the A20-Gate

  1. dosfan says:

    The mix of differing EXEPACKed executables is simple: EDIT (which is really QBASIC with the /EDIT switch) was likely maintained by a different group using their own tool chain. PC DOS was built by IBM in Boca Raton and they also had their own tool chain. Neither IBM nor Microsoft always updated all of their tool chains to the latest versions; in some cases code relied on quirks of older versions in order to build. For example, the base code of PC DOS 7 was built with MASM 5.10 and MS C 6.00A. MASM 5.10 was required because the messy message parsing code relied heavily on macros which didn’t work with MASM 6.0 or later.

  2. Michal Necasek says:

    Yes, I think QBASIC was simply not built as part of DOS but rather provided as a binary, built by a different team with different tools. Which is not a very good reason to ship outdated code.

    The quirks related to MASM sound all too familiar. Many times I ran across code that only MASM 5.1 would compile, not MASM 6.x or anything else in MASM 5.1-compatible mode. Almost always it was code written by Microsoft.

  3. Richard Wells says:

    An early utility for the PC to do what EXEPACK did was Realia SpaceMaker, which got a quarter-page ad in PC Magazine Jan 1983* page 417 including a description of the technique.

    For those not wanting to track the link, I have extracted the description from the ad
    “How it works: Uninitialized (binary zero) areas are compressed, and the relocation
    entries are eliminated. When executed, the program expands and relocates itself,
    recreating the original program.”

    Also see http://www.was-ist-fido.de/doks/fnews/fido243.txt

    https://www.pcorner.com/list/DIAGS/FT119.ZIP/FT119.DOC/ shows some commentary from 1987 on how 3 different exe packers all fail.

    * The Google Books link claims it to be the Nov 1982 issue; the issue shown has Jan 1983 on the bottom of the page. Either way, there wasn’t much time between the PC and DOS and the compression of executables.

    Similar techniques had popped up with some of the cassette loaders (not for IBM PC) which changed how the rest of the file was read. For C64, the loader was embedded in unused parts of the file name. I think there were a few others done for minicomputers back in the late 70s but those could finesse the issue by having 2 complete sets of RAM to decompress the executable without worrying about overwriting the original addresses.

  4. Michal Necasek says:

    Realia SpaceMaker, that actually rings a faint bell. Interesting that the Fidonews post is only from December 1985 (and makes no mention of EXEPACK, which had been out for months by then). From the brief description it sounds like SpaceMaker was in principle really similar to EXEPACK.

    Annoyingly, all I can find about SpaceMaker prior to 1985 is advertising. A Russian magazine from 1991 claims SpaceMaker is from 1982 (unsourced), and also that the first Norton Utilities release was processed with SpaceMaker. PC Magazine dated Jan 8, 1985 contains a “Hard disk housekeeping” article by Peter Norton. That article mentions Realia SpaceMaker.

    If the advertising is to be believed, SpaceMaker predates EXEPACK by ~2 years. Just because something was advertised doesn’t mean it really existed, but it is circumstantial evidence that SpaceMaker is older. SpaceMaker 1.06 says (c) 1983,1984,1985 Realia Inc.

    Interestingly there are hints that (older versions of?) SpaceMaker always converted executables to COM format, which meant it couldn’t handle larger EXE files.

  5. Yuhong Bao says:

    “PC DOS 5.02 used the old EXEPACK stubs even for NLSFUNC.EXE and other DOS executables. That is to say, PC DOS 5.02 used older EXEPACK stubs than PC DOS 5.00 did, and thus increased the potential for “Packed file is corrupt” messages.”
    Probably because the build was moved from MS to IBM with a new DOS team.

  6. Michal Necasek says:

    I found a legal document mentioning Realia SpaceMaker as being written “1982-1983”: https://docs.justia.com/cases/federal/district-courts/georgia/gandce/1:2008cv01425/150651/124/1.html — and the author’s name, Robert B.K. Dewar. So that’s a strong contender for the oldest PC executable compressor.

    What I cannot find is an actual SpaceMaker binary from before 1985.

  7. Michal Necasek says:

    And finally something I’d consider reliable information. The 1988 PC Tech Journal Directory, page 368, lists Realia SpaceMaker ($75), describes the utility, and says: “Copies sold: 1,500 since 1982”.

    I still find it odd that I couldn’t find one single mention of the utility prior to 1985, except for the ads. I checked some of my offline archives as well, but there’s nothing.

  8. Michal Necasek says:

    Certainly. But a new team does not immediately imply old tools.

  9. Richard Wells says:

    Realia’s major product for DOS was a COBOL compiler. The two earlier DOS products (Spacemaker and a terminal emulator) seem to have been projects done to get the COBOL compiler operational, which were then marketed to bridge the revenue gap until the release of Realia COBOL.

    The only evidence I have that SpaceMaker did anything useful was The Dirty Dozen list which mentioned a lot of common trojans and pirated software including SpaceMaker and a version of ARC that had been run through SpaceMaker.

  10. yksoft1 says:

    FYI the leaked MS-DOS 6.0 source code pack contains a full copy of QBASIC.exe 4.5 source code, and the toolchain to build it in /45.

    They are indeed using the /Exepack option in the linker.

  11. bleuge says:

    Spacemaker 1.06 can be easily found in Google.

    Also, I’ve checked my dos executables db and found 4 versions of Exepack: a no version exe, 3.00, 4.00 and 4.06

    the no-version is the one in C compiler 3.0 dated 02/14/85
    3.00 is dated 08/30/85
    4.00 is dated 10/16/85

    Maybe there are other versions, but I wonder about the jump in versioning from no version to ver 3.0.

  12. Michal Necasek says:

    Yes, but I’m not sure how much that says about DOS 5. QB was developed by a different group so it’s not shocking that it used outdated tools, just silly.

  13. Michal Necasek says:

    There is 4.05 as well. There is also at least one EXEPACK stub unpacker version in later (circa 1989) LINK.EXE releases which does not correspond to a stand-alone EXEPACK.EXE.

    My best guess is that the initial EXEPACK release had no version simply as an oversight. Then the version jumped to 3.0, either to match MS C 3.0 or LINK.EXE version 3.x. Why they then went to 4.0 I don’t know. It does not make a whole lot of sense, but maybe there were no obviously better versioning alternatives.

  14. Michal Necasek says:

    I was going to contact Dr. Dewar and ask about Realia SpaceMaker. Sadly, he passed away almost three years ago.

    I am yet to analyze SM 3.06 (apparently the only surviving version) but it really sounds like it was very similar to EXEPACK, so I would expect it to be similarly useful — a small improvement for most executables but a big improvement for a few.
