Learn Something Old Every Day, Part XVI: DOS 4.0 SELECT Is Too Clever

A while ago I discovered an antique pirated copy of IBM DOS 4.00 on 5.25″ media, something that had been missing from my archive. And by antique I mean from August 1988, when DOS 4.0 was practically brand new.

IBM DOS 4.00 installer, SELECT (1988)

There were SNATCH-IT disk images in files DOS400-1.ARC to DOS400-5.ARC (DOS400-6.ARC includes the disk from the IBM DOS 4.0 Technical Reference, not part of the OS per se). There were two problems. First, I didn’t know what to do with SNATCH-IT images at all. Second, the image of the install disk (DS40INST.CP2) was corrupted, with ARC reporting a CRC error.

Once I sorted out SNATCH-IT image decoding, I was left with the corrupted disk. The corrupted image was 60 bytes longer than it should have been, but it was actually mostly OK. A “feature” of old compressors like ARC is that they work with blocks of several kilobytes in size, and if one is lucky, one block will be corrupted but the rest won’t.

I set out to determine where exactly the SNATCH-IT image was corrupted. Armed with the knowledge of the format, I found the corruption was in one of the data blocks (large chunks, a few dozen kilobytes in size, that hold bare sector data) and not in any of the control structures. So far so good; since I have images of 720K disks of IBM DOS 4.00, I would likely have good files to recover from the damage.

Then I tried to cut out the extra 60 bytes while causing minimal damage. That was tricky because the sector data was stored interleaved in the SNATCH-IT image. In the end I localized the damage and created a semi-fixed DS40INST.CP2 which I could convert to a raw image.

In the converted image, there were two corrupted files, DISKCOPY.COM and DISPLAY.SYS. Interestingly, thanks to interleaving, the last cluster/sector of DISKCOPY.COM was intact. I replaced the files in the image with copies from the 3.5″ variant of IBM DOS 4.00, but as it happens these two files were identical even in IBM DOS 4.01 anyway.

Now I had what should be a good set of disk images, with some questions as to its provenance. The images were clearly made from IBM DOS 4.00 disks (otherwise why bother with SNATCH-IT), but were those disks unmodified originals? There were no obvious signs of tampering and all timestamps were what they should have been.

Yet there were at least two oddities. The OPERATING 1 disk did not contain a hidden zero-length file called DOS01I.400, and there were no hidden zero-length files such as VENDOR-#.TO1 that are present on 5.25″ disk images of IBM DOS 4.01.

I never quite figured out the story with the VENDOR-#.xxx files and they are clearly not required. The DOS01I.400 file is not required either, as it later turned out, and it may be a genuine omission. All images have the “IBM 4.0” OEM string in the boot sector, which suggests they are genuine, as such disks were anything but common in August 1988.

Back to that slack space at the end of DISKCOPY.COM. It contains a curious fragment of text:

     0AB1008 \\SPIDERMAN\DRIVEA (A)
000030AB1008 \\SPIDERMAN\DRIVEA (A)
06/14/1988 08:10:43.30 RAMBO 000030AB1008 \\SPIDERMAN\720KB (720)
06/14/1988 08:11:30.54 RAMBO 000030AB1008 \\SPIDERMAN\DRIVEA (A)
06/14/1988 09:32:38.97 RAMBO 000030AB1008 \\SPIDERMAN\720KB (720)
06/14/1988 09: 

The \\SPIDERMAN\DRIVEA string looks like the name of a network share. What’s more, the timestamps (June 14th, 1988) happen to be the same as the timestamps on the actual disk, which is something that might very conceivably happen if the IBM DOS 4.00 disks were in fact mastered on June 14th, 1988. The thought that someone planted this fragment in a disk image of commodity software nearly 40 years ago seems not just extremely improbable, but outright lunatic.
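For readers unfamiliar with the term, slack space is the unused tail of a file's last cluster: whatever happened to be in the buffer when the disk was mastered ends up there. The size of that tail is simple arithmetic. Below is a minimal sketch, assuming the usual 360K floppy layout of two 512-byte sectors per cluster; the function name and the example file size are made up for illustration and are not taken from the actual disk.

```python
def slack_bytes(file_size: int, cluster_size: int = 1024) -> int:
    """Return how many bytes of the file's last cluster lie past end-of-file.

    cluster_size defaults to 1024 (2 sectors x 512 bytes), the assumed
    allocation unit of a 360K DOS floppy.
    """
    if file_size == 0:
        return 0  # an empty file occupies no clusters at all
    remainder = file_size % cluster_size
    # A file that ends exactly on a cluster boundary has no slack.
    return 0 if remainder == 0 else cluster_size - remainder


# Hypothetical 10,000-byte file: 9 full clusters plus 784 bytes in the 10th.
print(slack_bytes(10_000))
```

Any leftover text in an image, like the fragment above, has to fit within that many bytes after the end of the file's real data.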

Installing

Okay, so let’s install the thing. Starting the installer in a VM was no problem, but after rebooting and continuing after FDISK, the installer would just lock up every single time.

Maybe I restored the installation disk incorrectly? Maybe it was more damaged than I had thought? Well… I tried the 5.25″ install disks of IBM DOS 4.01 and they had the exact same problem!

I tried various things, re-running parts of the installer manually… and eventually a light bulb went off. The installer was asking me for the OPERATING floppy, not OPERATING 1. The 5.25″ 360K disk set has three OPERATING 1/2/3 disks, whereas the 3.5″ 720K disk set only has one OPERATING floppy. Does the installer think I’m using the 3.5″ disk set?

Thanks to Microsoft we have the source code for SELECT, the DOS 4.0 installer… and sure enough, it determines the installation media type from the drive type!

In other words, the SELECT installation files are essentially identical between the 5.25″ and 3.5″ disk sets. SELECT determines the kind of disk set it’s working with based on the drive type. The installation disks were double density only, so there was no worry about DD versus HD floppies. That logic is not crazy, because physical 5.25″ disks cannot possibly be used in a 3.5″ drive and vice versa… but virtual disks can.

Using 360K disk images in a virtual 1.44M 3.5″ drive works just fine most of the time. But clearly SELECT is one of the exceptions. Again, it is hard to call this a bug in SELECT, because users would have had to work extra hard to reproduce the same scenario on physical hardware.
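To make the failure mode concrete, here is a small Python model of the decision as I observed it. This is a sketch of the behavior, not a transcription of the SELECT source; the enum and function names are invented for illustration, and the only facts it encodes are that a 5.25″ drive leads to the three-disk OPERATING 1/2/3 set and a 3.5″ drive to the single OPERATING disk.

```python
from enum import Enum


class DriveType(Enum):
    # Hypothetical labels for the drive types a VM might present.
    FLOPPY_525_360K = "5.25in 360K"
    FLOPPY_525_1200K = "5.25in 1.2M"
    FLOPPY_35_720K = "3.5in 720K"
    FLOPPY_35_1440K = "3.5in 1.44M"


FIVE_AND_QUARTER = {DriveType.FLOPPY_525_360K, DriveType.FLOPPY_525_1200K}


def assumed_operating_disks(drive: DriveType) -> list[str]:
    """Model of which OPERATING disks the installer will ask for.

    The choice depends only on the drive's form factor, never on the
    format of the disk (or disk image) that is actually inserted.
    """
    if drive in FIVE_AND_QUARTER:
        return ["OPERATING 1", "OPERATING 2", "OPERATING 3"]
    return ["OPERATING"]


# A 360K image in a virtual 1.44M drive is treated as the 3.5in disk set.
print(assumed_operating_disks(DriveType.FLOPPY_35_1440K))
```

With a 360K image mounted in a virtual 3.5″ drive, the model asks for a disk named OPERATING that does not exist in the 5.25″ set, which matches the prompt I kept seeing.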

Once I adjusted the VM to use a 5.25″ 1.2M drive, sure enough, the installation worked fine, and I ended up booting into the DOS Shell:

IBM DOS 4.0 Shell

Then I realized there’s yet another non-obvious thing that SELECT does. I thought the installer would ask me to insert a blank disk to create a writable “SELECT COPY” disk which is a modified copy of the installation disk. But it never did, and just modified the installation disk.

This was what I expected to see, but never did:

Creating a SELECT COPY disk

Then I realized that SELECT is kind of clever. It tries updating the installation disk (it needs to modify AUTOEXEC.BAT and write a few temporary files) and if that works, it just does that. Only if it finds that the installation disk is write-protected does it ask the user for another disk to write to.
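The pattern is a plain "try the write, fall back on failure" probe. The following Python sketch shows the general idea only; how SELECT actually detects write protection is not something I verified, and the probe file name, function names, and callback are all hypothetical.

```python
import os


def choose_work_disk(install_root: str, ask_for_copy_disk) -> str:
    """Return the location the installer should write its working files to.

    Tries to create a small probe file on the installation disk. If that
    succeeds, the installation disk itself is used; if the write fails
    (for example because the disk is write-protected), the caller-supplied
    ask_for_copy_disk() is invoked to obtain a separate writable disk.
    """
    probe = os.path.join(install_root, "SELECT.TMP")  # hypothetical file name
    try:
        with open(probe, "w") as f:
            f.write("probe")
        os.remove(probe)
        return install_root
    except OSError:
        # Write failed: fall back to a separate SELECT COPY disk.
        return ask_for_copy_disk()
```

The user-visible consequence is the one described above: with a writable installation disk, the fallback branch never runs and the SELECT COPY prompt never appears.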

I can imagine that this would have made testing the DOS 4.0 installation less of a hassle, because skipping the SELECT COPY disk creation avoids a bunch of disk swaps. But it is an unexpected (if logical) difference in behavior depending on whether the installation disk is writable or not. As they say, learn something new old every day…


17 Responses to Learn Something Old Every Day, Part XVI: DOS 4.0 SELECT Is Too Clever

  1. Josh Rodd says:

    What would the target market for PC DOS 4.00 on 5¼” have been? I’m guessing the PC/AT?

  2. Michal Necasek says:

    Actually DOS 4.0 supported just about every PC except (I think) the PCjr. So PC/AT, PC/XT, the XT 286, and really most everything before the PS/2 line.

  3. MiaM says:

    Interesting. The disk type thing might actually have caused minor problems for anyone who only had 360k install disks, wanted to install on a computer that only had a 1.44M drive, and had just copied the 360k disks to the first half of 720k disks.

    On the other hand I can’t remember anyone using the installer for DOS before say 5.0 or even DOS 6. 🙂

  4. Josh Rodd says:

    DOS 4.00 did work on a PCjr although it wasn’t officially supported. I can’t think of anything more painful than DOS 4 on a 128kB system.

    DOS 4.00 was one of the most useless releases I can think of – we were stuck with it on our Model 25 since that’s what came with it and we didn’t have a copy of 3.30 back then (which chewed up a lot less RAM). We eventually got a 40MB hard disk so at least we had some reason to use 4.00. And it really should have been versioned as 3.40.

    SELECT was an odd beast which would install COUNTRY and friends even if you just selected US and U.S. keyboard, gobbling up even more RAM, although you could just use SYS and copy the diskettes to \DOS yourself. The DOS Shell seemed like another thing that basically nobody used. I think everyone breathed a sigh of relief when 5.00 came out – even 8086 users.

    One of the more fascinating aspects of 4.00 was all the machinations it did to make sure Windows 1.x and 2.x would still work. I really don’t get why Microsoft didn’t just issue updates to the relevant Windows versions instead.

  5. Trap says:

    In your article: “One of the more fascinating aspects of 4.00 was all the machinations it did to make sure Windows 1.x and 2.x would still work. I really don’t get why Microsoft didn’t just issue updates to the relevant Windows versions instead.”

    Are these machinations documented anywhere? What’s the source of this paragraph? I’m curious

  6. Michal Necasek says:

    At a guess – there were runtime versions of Windows 1.x/2.x shipped with all kinds of products, and I don’t know how feasible it was to update those. That may well have been the reason.

    DOS 4.0 was far from useless, and it introduced a lot of stuff (like partitions > 32MB!) that no one even for a second thought of throwing out in DOS 5.0. Several new INT 21h and 2Fh APIs were also kept. As well as the “message server” in COMMAND.COM. And handy things like the MEM utility.

    You’re right that the default config that SELECT created was pretty memory hungry and it was not necessary, especially for US users.

    Re DOS Shell… yeah it seemed useless to me, yet Microsoft put a lot of work into writing their own Shell for DOS 5.0. Why bother if no one used it?

  7. Malcolm says:

    Re “are these machinations documented anywhere” – I don’t know what Josh had in mind, the ones I know about are that it didn’t have the SETVER.EXE TSR yet, so it hardcodes a series of version lies including lies for Windows. For some reason it also included a count so that it would lie N times before telling the truth. SETVER always needs to know the program name, but implementing the N lie also requires tracking when the program was launched and exits to reset the count. The only program in the hardcoded list that used the N count lie was WIN200.BIN.

    See https://github.com/microsoft/MS-DOS/blob/2d04cacc5322951f187bb17e017c12920ac8ebe2/v4.0/src/DOS/MSINIT.ASM#L639C1-L667C10 for initial pointers.

  8. John Elliott says:

    A quick search online doesn’t turn up any details of the .CP2 disk image file format, so I’d be interested to know whatever you’ve managed to glean while deciphering them.

  9. Josh Rodd says:

    The Windows machinations are documented various places. Basically, Windows 1.x/2.x make assumptions about DOS internal structures that are no longer true with DOS 4.00, so DOS 4.00 did various tricks to make internal structures appear in a pre-4.00 compatible way; see, for example, https://github.com/microsoft/MS-DOS/blob/main/v4.0/src/INC/SF.INC

    DOS 5.00 took it a step farther: it would simply detect if a caller made assumptions about DOS internals and hot-patch the executable in memory with corrected code (with Windows 1.x/2.x/3.0 being the most obvious candidates for this), which I believe is called “exe-patching”. This was a more stable approach.

    And yeah, I think one reason Microsoft didn’t ship a replacement Win 1.x/2.x binary is because it tended to have that weird linked-by-the-installer WIN200.BIN etc. file that would be different for every combination of device drivers. Taking a look at something using “embedded” Windows, its .BIN isn’t anything close to a “standard” WIN200.BIN. I guess I should say I’ve really never understood why 1.x/2.x worked that way (and they stopped doing it in 3.0). Faster start times?

    It is hard to see functional improvement from MS-DOS 3.31 (or perhaps COMPAQ) to DOS 4.00, as the only actually useful thing in DOS 4.00 was >32MB support.

  10. MiaM says:

    Re dos shell:
    I would think that there was a disconnect between actual users and Microsoft, partially due to most customers getting DOS as part of buying a new computer.

    Re backwards compatibility: If DOS 4 didn’t include backwards compatibility, update disks would somehow have had to be distributed, and a readme with DOS 4 would have had to tell the user how to obtain these. Sure, in North America a skilled user would likely have downloaded updates from Compuserve or a Microsoft BBS or whatnot. But that wasn’t a thing worldwide. Sure, there were modems and BBSes in many parts of the world, but it wasn’t that common and I bet that a customer would have been angry if they had had to spend ages on long distance modem calls to download updates after having spent lots of money on buying DOS 4. In other words, the patches would more or less have had to be supplied with DOS 4, and would likely have resulted in an additional disk, i.e. a higher manufacturing cost without any real gain.

    (Arguably they could have included a list of program names not to use in order to not trigger the backwards compatibility thing for new programs. Not like many would call their executables WIN200.BIN, but it’s not unlikely that someone making some sort of gaming thing would chain load a WIN.EXE or WIN.COM to show a winner splash screen, handle high scores and whatnot. Or for that matter WIN might be a meaningful abbreviation of something completely different in some language.)

  11. Richard Wells says:

    PC-DOS 4 got at least 3 CSDs and those were necessary. Another update that improved compatibility wouldn’t have changed things much. IBM did make it easy to get them.

    Microsoft had been pushing MS-DOS Manager with DOS 3 for some clones for the ease of file management compared to the command line. IBM made DOS Shell out of ideas from Manager plus IBM’s own Fixed Disk Organizer menu launcher. I don’t know if any code was shared from the older software. Those ideas were rather similar to the new OS/2 interfaces undergoing testing.

  12. Michal Necasek says:

    My starting point was this: https://github.com/retrohun/pce/blob/master/src/drivers/psi/psi-img-cp2.c

    Where they got the information from I have no idea. I will add the above code cannot handle SNATCH-IT multi-volume images. But those would probably be best dealt with by using a separate utility to glue them back together. I am not sure if it’s actually possible to identify the first file of a multi-volume image (except by the fact that it’s incomplete), but the ASCII digit after the ‘$’ in the header is the volume number.

  13. Vlad Gnatov says:

    > DOS 5.00 took it a step farther: it would simply detect if a caller made assumptions about DOS internals and hot-patch the executable in memory with corrected code (with Windows 1.x/2.x/3.0 being the most obvious candidates for this), which I believe is called “exe-patching”. This was a more stable approach.
    Interesting, the only patches I remember is at dos exec for exepack (a20 mitigation) and for rational extender (preserve 386 registers). Is dos 5 oak/dos 6 code accessible online somewhere (and not on some obscure members only forum) ?

    > It is hard to see functional improvement from MS-DOS 3.31 (or perhaps COMPAQ) to DOS 4.00, as the only actually useful thing in DOS 4.00 was >32MB support.

    You forgot xcreat 🙂

  14. Michal Necasek says:

    Yes, INT 21h/6Ch is not something programmers would willingly give up. It solves real problems.

    DOS 4.0 also added EMM386 functionality. IBM DOS only had support for IBM machines, but MS-DOS included EMM386. Very important piece of software.

    I don’t know if any DOS 5.0 OAK is online anywhere. The MS-DOS 6.0 source code is, and a quick search should come up with something. In any case the OAK does not include DOS kernel source code, only object files.

  15. Vlad Gnatov says:

    > I don’t know if any DOS 5.0 OAK is online anywhere. The MS-DOS 6.0 source code is, and a quick search should come up with something.

    The universal search engines are becoming more and more useless. I got nothing on google/bing, but found dos 6 source on archive.org through internal search.
    Anyway, I roughly grepped, but was unable to find code for patching executables outside of dos exec function. Josh, can you please give me a pointer? I don’t want to manually review the whole dos kernel sources.

    P.S. I have found, though, a weird copy-protection scheme for COM files, based on disabling the A20 line for 10 DOS calls after the exec call. Does anyone know the details?

  16. John Elliott says:

    Thanks for the info. Looks like a file format based very much on 64k data segments!

  17. Michal Necasek says:

    Yep. SNATCH-IT is a wrapper around COPY II-PC and I suspect it’s basically dumping the latter’s internal data structures to disk, and later loads them again and feeds them to COPY II-PC to write to disk. It’s a work of art, but not easy to reverse engineer: https://trixter.oldskool.org/2015/02/05/annoying-adventures-in-disassembly-snatchit/
