Learn Something Old Every Day, Part XI: DOS Directory Searches are Bizarre

A while ago I started playing with EMU2, a piece of software which calls itself “A simple text-mode x86 + DOS emulator”. It is indeed relatively simple, only emulating an 8086 (or maybe 80186, with little bits of 80286 here and there), but it’s in some ways quite capable, doing a remarkably good job of running at least some text-mode DOS programs (such as the Turbo Pascal IDE).

Spurred by the release of the MS-DOS 4.0 source code, I thought I’d see if I could use EMU2 to build MS-DOS 4.0 on a non-DOS system. It did not go well.

The MS-DOS 4.0 source release includes almost everything needed to build, with the exception of COMMAND.COM, which is required by NMAKE to execute certain commands. OK, I thought, let’s just grab an existing MS-DOS 4.0 COMMAND.COM.

Not even a single DIR command worked right. After a bit of tinkering, I got that working, only to find that TREE.COM was hopelessly broken. And I found other programs mysteriously failing, such as Watcom wmake (wmaker.exe to be exact, since the default 386-extended version has no chance of running under EMU2).

In the process I learned a lot about how DOS directory searches work, and what kinds of seemingly crazy things DOS itself does.

Let’s start with DOS 2.x directory searches, using INT 21h/4Eh (Find First) and INT 21h/4Fh (Find Next). Find First takes an ASCIIZ string as an input, optionally including a path; the file name may include wildcards (asterisks or question marks).

The results of the search are stored in the DTA, or Disk Transfer Area, which is located at an address set by INT 21h/1Ah. The DTA will contain the name, attributes, size, timestamp, etc. of the first search result, if any.

The DTA notably starts with a 21-byte data area which is undocumented, and contains information required to continue the search. In essence, this area contains information about what the search is looking for (file name with optional wildcards, search attributes) and what the current position is in the directory that is being searched.

The structure of the information in the first 21 bytes of the DTA differs across DOS versions. It can also be used by emulated DOS environments (such as the OS/2 MVDM) to store information in a format completely different from DOS.
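To make the DTA layout concrete, here is a minimal Python sketch that splits a raw 43-byte Find First/Find Next DTA into the opaque 21-byte area and the documented result fields (attribute, time, date, size, 8.3 name). The helper names `parse_dta` and `build_dta` are made up for illustration; only the field offsets come from the documented interface.

```python
import struct

# Documented Find First/Find Next DTA, 43 bytes in total:
# 21 reserved bytes, attribute byte, time word, date word,
# size dword, 13-byte NUL-terminated 8.3 name.
DTA_FORMAT = "<21sBHHI13s"
DTA_SIZE = struct.calcsize(DTA_FORMAT)


def parse_dta(dta: bytes) -> dict:
    """Split a raw DTA into the opaque search state and the public result."""
    reserved, attr, time, date, size, raw_name = struct.unpack(
        DTA_FORMAT, bytes(dta[:DTA_SIZE])
    )
    return {
        "reserved": reserved,  # undocumented continuation data, treated as a blob
        "attr": attr,
        "time": time,
        "date": date,
        "size": size,
        "name": raw_name.split(b"\0", 1)[0].decode("ascii"),
    }


def build_dta(reserved: bytes, attr: int, time: int, date: int,
              size: int, name: str) -> bytes:
    """Hypothetical helper: fill in a DTA the way an emulator might."""
    # struct pads both the reserved area and the name with NUL bytes.
    return struct.pack(DTA_FORMAT, reserved, attr, time, date, size,
                       name.encode("ascii"))
```

A program that only cares about results reads everything from offset 21 onward; the first 21 bytes matter only to whoever implements Find Next.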

Case One: Watcom wmake

My first “patient” was Watcom wmake. It had trouble finding any files at all under EMU2. I quickly established that it does something that works fine on DOS and in most if not all emulated DOS environments, but not in EMU2.

Namely, the Watcom run-time library sets the DTA address (using INT 21h/1Ah), and calls Find First (INT 21h/4Eh). Then it copies the first 21 bytes of the DTA to a different memory location, sets the DTA address to point at the new location, and calls Find Next (INT 21h/4Fh).

This broke down in EMU2 because EMU2 ran the host side search during Find First, and associated the search with the DTA address. When the Find Next came, the DTA address was different, and EMU2 said, in so many words, oops, I have no idea what you want from me.

So I tweaked EMU2 to store a unique “cookie” in the DTA, and use that cookie when locating the corresponding host-side search. That was enough to make wmake work.
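The idea can be sketched in a few lines of Python. This is not EMU2's actual code; the class name, the 32-bit cookie, and its placement at the start of the reserved area are all assumptions made for the example. The point is only that the lookup key lives in guest memory, so it travels with any copy of the DTA.

```python
import struct
from itertools import count

RESERVED_SIZE = 21  # undocumented area at the start of the DTA


class CookieSearches:
    """Host-side searches keyed by a cookie stored in the DTA, not by DTA address."""

    def __init__(self):
        self._next_cookie = count(1)
        self._remaining = {}  # cookie -> names not yet returned

    def find_first(self, memory: bytearray, dta_addr: int, names: list):
        if not names:
            return None
        cookie = next(self._next_cookie)
        self._remaining[cookie] = list(names[1:])
        # Assumed layout: the cookie occupies the first 4 reserved bytes.
        struct.pack_into("<I", memory, dta_addr, cookie)
        return names[0]

    def find_next(self, memory: bytearray, dta_addr: int):
        # Whatever address the DTA has now, the cookie identifies the search.
        (cookie,) = struct.unpack_from("<I", memory, dta_addr)
        remaining = self._remaining.get(cookie)
        if not remaining:
            return None
        return remaining.pop(0)
```

Copying the 21 reserved bytes elsewhere and pointing the DTA at the copy, as the Watcom run-time does, then continues the same search. Because this sketch consumes results as it goes, it cannot rewind.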

Case Two: Microsoft NMAKE

Then I started trying to build the MS-DOS 4.0 source code. As one might imagine, NMAKE runs quite a lot of directory searches. And EMU2 started failing because it kept running out of host-side directory searches (the maximum was defined as 64, and it seemed like supporting that many simultaneous searches should be more than enough).

This problem with searches is one that people have encountered in the past, likely many times. When DOS drives aren’t local FAT-formatted disks, searching gets tricky. Often the “server” (networked or not) needs to keep more information than fits in the 21-byte area, potentially lots more.

In the case of EMU2, the host-side search is completed in the Find First call, and results are returned on successive Find Next calls. If the directory is large, there may be quite a bit of data that needs to be kept in memory on the host side.

The problem is that although DOS has Find First and Find Next calls, it has no Find Close call. In other words, searches are started, continued, but not ended. Or not explicitly.

Solving this problem requires heuristics (especially in light of Case Three). If a DOS program issues Find First with no wildcards, there will be zero or one result. The search is effectively over by the time Find First finished, because there will be no further results. Thus the host-side data can be freed right away.

When a DOS program uses wildcards, things get a lot more complicated. Assuming that the search found multiple results, the host-side information obviously needs to be kept while DOS repeatedly calls Find Next.

The host-side state can be more or less safely discarded when Find Next reaches the end of the search results. In most cases, wildcard searches will in fact keep calling Find Next until the results are exhausted. But not always.

A solution that works well appears to be a heuristic which more or less intelligently decides when searches are completed, combined with some sort of LRU approach to discard old searches.
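A rough Python sketch of that combination follows. The class and method names are hypothetical, and the limit of 64 simply mirrors the EMU2 maximum mentioned earlier. Searches without wildcards keep no state, exhausted searches are dropped, and when the table is full the least recently used search is evicted.

```python
from collections import OrderedDict


def has_wildcards(pattern: str) -> bool:
    return "*" in pattern or "?" in pattern


class SearchTable:
    def __init__(self, max_searches: int = 64):
        self.max_searches = max_searches
        self.active = OrderedDict()  # cookie -> [results, next index], oldest first
        self.next_cookie = 1

    def find_first(self, pattern: str, results: list):
        """Return (first result or None, cookie or None)."""
        if not results:
            return None, None
        if not has_wildcards(pattern):
            # At most one result is possible: nothing to remember.
            return results[0], None
        if len(self.active) >= self.max_searches:
            self.active.popitem(last=False)  # evict the least recently used search
        cookie = self.next_cookie
        self.next_cookie += 1
        self.active[cookie] = [list(results), 1]
        return results[0], cookie

    def find_next(self, cookie):
        state = self.active.get(cookie)
        if state is None:
            return None  # unknown, finished, or evicted search
        results, index = state
        if index >= len(results):
            del self.active[cookie]  # exhausted: free the host-side state
            return None
        state[1] = index + 1
        self.active.move_to_end(cookie)  # mark as most recently used
        return results[index]
```

The LRU part is only a safety net here. The two heuristics do most of the work of keeping the table small.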

Case Three: MS-DOS 4.0 TREE.COM

After improving the logic discarding host-side search state, I was able to build most of MS-DOS 4.0 with EMU2. This included TREE.COM. When I tried running the freshly built TREE utility, bad things happened.

Fortunately I had the TREE.COM source code, and I was able to understand what was going on. What TREE does is… interesting. To recap, TREE prints a “graphical” directory tree that looks like this:

   ...
├───DOS
├───H
├───INC
├───LIB
├───MAPPER
├───MEMM
│   ├───EMM
│   └───MEMM
├───MESSAGES
├───SELECT
└───TOOLS
    └───BLD
        ├───INC
        │   └───SYS
        └───LIB

When displaying a line of text with another directory item, TREE needs to know whether it should print an I-shape or an L-shape character. An L-shape is printed for the last entry in a directory, and an I-shape is printed for all other entries.

And here’s where TREE is very tricky, and does something that one might not expect to work at all. For example, when it prints the line for DOS (at the top of the diagram above), it has already called Find Next and found the DOS directory. TREE then saves the DTA contents (those 21 bytes) on its internal directory stack, and calls Find Next again. In this case, there is another result, the H directory, so TREE prints an I-shape and goes into the DOS directory to find out if there are any sub-directories.

When TREE gets back to where it was (in this case quickly, since no sub-directories exist), it restores the saved DTA contents and runs Find Next again. And as before, Find Next has to return the H directory as the next result.

In other words, TREE saves and restores the DTA contents in order to rewind the search to a previous position. Depending on the directory tree depth, there may be a number of simultaneously active searches, each for one level of the directory hierarchy.

Understanding the inner workings of TREE.COM necessitated further changes to EMU2. The information stored in the DTA needs to include not only an identifier associating the search with host-side state, but also a position in the host side search. That way TREE can successfully return to a previous position.

Note that this does not break the heuristics for freeing host-side state. TREE only ever rewinds the search back by one entry. Thus when Find Next reaches the end of the search results, the host-side state can be discarded. Although TREE may rewind the search, it only needs to call Find Next again to see that there are (still) no further results.
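Here is a small Python sketch of that scheme, under stated assumptions: the 6-byte layout (32-bit cookie plus 16-bit position) and the class name are invented for the example and are not necessarily what EMU2 stores. Because the position lives in the DTA rather than on the host side, restoring a saved copy of the 21 bytes rewinds the search.

```python
import struct

STATE_FORMAT = "<IH"  # hypothetical: 32-bit cookie, 16-bit index of the next entry


class RewindableSearches:
    def __init__(self):
        self.results = {}  # cookie -> full result list, never consumed
        self.next_cookie = 1

    def find_first(self, dta: bytearray, names: list):
        if not names:
            return None
        cookie = self.next_cookie
        self.next_cookie += 1
        self.results[cookie] = list(names)
        struct.pack_into(STATE_FORMAT, dta, 0, cookie, 1)
        return names[0]

    def find_next(self, dta: bytearray):
        cookie, position = struct.unpack_from(STATE_FORMAT, dta, 0)
        names = self.results.get(cookie)
        if names is None or position >= len(names):
            # End of results: the host-side state can be discarded. A rewound
            # DTA that lands here again still gets "no more files".
            self.results.pop(cookie, None)
            return None
        struct.pack_into(STATE_FORMAT, dta, 0, cookie, position + 1)
        return names[position]
```

Mimicking TREE: after Find First returns DOS, save the first 21 bytes, call Find Next to peek at H, restore the saved bytes, and the next Find Next returns H again.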

Case Four: MS-DOS 6.0 COMMAND.COM

Since I had the MS-DOS 4.0 COMMAND.COM working reasonably well, I thought the COMMAND.COM from MS-DOS 6.0 would work too.

Not so. The DIR command only returned one result, which necessitated further digging.

For reasons that aren’t terribly obvious to me, COMMAND.COM uses FCB searches to query directory contents. Now, FCB searches work a little differently from “normal” directory searches.

INT 21h/11h (Find First FCB) takes an unopened FCB (File Control Block) as input. The FCB contains the file name (possibly containing wildcards), and the search always covers the current directory. The search results are then stored in the DTA (again set by INT 21h/1Ah).

EMU2 originally associated the host-side search state with the DTA address for both FCB and non-FCB searches. While implementing fixes to cases One to Three, I saved the search state in the DTA.

But then I found that MS-DOS 6.0 COMMAND.COM calls Find First FCB, which places the first search result into the DTA, and then calls INT 21h/47h (Get Current Directory)… which happens to place the result into the same memory area that also holds the DTA.

This led to a facepalm moment when I read through the MS-DOS 4.0 source code in order to understand how FCB searches work. It turned out that the search continuation information is not placed into the DTA at all, but rather into the unopened FCB that is passed as input to both INT 21h/11h (Find First FCB) and INT 21h/12h (Find Next FCB). Had I read the RBIL more carefully, I would have seen that the information was right there.

Not too surprisingly, it turned out that the magic 21-byte search continuation state area is actually the same for FCB and non-FCB searches, and its layout is derived from the unopened FCB format, going back to DOS 1.x.

One way or another, the search continuation area includes the input filename (possibly with wildcards) and attributes, to know what it’s looking for. It also needs to contain the cluster number of the directory being searched, as well as position within the directory.
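As an illustration of how those pieces fit into 21 bytes, here is a Python sketch. The field order in `SEARCH_STATE` is an assumption chosen for the example, not a documented DOS structure, and the real layout varies between DOS versions. The template expansion follows the FCB convention of an 11-byte, space-padded 8.3 name in which an asterisk fills the rest of the field with question marks.

```python
import struct

# Illustrative layout only: drive, 11-byte name template, attribute mask,
# directory entry index, directory start cluster, 4 spare bytes.
SEARCH_STATE = "<B11sBHH4x"
assert struct.calcsize(SEARCH_STATE) == 21


def to_template(name: str) -> bytes:
    """Expand an 8.3 name (wildcards allowed) into an 11-byte FCB-style template."""
    base, _, ext = name.upper().partition(".")

    def expand(part: str, width: int) -> str:
        if "*" in part:
            # An asterisk turns the rest of the field into question marks.
            part = part[: part.index("*")] + "?" * width
        return part[:width].ljust(width)

    return (expand(base, 8) + expand(ext, 3)).encode("ascii")


def matches(template: bytes, name: str) -> bool:
    """Compare a directory entry name against a template, byte by byte."""
    candidate = to_template(name)
    return all(t in (ord("?"), c) for t, c in zip(template, candidate))


def pack_state(drive: int, pattern: str, attr: int, entry: int, cluster: int) -> bytes:
    """Pack everything Find Next needs into the 21-byte continuation area."""
    return struct.pack(SEARCH_STATE, drive, to_template(pattern), attr, entry, cluster)
```

With something like this, Find Next needs nothing beyond the 21 bytes: it re-reads the directory starting at the recorded cluster, skips to the recorded entry, and keeps comparing names against the template.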

What’s Reasonable and What Isn’t?

The design (or lack thereof) of DOS directory searches raises an obvious question: Just how long are directory searches valid?

The answer is “as long as the directory being searched remains unmodified”. As long as a DOS process doing the searches has control, it can (but of course doesn’t have to!) ensure that directory contents don’t change. As soon as the DOS EXEC call is invoked to start a new process, it can’t be assumed that directory contents remain static.

In practice, searches remain valid indefinitely, because DOS makes no effort to ensure that directory searches are invalidated even when directory contents change. However, the term “valid” needs to be understood to mean that Find Next can be called, not that Find Next will return sensible data. It is easy to imagine a situation when a search is started, the directory is deleted, and the clusters it used to occupy are replaced with completely different data. Find Next will then attempt to process random file data as a directory, with unpredictable results.

That’s not even considering a situation when a program modifies the search continuation information (in the DTA or FCB). DOS has no way of preventing that, but fortunately DOS programs have no incentive to do so.

Redirectors (whatever method they use to interface with DOS) are thus in a difficult situation, because they often need to associate some “server side” state with a search (necessitated by the fact that the 21-byte search continuation area is likely not large enough to store all required data in it). Since there is no “Find Close” API, redirectors are forced to guess when a search is done.

An approach using LRU logic alone may not be sufficient. When performing recursive directory searches (as TREE.COM does), a program may start searching the root directory, descend into a sub-directory, and search hundreds or thousands of nested directories before returning to the top-level directory. The search of the top-level directory must remain active on the server side the entire time, without being discarded.

That’s where heuristics help, because they greatly reduce the number of open searches, thus making it much more likely that still-active searches won’t be inadvertently recycled.

Fortunately well behaved DOS programs do not actively try to break such heuristics, and although they may do surprising things, the list of such surprises is (probably!) not endless.


52 Responses to Learn Something Old Every Day, Part XI: DOS Directory Searches are Bizarre

  1. Michal Necasek says:

    Windows 95 would have used its own floppy drivers, while DOS and Windows 3.1 used the BIOS.

    OS/2 had its own floppy drivers too, and in my experience floppy access was quite fast. They all use DMA for floppy transfers… at least on the usual PC/AT compatibles.

    Some speedup usually stems from optimized FDC parameters (higher step rate, shorter motor speed-up). A lot depends on the floppy format. Floppies formatted with optimized track skew and possibly interleave can be very noticeably faster to read.

  2. MrT says:

    Hi Michal thanks! As I recall the speed up was for std formatted 1.44M floppies not just specially format. In particular the floppies on which win95 (and also os/2 i think) seemed slower than normal floppies, probably because of the higher density.

    I am quite sure about the speed up but it is long since i had a machine where this could possibly be tested. Unless one could make a fancy solution of bridging a real floppy drive to a VM and then test with DOS vs Win95.
