Is it so hard to document things?

A few weeks ago I spent a bit of time debugging a program which mysteriously failed under DOS 3.3, although it worked without any apparent problem on DOS 4.0 and later, and there was no indication that it required anything later than DOS 3.1 or so.

The problem turned out to be related to INT 21h, function 6300h, a curiously underdocumented DOS API. The call was first implemented in the Far East versions of MS-DOS 2.25 and returned a table of DBCS lead bytes. That is crucial information for software running on DBCS systems, required for correctly parsing pathnames etc. The RBIL also documents the call for Far East versions of DOS 3.2 and later. So what’s the problem?
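To make the pathname point concrete, here is a minimal sketch of why a program needs the table. It assumes the layout described in the RBIL, a list of (low, high) byte pairs terminated by a 0,0 pair. The function names, the sample path, and the hard-coded Shift-JIS ranges are illustrative only; a real program would use whatever table DOS returns.

```c
#include <stddef.h>

/* Assumed table layout: (low, high) byte pairs, terminated by a 0,0 pair. */
int is_dbcs_lead_byte(const unsigned char *table, unsigned char c)
{
    if (table == NULL)
        return 0;
    for (; table[0] != 0 || table[1] != 0; table += 2) {
        if (c >= table[0] && c <= table[1])
            return 1;
    }
    return 0;
}

/* Find the last real backslash in a path, skipping over double-byte
 * characters so that a trail byte of 5Ch is not mistaken for a separator.
 * Returns NULL if the path contains no separator. */
const char *last_path_separator(const unsigned char *table, const char *path)
{
    const char *last = NULL;
    const char *p = path;

    while (*p != '\0') {
        if (is_dbcs_lead_byte(table, (unsigned char)*p) && p[1] != '\0') {
            p += 2;             /* skip lead byte and trail byte together */
            continue;
        }
        if (*p == '\\')
            last = p;
        p++;
    }
    return last;
}

/* Illustrative tables: typical Shift-JIS lead byte ranges, and the empty
 * table a non-DBCS system would effectively have. */
const unsigned char sjis_lead_bytes[]  = { 0x81, 0x9F, 0xE0, 0xFC, 0x00, 0x00 };
const unsigned char empty_lead_bytes[] = { 0x00, 0x00 };

/* Hypothetical file name whose first character is the Shift-JIS pair
 * 83h 5Ch; the trail byte has the same value as a backslash. */
const char sample_path[] = "C:\\DIR\\\x83\x5C.TXT";
```

With the Shift-JIS table, `last_path_separator` reports the backslash after `DIR`. With the empty table it reports the trail byte inside the file name, which is exactly the kind of misparse the lead byte table exists to prevent.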

The problem is that the call only exists in Far East versions of DOS 3.x. But that’s not the whole story. Microsoft’s official documentation (e.g. the MS-DOS Encyclopedia) documents the INT 21h, function 6300h API as available only in MS-DOS 2.25. The call is not documented in any of the official references for MS-DOS 3.3, 4.0, 5.0, or PC DOS 7.0.

So how are applications supposed to get the information? In DOS 3.3, the internationalization support was redesigned (by IBM?) and function 65h was used to return country-related information. However, perhaps because IBM did not support Japanese or other DBCS languages at the time, there was no documented way to obtain the DBCS lead byte table. In DOS 4.0, sub-function 7 was added to INT 21h, function 65h to return the table, in almost the same format as the one previously returned by INT 21h, function 6300h.

The official MS-DOS documentation does not clearly say whether INT 21h, function 65h, sub-function 7 is supported on US and other non-DBCS versions of DOS, but it is supported and returns an empty lead byte table.

What the official documentation doesn’t say is that DOS 4.0 and later also support the old INT 21h, function 6300h API, in all national language versions. The upshot is that an application can call INT 21h, function 6300h to obtain the DBCS lead byte table on any Far East version of DOS since at least 3.2, as well as on 2.25 (verified with MS-DOS/V 6.2).

It is a mystery why Microsoft/IBM didn’t document this fact. In all official DOS 3.3 and later references, INT 21h, function 63h is marked as reserved. Did Microsoft want to force programmers to use the newer INT 21h, function 65h API? But how were they supposed to get the information on DOS versions prior to 4.0? There seems to be no good reason not to document the API, as there’s nothing secret or potentially subversive about it.

It would seem that an application might then simply call INT 21h, function 6300h to obtain the DBCS lead byte table and just be done with it, documented or not. The catch is that on US versions of DOS 3.x, the INT 21h, function 6300h API simply does nothing. It does not indicate an error, it just returns without modifying any registers or flags.

So how to get the DBCS lead byte table without complicated DOS version checks and guesswork? It’s actually easy. The DS:SI registers should be set to 0:0 before calling the API. If a valid DBCS lead byte table is returned, the DS:SI registers will be modified (the table can’t possibly be stored at 0:0). If DS:SI are still 0:0, the API is not implemented, the DOS is not a Far East version, and hence there are no double-byte characters and no lead byte table.
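The probe can be sketched as follows. To keep the example compilable outside DOS, the interrupt is modeled as a function pointer operating on a small register structure. The `struct regs` type, the `int21_fn` callback, and the two `fake_*` stand-ins are all hypothetical; in a real-mode program the callback would be replaced by the compiler’s own interrupt facility, and DS would have to be saved and restored around the call.

```c
#include <stddef.h>

/* Hypothetical snapshot of the registers that matter for this call. */
struct regs {
    unsigned short ax;
    unsigned short ds;
    unsigned short si;
};

/* Hypothetical stand-in for "execute INT 21h with these registers". */
typedef void (*int21_fn)(struct regs *r);

/* Probe for the DBCS lead byte table via INT 21h, function 6300h.
 * Returns the table address as (segment << 16) | offset, or 0 if the
 * call left DS:SI untouched, i.e. the API is not implemented. */
unsigned long probe_dbcs_table(int21_fn int21)
{
    struct regs r;

    r.ax = 0x6300;
    r.ds = 0;       /* preload DS:SI with 0:0; no valid table lives there */
    r.si = 0;
    int21(&r);

    if (r.ds == 0 && r.si == 0)
        return 0UL; /* registers unchanged: no lead byte table */

    return ((unsigned long)r.ds << 16) | r.si;
}

/* Simulates a US DOS 3.x: the call returns without touching anything. */
void fake_us_dos3(struct regs *r)
{
    (void)r;
}

/* Simulates a DOS that implements the call; the address is made up. */
void fake_far_east_dos(struct regs *r)
{
    r->ds = 0x0123;
    r->si = 0x0456;
}
```

A zero result means there is no table and every byte can be treated as a single-byte character. A non-zero result only says that the call is implemented; the table it points to still has to be walked and may turn out to be empty.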

Note that INT 21h, function 6300h is also implemented in the OS/2 2.0 DOS box and in NTVDM. It is likewise implemented in DR DOS, at least versions 6.0 and later. Interestingly, the API is not documented in the DR DOS technical reference either, but the DEBUG.EXE utility in Novell DOS 7 displays an appropriate description (“DOS: Two byte chars”) when the INT 21h service is about to be executed.

Update: Last week I obtained a copy of Developing Applications Using DOS by Christopher, Feigenbaum, and Saliga. While it is not an official DOS reference book, it was written by the engineers who led the development of DOS 4.0 at IBM and it is clearly an unusually well informed book. INT 21h, function 6300h is documented in detail and marked as a DBCS-only function introduced in DOS 3.2. The function is labeled as “published”, that is, supported by future DOS versions. There’s a general note that DBCS functions return an invalid function error on non-Asian DOS versions, but no specific remarks for INT 21h, function 6300h with regard to DOS versions.


7 Responses to Is it so hard to document things?

  1. Yuhong Bao says:

    Well, remember that the MS-DOS Encyclopedia dates back to 1986. Maybe the DBCS version of MS-DOS 3.2 had not been released at the time it was written.

  2. michaln says:

    Remember that there was a second edition published in 1988, which is what I was looking at. It describes DOS 3.3 (in a separate, fairly extensive appendix), so it can’t possibly be too old.

  3. Yuhong Bao says:

    I think the normal US DOS documentation does not describe it precisely because it was only applicable to DBCS versions of DOS.

  4. michaln says:

    Then why document the INT 21h, function 6300h API for the DOS 2.25 DBCS releases? If the US references ignored all DBCS related APIs, it would be somewhat understandable, but that’s clearly not the case.

  5. Yuhong Bao says:

    Well, notice that only the MS-DOS Encyclopedia documents it.

  6. michaln says:

    And Advanced MS-DOS Programming. Both published by Microsoft Press.

  7. Yuhong Bao says:

    I think the real question is whether it is so hard to update documentation, because that is the problem.
