ICAT and Watcom

Some time ago I embarked on a minor retro development project related to this post. For convenience, I decided to do the development on a Windows host machine and (obviously) test in a VM. For compiling and linking the C code I used the Open Watcom tools, and for the assembler code, MASM 5.1 (because nothing else is likely to work anyway).

The initial phase went well and I only needed the basic OS/2 kernel debugger (KDB) to analyze a few problems. But then I started toying with the idea of reworking the code base a bit, and that would really benefit from understanding the code better. Tracing code in a debugger is an excellent way of learning how it works. But KDB is definitely not up to that task, or at least not for code written in a language other than assembler.

The problem with KDB is that it’s a rough equivalent of the ancient SYMDEB debugger. It understands symbols (that is, it can match names with addresses), but it can’t map code locations to source files, and worse, it has no idea about complex types (structures, arrays, etc.). Analyzing the code and data manually is possible, by matching code and data offsets against source and header files, but in the end it is a very poor use of time that could be spent much more productively.

IBM ICAT Debugger

Back in 1996 or 1997, IBM released a debugger that could take care of this. ICAT (Interactive Code Analysis Tool) is a remote kernel debugger for OS/2, roughly similar to WinDbg. ICAT is a remote debugger, meaning that it runs on a host (OS/2) machine while debugging a target OS/2 system with the kernel debugger installed.

The ICAT debugger running on OS/2

While the standard KDB simply drives a serial terminal, ICAT uses a custom packet protocol to communicate with the target system. This protocol was added to the OS/2 kernel debugger in Warp 4 and also integrated into the later Warp 3 FixPacks.

ICAT appears to have been used for OS/2 for PowerPC development—presumably running a capable GUI debugger on a stable system was rather more productive than running a local debugger on top of an unstable OS still under construction. Later IBM also shipped a separate ICAT for Java debugger.

The initial ICAT releases for (Intel) OS/2 kernel debugging only supported serial communication. The final ICAT versions (4.06 from 2002 is the last one) also support UDP transport. Comparing how Windows and OS/2 approached kernel debugging over Ethernet is interesting. For a long time, Windows did not support it at all. When support was added, it relied on a special driver and only a handful of Ethernet controllers were supported.

In contrast, IBM used the standard OS/2 network stack and only supplied a custom KDBNET protocol driver. The IBM approach was limited by the fact that the debugger link only came up when the OS/2 network stack was brought up, relatively late in the startup sequence. On the other hand, no special drivers were needed and any Ethernet adapter supported by OS/2 worked for kernel debugging. Different tradeoffs.

ICAT vs Watcom

My initial attempts to use ICAT with modules built with (Open) Watcom C were a complete failure. The debugger didn’t see any debugging information at all. Random people on the Internet claimed that “it doesn’t work”.

The old IBM articles about ICAT didn’t really say if it should work or not. But another article published around the same time clearly said that ICAT could be used for debugging OS/2 drivers written in Watcom C++.

The ICAT documentation is somewhat self-contradictory, but it does say that debugging drivers built with the Watcom tools is possible, and even shows which command line switches and options to use for compiling and linking. It is clear that CodeView-style debugging information must be used with ICAT, not the default DWARF or the old Watcom debugging information.
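For reference, a plausible build recipe along those lines might look like this. This is a sketch, not the documentation’s exact command lines: the switches shown are the Open Watcom ones, the exact spelling may vary between Watcom versions, and hello is just an example module name.

```shell
# Compile for OS/2 with full (-d2) debug information, selecting the
# CodeView format (-hc) instead of the default Watcom/DWARF format.
wcc386 -bt=os2 -d2 -hc hello.c

# Link as a 32-bit OS/2 executable, emitting CodeView debug data and
# running cvpack on the result ("option cvpack").
wlink system os2v2 debug codeview all option cvpack name hello.exe file hello.obj
```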

Still, I had no luck at all. Finally it occurred to me that maybe I should try Watcom C/C++ 10.6, because that was the version mentioned as working. Lo and behold, a simple hello world program compiled and linked with Watcom C 10.6 worked just fine under ICAT: the debugger found the source code and was able to examine variables, etc.

So when did it stop working? I tried Watcom 11.0 and found that although ICAT could see source code and variables, it was unable to display structures or really anything more than basic types, which is a big problem. By Open Watcom 1.9, ICAT couldn’t find any debug information at all.

The second problem turned out to be easier to solve. The Open Watcom 1.9 linker inserts a 512-byte data block at the beginning of CodeView debug information data, which is meant only for PE-format (NT) modules. ICAT apparently looks at the LX executable (OS/2) module header and if it finds an offset to debug information there, it expects to find a supported signature (NB09 in this case) at that offset. When it’s not there, the debugger gives up and assumes debug information in a supported format is not present.

The Watcom debugger does not give up and next looks at the end of the executable image, to see if it can find the debugging information signature that way—and it usually can. So the problem was not noticed there. Patching the LX header would have been easy enough, but fixing the linker to not write the extra data to non-PE modules was the better option.
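The two lookup strategies can be sketched like this. This is a hypothetical Python reimplementation of both behaviors, not ICAT’s or Watcom’s actual code; the LX header field offsets (e32_debuginfo at +0x98, e32_debuglen at +0x9C relative to the LX header) follow the LX specification, and the trailing 8 bytes of the image are assumed to be the CodeView signature plus a back-offset from the end of the file to the start of the debug data.

```python
import struct

def find_debug_sig(path):
    """Locate NB09 CodeView debug data in an OS/2 LX executable,
    first via the LX header pointer (the strict, ICAT-style check),
    then by scanning from the end of the image (the lenient,
    Watcom-debugger-style fallback). Returns the file offset of the
    debug data, or None if no supported signature is found."""
    with open(path, "rb") as f:
        data = f.read()
    # The MZ stub stores the file offset of the LX header at 0x3C.
    if data[:2] != b"MZ":
        return None
    lx_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[lx_off:lx_off + 2] != b"LX":
        return None
    # e32_debuginfo / e32_debuglen live at +0x98 / +0x9C.
    dbg_off, dbg_len = struct.unpack_from("<II", data, lx_off + 0x98)
    # Strict check: the signature must sit exactly at the offset
    # recorded in the header.
    if dbg_len and data[dbg_off:dbg_off + 4] == b"NB09":
        return dbg_off
    # Lenient fallback: assume the image ends with the signature plus
    # a dword giving the distance back to the start of the debug data.
    sig, back = struct.unpack_from("<4sI", data, len(data) - 8)
    if sig == b"NB09":
        return len(data) - back
    return None
```

With this model, the Open Watcom 1.9 bug is easy to reproduce: shift the actual NB09 blob 512 bytes past the offset recorded in the header, and the strict check fails while the end-of-file scan still succeeds.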

With that change in place, ICAT could see source code, but it had exactly the same problem as Watcom 11.0—almost no type information. Trying to break down the problem further, I realized that Watcom 10.6 used an original Microsoft cvpack utility, while 11.0 shipped with a replacement written by Watcom. And that was causing the difference, not the compiler or linker.

I spent a fair amount of time looking at CodeView dumps and reading the specification, trying to understand why the debugger refused to show type information. I had a suspicion that maybe the ICAT debugger didn’t like the debug info ordering. Microsoft cvpack, when producing type information for a structure, orders things like this:

0x1001 : Length = 26, Leaf = 0x0204 LF_FIELDLIST_16t

0x1002 : Length = 22, Leaf = 0x0005 LF_STRUCTURE_16t
# members = 2, field list type 0x1001,

That is, type 0x1001 is the field list and the following structure type 0x1002 refers to the previous type.

The Watcom cvpack uses the opposite ordering where type index 0x1001 refers to a “future” type 0x1002:

0x1001 : Length = 22, Leaf = 0x0005 LF_STRUCTURE_16t
# members = 2, field list type 0x1002,

0x1002 : Length = 26, Leaf = 0x0204 LF_FIELDLIST_16t

The Watcom debugger clearly does not care about the ordering, but maybe ICAT does? Only how to change it…

Before I figured out how to modify cvpack to reverse the ordering, I realized that it was not just the structures: ICAT didn’t seem to understand anything in the sstGlobalTypes subsection of the CodeView debugging information.

I tried changing the order in which the subsections were emitted to better match Microsoft’s cvpack, but to no avail. Then I took the CV4 spec and tried to go through the actual data in the final packed debugging information.

There I found something that Microsoft’s own cvdump utility didn’t even hint about. At the very start of the sstGlobalTypes subsection, there’s a 32-bit integer that the CV4 specification is a bit confused about. The text states that the “first long word of the [sstGlobalTypes] subsection contains the number of types in the table”, but that’s actually the second one.

The first doubleword is shown in a diagram as a “flag”, and broken down into three bytes that are unused and one that is a “signature”, unhelpfully described only as a “global types table signature”, with no explanation of what that means or what a valid signature might possibly look like.

Fortunately, elsewhere in the CV4 specification, the mystery is somewhat explained when the $$TYPES segment/section of an object file is defined:

The first four bytes of the $$TYPES table is used as a signature to specify the version of the Symbol and Type OMF contained in the $$TYPES segment. If the first two bytes of the $$TYPES segment are not 0x0000, the signature is invalid and the version is assumed to be that emitted for an earlier version of the Microsoft CodeView debugger (version 3.x and earlier). If the signature is 0x00000001, the Symbol and Type OMF has been written to conform to the later version of the Microsoft debugger (version 4.0) specification. All other values for the signature are reserved.

The signature is not particularly useful in the final debugging information, since the NB09 header already specifies it as CodeView 4.x. But the Watcom cvpack utility emitted a zero in place of the signature, and ICAT clearly expected to see a one… and when it didn’t, it decided that the type information was in an unknown format and simply skipped all of it. Oops.
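The check can be sketched as follows. This is hypothetical code, not ICAT’s actual logic, and it assumes the signature occupies the low byte of the little-endian flags doubleword, with the remaining three bytes unused, as described above.

```python
import struct

def parse_global_types_header(buf):
    """Parse the 8-byte header of an sstGlobalTypes subsection.
    The first dword is a flags word whose low byte is the signature
    (1 for CV4 data; the Watcom cvpack bug left it 0), and the second
    dword is the number of type records that follow. Returns the type
    count, or raises if the signature is not the expected one."""
    flags, ctypes = struct.unpack_from("<II", buf, 0)
    signature = flags & 0xFF  # low byte; upper three bytes unused
    if signature != 1:
        # The behavior observed in ICAT: wrong signature, and the
        # entire type table is treated as unknown and skipped.
        raise ValueError("unsupported global types signature: %d" % signature)
    return ctypes
```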

In the end, the difference between working and non-working (or at least only partially working) debugging information was literally one bit, and fixing the Watcom cvpack utility to produce CodeView debugging information acceptable to IBM ICAT was trivial.

With the change in place, the ICAT debugger was able to load the debugging information and properly display structure contents. Now I can actually use it in a productive fashion.

This entry was posted in Debugging, Development, IBM, OS/2, Watcom.

2 Responses to ICAT and Watcom

  1. Josh Rodd says:

Lots of fond memories of ICATGAM.EXE. I used to debug DPMI DOS programs (inside a VDM) and it worked shockingly well for that purpose. I pretty much learned how to reverse engineer 32-bit programs using it.

    The PowerPC history is interesting – ICAT is a really complete, high quality piece of software, whereas otherwise doing low level OS/2 development tended to be very jerry-rigged and unpleasant. Does that mean a version of ICAT is out there that can understand a PowerPC target?

  2. Michal Necasek says:

    There are ICAT versions in the OS/2 for PowerPC Toolkit. IBM also had a version for Java later on.

    For code not written in assembler, ICAT is way more productive than KDB. I did some retro development not long ago and found that stepping through C code in KDB is not that bad, but the inability to understand complex structures is a major productivity killer.
