ICAT and Watcom

Some time ago I embarked on a minor retro development project related to this post. For convenience, I decided to do the development on a Windows host machine and (obviously) test in a VM. For compiling and linking the C code I used the Open Watcom tools, and for the assembler code, MASM 5.1 (because nothing else likely works anyway).

The initial phase went well and I only needed the basic OS/2 kernel debugger (KDB) to analyze a few problems. But then I started toying with the idea of reworking the code base a bit, and that would really benefit from understanding the code better. Tracing code in a debugger is an excellent way of learning how it works. But KDB is definitely not up to that task, or at least not for code written in a language other than assembler.

The problem with KDB is that it’s a rough equivalent of the ancient SYMDEB debugger. It understands symbols (that is, it can match names with addresses), but it can’t map code locations to source files, and worse, it has no idea about complex types (structures, arrays, etc.). Analyzing the code and data manually is possible, by matching code and data offsets with source and header files, but in the end it is a very poor use of time that could be spent much more productively.

IBM ICAT Debugger

Back in 1996 or 1997, IBM released a debugger that could take care of this. ICAT (Interactive Code Analysis Tool) is a kernel debugger for OS/2, roughly similar to WinDbg. It is a remote debugger, meaning that it runs on a host (OS/2) machine while debugging a target OS/2 system with the kernel debugger installed.

The ICAT debugger running on OS/2

While the standard KDB simply drives a serial terminal, ICAT uses a custom packet protocol to communicate with the target system. This protocol was added to the OS/2 kernel debugger in Warp 4 and also integrated into the later Warp 3 FixPacks.

ICAT appears to have been used for OS/2 for PowerPC development—presumably running a capable GUI debugger on a stable system was rather more productive than running a local debugger on top of an unstable OS still under construction. Later IBM also shipped a separate ICAT for Java debugger.

The initial ICAT releases for (Intel) OS/2 kernel debugging only supported serial communication. The final ICAT versions (4.06 from 2002 is the last one) also support UDP transport. Comparing how Windows and OS/2 approached kernel debugging over Ethernet is interesting. For a long time, Windows did not support it at all. When support was added, it relied on a special driver and only very few Ethernet controllers were supported.

In contrast, IBM used the standard OS/2 network stack and only supplied a custom KDBNET protocol driver. The IBM approach was limited by the fact that the debugger link only came up when the OS/2 network stack was brought up, relatively late in the startup sequence. On the other hand, no special drivers were needed and any Ethernet adapter supported by OS/2 worked for kernel debugging. Different tradeoffs.

ICAT vs Watcom

My initial attempts to use ICAT with modules built with (Open) Watcom C were a complete failure. The debugger didn’t see any debugging information at all. Random people on the Internet claimed that “it doesn’t work”.

The old IBM articles about ICAT didn’t really say if it should work or not. But another article published around the same time clearly said that ICAT could be used for debugging OS/2 drivers written in Watcom C++.

The ICAT documentation is somewhat self-contradictory but it does say that debugging drivers built with Watcom tools is possible, and even shows which command line switches and options to use for compiling and linking. It is clear that CodeView style debugging information must be used with ICAT, not the default DWARF or the old Watcom debugging information.

Still, I had no luck at all. Finally it occurred to me that maybe I should try Watcom C/C++ 10.6, because that was the version mentioned as working. Lo and behold, a simple hello world program compiled and linked with Watcom C 10.6 worked just fine under ICAT: the debugger found the source code and was able to examine variables etc.

So when did it stop working? I tried Watcom 11.0 and found that although ICAT could see source code and variables, it was unable to display structures or really anything more than basic types. Which is a big big problem. By Open Watcom 1.9, ICAT couldn’t find any debug information at all.

The second problem turned out to be easier to solve. The Open Watcom 1.9 linker inserts a 512-byte data block at the beginning of CodeView debug information data, which is meant only for PE-format (NT) modules. ICAT apparently looks at the LX executable (OS/2) module header and if it finds an offset to debug information there, it expects to find a supported signature (NB09 in this case) at that offset. When it’s not there, the debugger gives up and assumes debug information in a supported format is not present.

The Watcom debugger does not give up and next looks at the end of the executable image, to see if it can find the debugging information signature that way—and it usually can. So the problem was not noticed there. Patching the LX header would have been easy enough, but fixing the linker to not write the extra data to non-PE modules was the better option.
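The two lookup strategies described above can be contrasted with a small sketch. It makes several assumptions. The function names and the synthetic image are hypothetical. The 8-byte trailer layout (signature followed by the distance back to the start of the debugging information) is my reading of the CV4 specification. The sketch does not parse a real LX header; the header-supplied offset is simply passed in as a number.

```python
import struct

SIGNATURE = b"NB09"  # CodeView 4.x signature mentioned in the text


def signature_at(image: bytes, debug_offset: int) -> bool:
    """Strict check: the signature must sit exactly at the header-supplied offset."""
    return image[debug_offset:debug_offset + 4] == SIGNATURE


def find_from_trailer(image: bytes):
    """Lenient check: read the 8-byte trailer (signature + distance back to the
    start of the debug information) at the very end of the image."""
    if len(image) < 8:
        return None
    sig, distance = struct.unpack_from("<4sI", image, len(image) - 8)
    base = len(image) - distance
    if sig != SIGNATURE or base < 0 or not signature_at(image, base):
        return None
    return base


def build_image(padding: int) -> bytes:
    """Hypothetical image: fake code, optional padding, then debug information."""
    code = b"\x90" * 64
    payload = b"\x00" * 16                      # stand-in for the real subsections
    size = 8 + len(payload) + 8                 # header + payload + trailer
    debug = (SIGNATURE + struct.pack("<I", 8) + payload
             + SIGNATURE + struct.pack("<I", size))
    return code + b"\x00" * padding + debug


HEADER_DEBUG_OFFSET = 64  # what the (hypothetical) module header points at

fixed = build_image(padding=0)      # signature right at the header offset
broken = build_image(padding=512)   # extra block pushes the signature away
```

With the 512-byte block present, the strict check at the header offset fails while the trailer-based search still locates the signature, which matches the behavior described for ICAT and the Watcom debugger respectively.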

With that change in place, ICAT could see source code, but it had exactly the same problem as Watcom 11.0—almost no type information. Trying to break down the problem further, I realized that Watcom 10.6 used an original Microsoft cvpack utility, while 11.0 shipped with a replacement written by Watcom. And that was causing the difference, not the compiler or linker.

I spent a fair amount of time looking at CodeView dumps and reading the specification, trying to understand why the debugger refused to show type information. I had a suspicion that maybe the ICAT debugger didn’t like the debug info ordering. Microsoft cvpack, when producing type information for a structure, orders things like this:

0x1001 : Length = 26, Leaf = 0x0204 LF_FIELDLIST_16t

0x1002 : Length = 22, Leaf = 0x0005 LF_STRUCTURE_16t
# members = 2, field list type 0x1001,

That is, type 0x1001 is the field list and the following structure type 0x1002 refers to the previous type.

The Watcom cvpack uses the opposite ordering where type index 0x1001 refers to a “future” type 0x1002:

0x1001 : Length = 22, Leaf = 0x0005 LF_STRUCTURE_16t
# members = 2, field list type 0x1002,

0x1002 : Length = 26, Leaf = 0x0204 LF_FIELDLIST_16t

The Watcom debugger clearly does not care about the ordering, but maybe IPMD does? Only how to change it…
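The difference between the two orderings can be detected mechanically. The following is only a sketch: the tuple representation of a type record (index, leaf name, referenced type indices) is hypothetical and much simpler than real CodeView records, and the field lists are shown without their member types.

```python
FIRST_USER_TYPE = 0x1000  # CV4 type indices below this are predefined primitives


def forward_references(records):
    """Return (index, referenced_index) pairs where a record refers to a
    type that is only defined later in the table."""
    return [
        (index, ref)
        for index, _leaf, refs in records
        for ref in refs
        if ref >= FIRST_USER_TYPE and ref > index
    ]


# Ordering produced by Microsoft cvpack: field list first, structure second.
ms_order = [
    (0x1001, "LF_FIELDLIST_16t", []),
    (0x1002, "LF_STRUCTURE_16t", [0x1001]),
]

# Ordering produced by Watcom cvpack: the structure refers to a "future" type.
watcom_order = [
    (0x1001, "LF_STRUCTURE_16t", [0x1002]),
    (0x1002, "LF_FIELDLIST_16t", []),
]
```

Calling `forward_references(watcom_order)` reports the structure's reference to 0x1002, while the Microsoft ordering yields an empty list.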

Before I figured out how to modify cvpack to reverse the ordering, I realized that it was not just the structures but everything in the sstGlobalTypes subsection of the CodeView debugging information that IPMD didn’t seem to understand.

I tried changing the order in which the subsections were emitted to better match Microsoft’s cvpack, but to no avail. Then I took the CV4 spec and tried to go through the actual data in the final packed debugging information.

There I found something that Microsoft’s own cvdump utility didn’t even hint about. At the very start of the sstGlobalTypes subsection, there’s a 32-bit integer that the CV4 specification is a bit confused about. The text states that the “first long word of the [sstGlobalTypes] subsection contains the number of types in the table”, but that’s actually the second one.

The first doubleword is shown in a diagram as a “flag”, and broken down into three bytes that are unused and one that is a “signature”, unhelpfully described only as a “global types table signature”, with no explanation of what that means or what a valid signature might possibly look like.

Fortunately, elsewhere in the CV4 specification, the mystery is somewhat explained when the $$TYPES segment/section of an object file is defined:

The first four bytes of the $$TYPES table is used as a signature to specify the version of the Symbol and Type OMF contained in the $$TYPES segment. If the first two bytes of the $$TYPES segment are not 0x0000, the signature is invalid and the version is assumed to be that emitted for an earlier version of the Microsoft CodeView debugger (version 3.x and earlier). If the signature is 0x00000001, the Symbol and Type OMF has been written to conform to the later version of the Microsoft debugger (version 4.0) specification. All other values for the signature are reserved.

The signature is not particularly useful in the final debugging information, since the NB09 header already specifies it as CodeView 4.x. But the Watcom cvpack utility emitted a zero in place of the signature, and ICAT clearly expected to see a one… and when it didn’t, it decided that the type information is in an unknown format and just skipped all of it. Oops.
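A check along these lines can be sketched in a few lines. The assumptions here are mine: the flag doubleword is read as a little-endian 32-bit value and compared against 1, the doubleword after it is taken as the type count, and the helper names and sample data are hypothetical. It is not how ICAT or cvpack actually implement it.

```python
import struct

CV4_TYPES_SIGNATURE = 1  # value the CV4 specification gives for the $$TYPES signature


def read_global_types_header(subsection: bytes):
    """Return (flag, type_count) from the first two doublewords."""
    flag, type_count = struct.unpack_from("<II", subsection, 0)
    return flag, type_count


def has_cv4_signature(subsection: bytes) -> bool:
    """True if the flag doubleword carries the CV4 signature."""
    flag, _ = read_global_types_header(subsection)
    return flag == CV4_TYPES_SIGNATURE


def with_cv4_signature(subsection: bytes) -> bytes:
    """Return a copy whose flag doubleword carries the CV4 signature."""
    return struct.pack("<I", CV4_TYPES_SIGNATURE) + subsection[4:]


# Hypothetical subsection: flag = 0 (as the Watcom cvpack wrote it), two types.
watcom_style = struct.pack("<II", 0, 2) + b"\x00" * 8
patched = with_cv4_signature(watcom_style)
```

Only the first doubleword differs between `watcom_style` and `patched`; the type count and everything after it are left untouched.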

In the end, the difference between working and non-working (or at least only partially working) debugging information was literally one bit, and fixing the Watcom cvpack utility to produce CodeView debugging information acceptable to IBM ICAT was trivial.

With the change in place, the ICAT debugger was able to load the debugging information and properly display structure contents. Now I can actually use it in a productive fashion.


18 Responses to ICAT and Watcom

  1. Josh Rodd says:

    Lots of fond memories of ICATGAM.EXE. I used to debug DPMI DOS programs (inside a VDM) and it worked shockingly well for that purpose. I pretty much learned how to reverse engineer 32 bit programs using it.

    The PowerPC history is interesting – ICAT is a really complete, high quality piece of software, whereas otherwise doing low level OS/2 development tended to be very jerry-rigged and unpleasant. Does that mean a version of ICAT is out there that can understand a PowerPC target?

  2. Michal Necasek says:

    There are ICAT versions in the OS/2 for PowerPC Toolkit. IBM also had a version for Java later on.

    For code not written in assembler, ICAT is way more productive than KDB. I did some retro development not long ago and found that stepping through C code in KDB is not that bad, but the inability to understand complex structures is a major productivity killer.

  3. lobsterguy says:

    Microsoft KDNET is actually well structured, and isolated enough that even ReactOS guys have found a way to load and use the binary DLL and drivers to debug their own kernel via ethernet.
    As for the KDNET drivers, they are simple enough that anyone could make new ones by porting code from BSD (or Linux, but lately they have introduced lots of boilerplate which isn’t really portable across *nix OSs anymore, let alone NT)… The main problem is that those drivers need NT private libraries and headers to link against, which you can’t get without some MVP kit (not as hard to get as the older HALkit, but still not available to the general public). The Ethernet card also needs to support certain features, which aren’t available on many old PCI 10/100 and 1G adapters, but modern adapters support them and work without trouble, even Realtek’s.

  4. Josh Rodd says:

    For the brief window of time when I was foolish enough to try to write PM apps, I used ICAT to debug them as well, since otherwise debugging PM apps is a nightmare: it’s hard to debug things inside your event queue, and if you call out to some other DLL inside your event queue, it’s basically impossible.

    So I just used ICAT and it didn’t matter if the machine under test was unresponsive to input.

    Did the PowerPC Toolkit come with the OS/2 for PowerPC CD? Or was it available from somewhere else? (I tended to always get my toolkits from DevCon CDs, since you could get those sent to you for free if you were an IBM employee.)

  5. LightElf says:

    Is there a description of the UDP protocol used by KDBNET somewhere? How exactly are debugger data packets packaged into UDP packets?

  6. Michal Necasek says:

    I don’t think KDNET is in any way bad, it’s just interesting to compare the design tradeoffs. The fact that you need a special driver clearly is an issue, but if you can get it it’s great!

    Do you know what features exactly the cards need to have that might not be present on all adapters? Just wondering what it might be.

  7. Michal Necasek says:

    There were various hacks (aka undocumented APIs) for PM debuggers but… my experience was not great, things had a nasty tendency to go sideways if you weren’t super careful. Remote debugging was way safer, whether it was ICAT or (say) the Watcom debugger over parallel port or TCP/IP.

    The PowerPC toolkit actually came on the Developer Connection CD-ROMs, I think Volume 8 and 9 had it. The beta came with the PowerPC Toolkit too.

    One thing I remember is that IBM initially used the MetaWare C/C++ compiler, then in the last Toolkit update they had their own VisualAge PowerPC compiler, and then they scrapped the whole thing.

  8. Michal Necasek says:

    I am not aware of any description of either the serial protocol that ICAT uses when talking to the kernel debugger or of the UDP transport.

  9. Jonathan Wilson says:

    Did you submit the things you fixed upstream to OpenWatcom?

  10. lobsterguy says:

    @Michal Necasek
    As far as I know, it has to do with the way the card requests interrupts. Looks like KDNET only supports cards which can do MSI-X interrupt requests. Some MS dev at osdev said Microsoft explicitly will not support 10/100 cards, only 1 Gigabit ones and up (dunno if it is a technical limitation, or they just didn’t want to support quirks with the old cards). I remember a presentation PDF done by Alex Ionescu which better explains the technicalities of KDNET hardware support, but I can’t find it in the archives. I will share a copy as soon as I find one in my own archives (and I hope someone takes those and archives them in a more appropriate place than the web archive, which throws out old data every day).

    BTW… Looks like these days MS distributes the headers and libraries to compile the KDNET “extensibility modules” along with the WinDBG package.
    A better read can be found in en-us/windows-hardware/drivers/debugger/how-to-develop-kdnet-extensibility-modules (at) learn.microsoft.com

  11. Julien says:

    I only just realized that the comment RSS feed actually shows which comment you have been replying to. The website (theme?) doesn’t. For the longest time now I’ve been playing match-up with the comments here…

  12. Michal Necasek says:

    Okay, that is very interesting. I can’t quite imagine why they’d need MSI-X but it could have something to do with how KDNET works “outside” of the OS. That has nothing to do with the Ethernet standard but in practice there might not be any NICs that have MSI-X and aren’t at least GbE.

    Actually… the link you pointed at claims that the KDNET drivers must run entirely without interrupts. That would imply the card needs big enough buffers to not lose packets and probably a filter to reject unwanted traffic (of which there is a LOT on modern networks). That’s even less related to 10/100 vs GbE.

  13. LightElf says:

    > I am not aware of any description of either the serial protocol that ICAT uses when talking to the kernel debugger or of the UDP transport.
    The serial protocol of ICAT is documented (sort of) in the Debugging Handbook, but its wrapping into UDP was not documented. Years ago I was thinking about replacing KDBNET with something more usable (init-time debugging, SMP-compliance, etc.) …

  14. Michal Necasek says:

    Where exactly is it documented? I couldn’t quickly find anything, but maybe I was not looking at the right edition.

    I can’t imagine the UDP transport would be doing anything super complicated, most likely it just wraps the existing serial protocol. But that’s only a guess.

  15. LightElf says:

    Sorry, I was wrong. It was a “Control Program Programming Guide And Reference” from the Toolkit, where section “Kernel Debugger Communication Protocol” outlines packet format and the commands available.

  16. Michal Necasek says:

    Thanks, found it now. It’s kind of funny to see the mentions of PowerPC in that document.

  17. Lars says:

    I used to work at IBM with their crypto processors (which are PowerPC/Linux machines), and one of the options we had for debugging was ICAT (but we had a build of the software that worked outside the card, and combined with the fact that we had the source code, that was better). I never cared to figure out how to use ICAT against the card (but I knew it was supposed to work) and always used GDB on my local laptop.

  18. Michal Necasek says:

    Ah, cool. ICAT was definitely a pretty general debugger because I know IBM published versions for at least PowerPC OS/2, Intel OS/2, and Java (running on Windows and OS/2 I think?). I’m not surprised to hear there were additional variants for internal use.
