Wait here. No, wait here!

While working on a hobby project, I set up an OS/2 MCP2 (Convenience Package 2 for OS/2 Warp 4) virtual machine with a debug kernel and an expectation that I’d reboot the VM a lot. I was disturbed to find that there was a consistent 30-second delay on boot while setting up networking, which accounted for something like two thirds of the VM’s boot-up time.

At first I thought maybe the DHCP client was being silly, or perhaps there could be a bug in the NIC emulation. Upon closer look, it turned out that the delay was happening in the MPTSTART.CMD script, which looks like this:

@ECHO OFF
IF NOT EXIST C:\MPTN\BIN\SETUP.CMD GOTO NBSETUP
INETWAIT 1>NUL
IF ERRORLEVEL 1 GOTO END
CALL C:\MPTN\BIN\SETUP.CMD
:NBSETUP
IF NOT EXIST C:\MPTN\BIN\NBSETUP.CMD GOTO END
CALL C:\MPTN\BIN\NBSETUP.CMD
:END

The referenced SETUP.CMD script does exist on the system, and NBSETUP.CMD does not. The 30-second delay was occurring in the somewhat mysterious INETWAIT command.

I could not find much about the INETWAIT command on the web, although its purpose is more or less clear. It is meant to wait for the TCP/IP stack to be ready before it can be properly configured.

On closer look, the command actually is documented in the TCP/IP Command Reference that comes with MCP2. The command “causes the current program to wait until either the binding process between the first interface driver and the TCP/IP stack is complete, or the timer expires”. It was apparently added in TCP/IP version 4.1.

The INETWAIT usage help shows the following:

[C:\mptn\bin]inetwait -?
Usage: Inetwait [wait_time  retries]
       default wait_time=10000 (in milliseconds),  retries=3

Okay, so by default, INETWAIT should wait 10 seconds, retrying up to three times, for a total of up to 30 seconds. But… why does it wait the full 30 seconds? Surely everything must be ready after the first 10-second wait? Networking did work in the VM and it could not possibly take more than 20 seconds to initialize everything…
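
To make the arithmetic concrete, here is a minimal Python sketch of how the documented defaults add up. It assumes INETWAIT checks the binding state once per attempt and then waits a full wait_time; the real internal loop is not documented, and the function name and the ready_at_ms parameter are hypothetical.

```python
def inetwait_elapsed_ms(ready_at_ms, wait_time_ms=10_000, retries=3):
    """Model of INETWAIT's documented defaults (assumed polling loop).

    ready_at_ms: hypothetical time, measured from INETWAIT start, at which
    the interface is bound to the stack; None means it never binds.
    Returns (elapsed_ms, bound).
    """
    elapsed = 0
    for _ in range(retries):
        if ready_at_ms is not None and ready_at_ms <= elapsed:
            return elapsed, True
        elapsed += wait_time_ms  # one full wait interval (assumption)
    bound = ready_at_ms is not None and ready_at_ms <= elapsed
    return elapsed, bound


print(inetwait_elapsed_ms(0))      # already bound: (0, True)
print(inetwait_elapsed_ms(4_000))  # bound after 4 s: (10000, True)
print(inetwait_elapsed_ms(None))   # never bound: (30000, False)
```

Under this model the full 30 seconds is only reached when the interface needs more than 20 seconds to bind, or never binds at all.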

Since the VM was already set up for debugging, it was not hard to break into the debugger during the 30-second wait and see what the INETWAIT process was up to. What I found was a little unexpected.

To communicate with the TCP/IP stack, INETWAIT uses the TCPIP32.DLL dynamic library. I believe it uses the SIOCGIFBOUND IOCTL to find out if network interfaces are bound to the TCP/IP stack. Since setting up a network interface may take some amount of time, interfaces may not be immediately ready, and INETWAIT is meant to give them a bit of time to come online. It should not wait too long, though, because they might never come online (maybe a cable is not plugged in, maybe a PCMCIA NIC isn’t inserted, etc.).

Sounds reasonable. Except… in the _DLL_InitTerm routine of TCPIP32.DLL (that is, code which gets executed while the DLL is being loaded), there is a check for TCP/IP stack readiness, and if that fails, the _DLL_InitTerm routine sleeps for 30 seconds.

As far as I can tell, the logic in TCPIP32.DLL completely defeats the purpose of INETWAIT. If the TCP/IP stack is ready, there will be no waiting; but if it’s not, TCPIP32.DLL will wait for 30 seconds before INETWAIT gets to do any checking or waiting.

The upshot is that either INETWAIT won’t need to wait at all, or if it does, it will most likely just add a useless extra delay in a situation where the TCP/IP stack is not going to be operational after any amount of waiting.

The behavior of TCPIP32.DLL has all the hallmarks of a quick and poorly thought-out fix. Sleeping for 30 seconds in a DLL initialization routine is a terrible idea; the waiting belongs elsewhere (in INETWAIT, for instance!). Still, I expect it did fix some problem somewhere.

I should add that the behavior with an undesirable 30-second delay is quite system and configuration specific. It depends on how the system is configured, what hardware it uses, and how fast it runs. On many or perhaps most systems this issue probably won’t be visible because everything will be ready by the time INETWAIT runs, and neither TCPIP32.DLL nor INETWAIT will do any waiting.

I worked around the problem by adding a 1-second delay just before calling INETWAIT. That was enough to avoid the 30-second sleep in TCPIP32.DLL, as well as any waiting in INETWAIT itself, obviously. The boot time went down from about 45 seconds to 15 seconds, which is quite a difference.
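
The interaction between the two waits, and why such a short pause helps, can be sketched as a small timing model in Python. This is an illustration under stated assumptions, not the actual code: the 30-second sleep is the one observed in the debugger, the INETWAIT polling loop is assumed, and the function name, its parameters, and the 0.5-second readiness time are all hypothetical.

```python
DLL_INIT_SLEEP_S = 30  # sleep observed in TCPIP32.DLL's _DLL_InitTerm


def network_wait_s(ready_at_s, pre_delay_s=0, wait_time_s=10, retries=3):
    """Seconds spent waiting in the INETWAIT step of MPTSTART.CMD (model).

    ready_at_s: hypothetical time at which the TCP/IP stack becomes ready,
    counted from the start of the step; None means it never becomes ready.
    pre_delay_s: optional pause inserted before INETWAIT is started.
    """
    def ready(t):
        return ready_at_s is not None and ready_at_s <= t

    now = pre_delay_s
    # Stage 1: loading TCPIP32.DLL runs its initialization routine first.
    if not ready(now):
        now += DLL_INIT_SLEEP_S
    # Stage 2: only now does INETWAIT's own (assumed) polling loop run.
    for _ in range(retries):
        if ready(now):
            break
        now += wait_time_s
    return now


print(network_wait_s(0))                   # ready at once: 0
print(network_wait_s(0.5))                 # ready 0.5 s late: 30
print(network_wait_s(0.5, pre_delay_s=1))  # with the 1 s pause: 1
print(network_wait_s(None))                # never ready: 60
```

In this model a stack that is only half a second late costs the full 30 seconds, while a 1-second pause up front brings the wait down to the pause itself, which is consistent with the roughly 30-second improvement in boot time.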

Update: The INETWAIT utility has been shipped with OS/2 and used at least since Warp Connect, although there does not appear to be any documentation for it in the older versions. In Warp Connect, the wait parameters cannot be modified, but the maximum wait is 30 seconds (same as the default in later versions).

In Warp Connect and Warp 4, INETWAIT does not wait inordinately long, most likely because it does not use TCPIP32.DLL at all. The old INETWAIT utility in fact seems to be internally quite different; instead of calling a TCP/IP-specific IOCTL, it checks for the existence of the \SHAREMEM\INETXXX shared memory block.

6 Responses to Wait here. No, wait here!

  1. Giano Mayne says:

    I’m experiencing the same issue as you with a virtual machine on my MacBook Pro, which has an Apple Silicon M2 Max processor.
    What is the command for the one-second pause?
    Thanks!

  2. Michal Necasek says:

    I could not find anything pre-made. Since the machine has 4OS2 installed, I added the following to MPTSTART.CMD:

    c:\util\4os2.exe /c delay 1

    It is also possible to use the REXX SysSleep function, see e.g. here: https://www.edm2.com/index.php/Stupid_OS/2_Tricks/REXX_Commands#SLEEP.CMD

  3. Josh Rodd says:

    One of the first things I used to do on a new 4.5 install was move MPTSTART into the Startup folder since it also mysteriously bypassed this problem. I just assumed it took a long time for the network stack to come up.

    Overall, 4.5 and TCPIP32 felt like a hurried, beta-quality release.

  4. Michal Necasek says:

    I’m not sure about hurried (maybe?) but definitely poorly tested. I remember that the Toolkit shipped with the release had a completely broken resource compiler — crashed every time you tried to run it. Just completely untested, kind of embarrassing really.

    The network delay was almost certainly caused by adding the 30-sec wait to the TCP/IP DLL load, which could have been a late change (I haven’t tried digging). That one could have escaped discovery because it didn’t always happen, and perhaps more importantly wasn’t fatal.

    And yeah moving MPTSTART to the startup folder would almost certainly fix the problem, simply because MPTSTART ran a few seconds later.

    There is another possible 30-second wait, in the IDE driver (IBM1S506.ADD). It can happen if there’s only an ATAPI device on an IDE channel. That one is somewhat poor coding, not reasonably handling a slightly odd but common configuration. And not new in MCP(2).

  5. Josh Rodd says:

    OS/2 boot times were embarrassingly long, particularly compared with early 1.x OS/2 (although they weren’t that great either, compared to DOS).

    Most of the causes of the long boot times could have been fixed, such as a tiny bit of caching by the mini-IFS when loading \OS2\BOOT drivers. Instead, the disk activity pretty much meant reading the directory and traversing the FAT on each and every BASEDEV= load, or the equivalent on HPFS, which seemed to be even slower.

    The Workplace Shell also took an excruciatingly long amount of time to load… and it’s not exactly clear for what. It’s a shame, because OS/2 was actually quite performant under the hood, and the Microsoft 2.0 beta performs quite well since it’s unburdened by the WPS.

    If you were doing any kind of driver development, you quickly set up a very, very minimal system to deal with these long boot times.

  6. Michal Necasek says:

    Yep, I had a text-mode only, cut down OS/2 setup with networking, reduced to the bare minimum of what it needed. It definitely helped.
