Several months ago I had a go at producing a high resolution 256-color driver for Windows 3.1. The effort was successful but is not yet complete. Along the way I re-learned many things I had forgotten, and learned several new ones. This blog entry is based on notes I made during development.
Source Code and Development Environment
I took the Video 7 (V7) 256-color SuperVGA sample driver from the Windows 3.1 DDK as the starting point. The driver is written entirely in assembler (yay!), consisting of dozens of source files totaling over 1.5 MB. This driver was not an ideal starting point, but it was probably the best one available.
The first order of business was establishing a development environment. While I could have done everything in a VM, I really wanted to avoid that. Developing a display driver obviously requires many restarts of Windows and inevitably also reboots, so at least two VMs would have been needed for a sane setup.
Instead I decided to set everything up on my host system running 64-bit Windows 10. Running the original 16-bit development tools was out, but that was only a minor hurdle. The critical piece was MASM 5.NT.02, a 32-bit version of MASM 5.1 rescued from an old Windows NT SDK. The Windows 3.1 DDK source code is very heavily geared towards MASM 5.1, and converting to another assembler would have been a major effort, likely resulting in many bugs.
Fortunately MASM 5.NT.02 works just fine and assembles the source code without trouble. For the rest, I used the Open Watcom 1.9 tools: wmake, wlink, and wrc (make utility, linker, and resource compiler). I used a floppy image to get the driver binary from the host system to a VM, a simpler and faster method than any sort of networking.
With everything building, the real fun started: Modifying the Video 7 driver to actually work on different “hardware”.
Trials and Tribulations
Fortunately there was not a huge amount of Video 7 specific code in the sample driver. Unfortunately the hardware specific code was sprinkled throughout the code base.
My first change was to unify the bank switching code, which is critical for performance. The sample driver had about half a dozen different bank switching routines (no, I don’t know why). I replaced them with one, and made sure the bank switching is only done when necessary (i.e. the current bank differs from the requested one).
Why not use a linear framebuffer, you ask? From the beginning, I did not want to restrict the driver to 386 Enhanced mode Windows. Using an LFB in Standard mode is difficult; it’s easy to reprogram a selector base, but that breaks down when the system runs with paging and the LFB is not mapped. Even worse, in real mode an LFB just can’t be used, period.
Then came the painstaking work of removing the V7 specific drawing code. There was more of it than I’d expected. The V7 hardware has latches and pattern blit registers that accelerate certain operations. Now, there were pure software fallback drawing paths more or less everywhere, but not at all clearly identified. It took some effort to force all drawing to go through the software path.
There was one very nasty bug related to this; the driver assumed that the hardware pattern blit registers take care of pattern rotation. Forcing pure software drawing could lead to a situation where the pattern was not correctly rotated. This caused very visible problems when dragging windows around, as the selection rectangle (a patterned line) was prone to leaving “droppings” behind.
I ditched any attempts to use offscreen memory in the driver. While that is usually a performance win on real hardware, it’s not in a VM. Due to the bank switching overhead, moving data from system memory is always faster.
The drawing code in the V7 driver assumes that a single scanline never crosses a bank boundary; that significantly simplifies the drawing logic because there’s no need to potentially switch banks between two adjacent pixels. The original driver used a 1K pitch, allowing maximum horizontal resolution of 1024 pixels. I changed that to 2K, which enables resolutions up to 2048 pixels horizontally. This could potentially be made more flexible in order to conserve video memory, but at least for the time being that didn’t seem worth the effort.
The mouse cursor drawing code had another nasty bug in it. The V7 driver would more or less always use the hardware cursor and the software fallback was probably very rarely used, if ever. Under some circumstances, the routine to save screen contents under the cursor could be entered with the direction flag set, and ended up copying data in the wrong direction and overwriting innocent memory. A single CLD in the right place fixed that.
I also had to contend with the question why the colors in my driver are different from the Windows 3.1 VGA/SVGA drivers. I learned that for whatever reason, the Windows 3.0 and 3.1 8514/A driver (the canonical high-res driver) really used a different color scheme. This was documented in the Windows 3.0 and 3.1 DDKs, although no explanation was provided as to why the colors should be different.
The linker (wlink) caused one very interesting problem. By default, wlink enables far call optimization, replacing far calls with near ones. This optimization is almost always safe and a performance win, but not in the case of the Windows display driver. The driver “compiles” drawing routines by copying fragments of code from the code segment onto the stack, assembling a selection of them together as needed and modifying constants within the code. Now, wlink optimized the far calls within the code segment, which would have been fine, but when that code got copied onto the stack, calls back into the code segment really needed to be far. Disabling the far call optimization was trivial once I knew what the problem was.
As a side project, I also wrote a quick and dirty wmapsym tool, a functional equivalent of Microsoft’s MAPSYM but using Watcom map files as input. This proved extremely useful when debugging the driver.
Apropos debugging: the tool of choice for Windows 3.1 drivers is WDEB386, a more or less standard Microsoft-ish debugger similar to SYMDEB, the OS/2 kernel debugger, NTSD, and others. I used it with input and output redirected to a serial port; this was routed to a pipe on the host, with PuTTY attached to the pipe.
Going More Retro
The display driver functionality in Windows 3.1 does not differ much between Standard and 386 Enhanced mode. The one area where there’s a major difference is DOS session support. In 386 Enhanced mode, there’s a whole dedicated VxD (VDDVGA) that handles video virtualization.
Even when switching to a full-screen DOS session, the display driver remains active in 386 Enhanced mode, but it is notified via dev_to_background calls that it’s going to the background or coming back. In Standard mode, the driver is shut down via the Disable call when switching to a full-screen DOS session, and re-initialized via Enable on the way back.
Things started getting even more interesting with Windows 3.0. In Standard and 386 Enhanced mode, the differences from Windows 3.1 are minimal. But Windows 3.0 running in real mode is a different beast. I had to modify the driver to not use any APIs available only in protected mode and decide at runtime (using WinFlags) what to do.
It would have been lovely if I had the Video 7 sample driver from the Windows 3.0 DDK. Alas, I never managed to find it. Anyone?
Windows 2.x was more work to get going. The basic structure of the driver is the same; there are just fewer GDI calls the driver needs to implement. For the most part, Windows 2.x is extremely similar to Windows 3.0 in real mode. The difference is that the API calls added to support protected mode (such as AllocCSToDSAlias) do not exist in Windows 2.x at all. The drawing code is essentially identical, but the driver initialization and teardown need to be slightly different.
In theory it might have been possible to import the Windows 3.x specific routines dynamically and use a single binary for Windows 2.x and 3.x. In practice that is not workable, because the drivers also need a different format of resources (Microsoft significantly changed the resource format between Windows 2.x and 3.0). It was therefore much simpler to create a separate Windows 2.x driver binary and use conditional compilation to select either the Windows 3.x or 2.x code paths.
A related complication was that I could not find a resource compiler capable of dealing with Windows 2.x resources and running on 32-bit Windows. I resorted to running RC from a Windows 2.x SDK in a DOS VM in order to finalize the 2.x driver binary. Not pretty but fully functional.
All in all, it was an interesting retro development trip. And there’s more work to be done.
Update: An interesting problem was noticed in Windows 3.1 running in 386 Enhanced mode. A windowed DOS application performing a mode set (e.g. ‘MODE CO80’) would corrupt the display. Specifically, the host VGA hardware (host from Windows 3.1’s point of view) would switch to planar mode, disrupting the 256-color banked mode operation. This was unexpected, since windowed DOS apps shouldn’t be able to do that.
This problem did not happen on Windows 3.0, and moreover it also did not happen when using VDDVGA30.386 on Windows 3.1 (the 3.0 compatible VDD or Virtual Display Driver is shipped with Windows 3.1 and some drivers use it).
Further probing established that on Windows 3.1, the display driver must call into the VDD and use the poorly documented VDDsetaddresses (0Ch) service. This subtly changes the behavior of VDDVGA. An internal fVDD_DspDrvrAware flag is set, which skips certain parts of the VDD logic.
Without the display driver linking up with the VDD, it appears that the VDD itself (as opposed to the windowed DOS box) modifies the VGA register state. This is likely not a problem for a display driver running in a planar VGA/EGA mode.
The exact logic is very poorly explained in the DDK documentation, but can be discerned from the VDDVGA source code in the Windows 3.1 DDK. As always, the source code is the best documentation.