The other day I set out to solve a seemingly simple problem: With a DOS extended application, lock down memory buffers using DPMI and use them for bus-mastering (BusLogic SCSI HBA, though the exact device model isn’t really relevant to the problem).
Now, DPMI does not allow querying the physical address of a memory region, although it does have provisions for mapping a given physical memory area. But that doesn’t help here–mapping physical memory is useful for framebuffers, where device memory needs to be mapped so that an application can access it. In my case, I needed the opposite: allowing a bus-mastering device to use already-allocated system memory.
As many readers probably know, VDS (Virtual DMA Services) should solve this problem through the “Scatter/Gather Lock Region” VDS function. The function is presented with a linear address and buffer size, and returns one or more physically contiguous regions together with their physical addresses.
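To make the shape of that call concrete, here is a minimal sketch of the Extended DMA Descriptor Structure (EDDS) that the Scatter/Gather Lock Region function (INT 4Bh, AX=8105h) takes in ES:DI. The field layout follows the published VDS specification as I understand it; the type names, the `vds_edds_init` helper, and the 16-entry cap are inventions of this sketch. Issuing the interrupt itself is compiler and extender specific, so it is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VDS_INT          0x4B    /* VDS software interrupt */
#define VDS_SG_LOCK      0x8105  /* AX value: Scatter/Gather Lock Region */
#define VDS_MAX_REGIONS  16      /* arbitrary cap chosen for this sketch */

#pragma pack(push, 1)
/* One physically contiguous piece of the locked buffer. */
typedef struct {
    uint32_t phys_addr;          /* out: physical start address */
    uint32_t size;               /* out: length in bytes */
} vds_region;

/* Extended DMA Descriptor Structure, passed in ES:DI. */
typedef struct {
    uint32_t   region_size;      /* in:  size of the buffer to lock */
    uint32_t   offset;           /* in:  offset of the buffer */
    uint16_t   segment;          /* in:  segment or selector */
    uint16_t   reserved;
    uint16_t   num_avail;        /* in:  region slots available below */
    uint16_t   num_used;         /* out: region slots actually filled */
    vds_region regions[VDS_MAX_REGIONS];
} vds_edds;
#pragma pack(pop)

/* Hypothetical helper: prepare an EDDS describing one buffer. */
void vds_edds_init(vds_edds *e, uint16_t seg, uint32_t off, uint32_t size)
{
    memset(e, 0, sizeof *e);
    e->region_size = size;
    e->offset      = off;
    e->segment     = seg;
    e->num_avail   = VDS_MAX_REGIONS;
}
```

After a successful call, `num_used` says how many entries of `regions` are valid, and those entries are what gets programmed into the adapter's scatter/gather list.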
I already had VDS working for low (DOS) memory, but I just could not get it working for “normal” extended memory. It did not matter if I used statically allocated variables placed in the executable, C runtime malloc(), or direct DPMI memory allocation functions. The VDS call succeeded and filled the result buffer with the same address I passed in, indicating a 1:1 linear:physical mapping, except the memory definitely was not mapped 1:1. So bus-mastering couldn’t work, because the addresses I programmed into the adapter were bogus. But why was this happening?
The exact same problem happened with EMM386 version 4.50 (PC DOS 2000) and QEMM 8.01. It also didn’t matter if I used the DOS/4GW or CauseWay DOS extender. The result was always the same, VDS gave me the wrong answers.
On a whim, I ran my code in the Windows 3.1 DOS box. And lo and behold, it worked! Suddenly VDS gave me the correct answers, i.e. physical addresses quite different from linear. So my VDS code was not wrong.
After more poking around, I’m not quite sure if this is a bug in EMM386 and QEMM, or in the DOS extenders. The QEMM documentation (QPI.TEC, QPI_GetPTE call) hints that for QEMM, only linear addresses up to 1088KB (1024 + 64) might have their physical address returned correctly. For EMM386 the exact logic is different but the behavior is similar: for higher linear addresses VDS does not bother translating the addresses and returns the input addresses unchanged (but does not fail!).
This is very likely why the DOS/4G(W) FAQ says (in so many words) about DMA to/from extended memory “don’t even try that, it’s not worth the trouble”. I followed the FAQ’s advice, allocated the required buffers in low memory, and hey presto, everything worked the way it was supposed to.
Since I wasn’t quite able to leave well enough alone, I had to try Jemm as well. It failed just like EMM386 and QEMM. I also tried the DOS/32A extender, but it behaved just like DOS/4GW and CauseWay–the physical addresses provided by VDS were wrong.
Using the Qualitas 386MAX-derived DPMIONE DPMI 1.0 host with EMM386 likewise did not change the outcome; VDS still wasn’t working.
On the other hand, using DOS/4GW, CauseWay, or DOS/32A on a system without a memory manager did work–because then there was a 1:1 linear to physical address mapping.
Where’s the Bug?
Windows 3.1 shows that this can work. So why doesn’t it always? The general answer is “because no one cared enough about making this work everywhere”.
VDS services need to be implemented by a DOS memory manager (EMM386, QEMM, etc.) because without VDS, UMBs will break in nasty ways. VDS also needs to be implemented by a multi-tasker (like DESQview or Windows/386) because without it, DMA anywhere in DOS memory will break.
However, a memory manager only tends to really care about VDS in the first 1MB + 64KB region; typical device drivers are usable by real-mode programs and therefore keep all their DMA buffers in low memory.
The VDS specification does not say who is responsible for providing the VDS services, although the likely answer is apparent: Whoever controls the page tables–because whoever controls the page tables knows the linear:physical mappings.
In a DPMI environment, that should be the DPMI host. That is the case with Windows 3.1, but not with e.g. the QEMM DPMI implementation or with the DPMI hosts built into many or most DOS extenders.
In VCPI environments, the lines get very blurry because the VCPI host (usually a memory manager like EMM386 or QEMM) shares responsibilities with the VCPI client (DOS extender). Things get very confused because the memory manager/VCPI host implements VDS, but does nothing to take VCPI clients into account. That leads to VDS calls succeeding yet delivering incorrect data.
Is There a Way Out?
So what was that DOS/4G FAQ talking about? What are the ways of performing the linear to physical address mapping? Obviously if one had access to the page tables, it would be trivial to map linear to physical addresses. But how to actually get there?
As it turns out, the solution for DOS/4G(W) is deceptively simple. At least when not running under some other DPMI host, one can read the CR3 register–which holds the physical address of the page directory, the top level of the page tables–and feed that physical address to the DPMI service that maps physical memory. That way, the page directory and the page tables it points to become accessible, and looking up the physical addresses is not difficult.
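The lookup itself is the standard two-level 386 page table walk. The following is only a sketch under a few assumptions: 4KB pages only (no 4MB pages), and a caller-supplied callback that reads a 32-bit word at a physical address. In a real DOS/4GW program that callback would go through a mapping obtained from the DPMI physical address mapping service (INT 31h, AX=0800h); here a small simulated memory (`sim_mem`, `sim_read32`, both hypothetical names) stands in so the code can be tried anywhere.

```c
#include <stdint.h>

#define PG_PRESENT     0x001u        /* present bit in a PDE or PTE */
#define PG_FRAME_MASK  0xFFFFF000u   /* bits 31..12: physical frame address */

/* Hypothetical callback: read one 32-bit word at a physical address. */
typedef uint32_t (*read_phys32_fn)(uint32_t phys, void *ctx);

/*
 * Translate a linear address by walking the page directory and page table.
 * Assumes 4KB pages only. Returns 0 on success, -1 if not present.
 */
int lin_to_phys(uint32_t cr3, uint32_t lin,
                read_phys32_fn rd, void *ctx, uint32_t *phys_out)
{
    /* Top 10 bits of the linear address select the page directory entry. */
    uint32_t pde = rd((cr3 & PG_FRAME_MASK) + (lin >> 22) * 4u, ctx);
    if (!(pde & PG_PRESENT))
        return -1;

    /* Middle 10 bits select the entry within that page table. */
    uint32_t pte = rd((pde & PG_FRAME_MASK) + ((lin >> 12) & 0x3FFu) * 4u, ctx);
    if (!(pte & PG_PRESENT))
        return -1;

    /* Low 12 bits are the offset within the 4KB page. */
    *phys_out = (pte & PG_FRAME_MASK) | (lin & 0xFFFu);
    return 0;
}

/* Simulated physical memory window, only for trying out the sketch. */
typedef struct {
    uint32_t        base;    /* physical address of words[0] */
    const uint32_t *words;
    uint32_t        nwords;
} sim_mem;

uint32_t sim_read32(uint32_t phys, void *ctx)
{
    const sim_mem *m = ctx;
    uint32_t idx = (phys - m->base) / 4u;
    return idx < m->nwords ? m->words[idx] : 0u;  /* outside window: not present */
}
```

Note that the walk only gives a snapshot; it says nothing about whether the page is locked, which is the other half of what a VDS lock call is supposed to guarantee.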
Under the CauseWay DOS extender, it’s both more complicated and simpler. CauseWay runs application code in ring 3, so reading CR3 is harder, and apparently CauseWay also refuses to map the physical addresses of page tables. On the other hand, CauseWay always keeps an alias of the page tables at linear address 1024*4096*1023 (i.e. FFC00000h), which means the page tables are already there and can be accessed without any further action.
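With such an alias, finding the page table entry for a given linear address is pure arithmetic. The sketch below assumes the alias is laid out the usual way for this kind of mapping, as one flat array of 4-byte PTEs indexed by linear page number; that layout is my assumption, not something taken from the CauseWay documentation. The function names are made up for illustration.

```c
#include <stdint.h>

/* 1024*4096*1023: base of the CauseWay page table alias. */
#define CW_PT_ALIAS_BASE  0xFFC00000u

#define CW_PG_PRESENT     0x001u
#define CW_PG_FRAME_MASK  0xFFFFF000u

/*
 * Linear address of the PTE describing 'lin', assuming the alias is a
 * flat array of 4-byte PTEs indexed by linear page number.
 */
uint32_t cw_pte_linear(uint32_t lin)
{
    return CW_PT_ALIAS_BASE + (lin >> 12) * 4u;
}

/* Combine a PTE value with the page offset. Returns 0 on success. */
int cw_pte_to_phys(uint32_t pte, uint32_t lin, uint32_t *phys_out)
{
    if (!(pte & CW_PG_PRESENT))
        return -1;                        /* page not present */
    *phys_out = (pte & CW_PG_FRAME_MASK) | (lin & 0xFFFu);
    return 0;
}
```

In a flat-model CauseWay program the PTE would then be read by dereferencing the address returned by `cw_pte_linear` (adjusted for the data segment base if that is not zero) and passed to `cw_pte_to_phys`. A careful implementation would first check that the page table covering the address is itself present before touching the alias.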
With a bit of legwork, and when running on a known/supported DOS extender, it is possible to do the job VDS ought to be doing.
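Once a per-page translation is available, producing the same kind of answer as the VDS scatter/gather call means walking the buffer page by page and merging neighbors that happen to be physically adjacent. A rough sketch follows; the names (`sg_region`, `build_sg_list`, `table_xlate`) are hypothetical, the translation is supplied as a callback, and a toy table-based translator is included only so the code can be exercised. It does not lock anything, so it is only safe where pages cannot move or be swapped out.

```c
#include <stdint.h>

#define SG_PAGE_SIZE 4096u

typedef struct {
    uint32_t phys;   /* physical start of a contiguous region */
    uint32_t size;   /* length in bytes */
} sg_region;

/* Hypothetical callback: translate one linear address; 0 on success. */
typedef int (*xlate_fn)(uint32_t lin, uint32_t *phys, void *ctx);

/*
 * Split [lin, lin+size) into physically contiguous regions.
 * Returns the region count, or -1 on translation failure or overflow.
 */
int build_sg_list(uint32_t lin, uint32_t size, xlate_fn xlate, void *ctx,
                  sg_region *out, int max_regions)
{
    int n = 0;
    while (size > 0) {
        uint32_t phys;
        /* Bytes left in the current page, capped by the bytes remaining. */
        uint32_t chunk = SG_PAGE_SIZE - (lin & (SG_PAGE_SIZE - 1u));
        if (chunk > size)
            chunk = size;
        if (xlate(lin, &phys, ctx) != 0)
            return -1;
        if (n > 0 && out[n - 1].phys + out[n - 1].size == phys) {
            out[n - 1].size += chunk;      /* extends the previous region */
        } else {
            if (n == max_regions)
                return -1;                 /* caller's list is too small */
            out[n].phys = phys;
            out[n].size = chunk;
            n++;
        }
        lin  += chunk;
        size -= chunk;
    }
    return n;
}

/* Toy translator: ctx is an array of frame addresses indexed by page number. */
int table_xlate(uint32_t lin, uint32_t *phys, void *ctx)
{
    const uint32_t *frames = ctx;
    *phys = frames[lin >> 12] | (lin & 0xFFFu);
    return 0;
}
```

For a real adapter the list would also have to respect any device limits, such as a maximum number of scatter/gather entries or an addressing limit below 4GB.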
Does the implementation of VDS done this way handle the dirty bit? DMA transfers don’t set the dirty bit. Windows goes with an aggressive solution of setting the bit for regions allocated by VDMAD whether any writes will occur there or not. It is unlikely to happen, since most VCPI and DPMI systems have limited virtual memory functionality, but it would be unfortunate to have the virtual memory system discard the disk reads passed through DMA and then reload the old version of those pages from disk.
Good question. The VDS call is supposed to lock the memory, so it can’t be swapped out. Of course, if software calls VDS S/G lock on a buffer, does a DMA read, calls VDS S/G unlock, and then reads the buffer, interesting things might happen. I’d think it’s up to the VDS caller (not VDS itself) to make sure the pages are marked as dirty if the caller relies on that. The VDS spec has nothing to say about this.