The other day I was trying to fill a couple of gaps in my understanding of the Intel 8237A DMA controller documentation. I wrote a small testcase that performed a dummy transfer and modified the base address and count registers in various ways, and then examined what happened to the current address and count registers.
I ended up printing out the current DMA address and count at the beginning and end of the test. I noticed that the current address changed between test runs, which was quite unexpected. No one else should have been using the DMA channel and the current address can’t just randomly change.
The change itself wasn’t random at all: The current address was being set to the base address. That happens when the base address register is written, but I was pretty sure no one was doing that.
After much head scratching, I realized that my own code was triggering the change. I had some trivial code in place to save and restore the channel’s DMA page register, and it was restoring the page register that caused the current address to change after the last state printout. That was definitely not expected to happen. So why was it happening?
The board I used for testing (Alaris Cougar) has an OPTi 82C499 chipset. The chipset integrates the DMA controller and other standard peripherals, and includes the equivalent of the OPTi 82C206 integrated peripherals controller.
On a true blue IBM PC/AT, the 8237A DMA controller is physically separate from the DMA page registers. It is difficult to imagine how writing a DMA page register could affect the DMA controller state in any way on a genuine PC/AT. But in the OPTi 82C499 chipset, that appeared to be the case.
How It Really Works
The IBM PC (and PC/XT, and PC/AT) has an extremely annoying limitation caused precisely by the fact that the 8237A DMA controller and the page registers are separate devices. In the PC/AT, for 8-bit DMA channels, the page register simply drives the top eight bits of a 24-bit address and the 8237A drives the bottom 16 bits. As a consequence, DMA transfers cannot cross a 64K boundary. The DMA page register effectively selects an aligned 64K window within which the DMA controller can operate.
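To make the windowing concrete, here is a minimal Python sketch of the address arithmetic described above. It only simulates the arithmetic and does not touch hardware; the function names are made up for illustration, and the model assumes the address-increment mode of the 8237A.

```python
def physical_address(page: int, offset: int) -> int:
    """Combine an 8-bit page register value with the 8237A's 16-bit address."""
    return ((page & 0xFF) << 16) | (offset & 0xFFFF)


def crosses_64k(start: int, length: int) -> bool:
    """True if a buffer of `length` bytes at physical address `start`
    spans more than one aligned 64K window."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (start >> 16) != ((start + length - 1) >> 16)


def pc_at_transfer_addresses(page: int, offset: int, length: int) -> list[int]:
    """Addresses a PC/AT-style 8-bit DMA transfer would drive on the bus.

    Only the low 16 bits count upward; the page register stays fixed,
    so the address wraps to the start of the same 64K window.
    """
    return [physical_address(page, offset + i) for i in range(length)]


if __name__ == "__main__":
    # A 4-byte transfer starting 2 bytes below the end of window 0x01.
    for addr in pc_at_transfer_addresses(0x01, 0xFFFE, 4):
        print(f"{addr:06X}")
```

A driver would typically run a check like `crosses_64k` on its buffer before programming the controller and pick a different buffer if it returns true.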
What I observed was arguably better. There appeared to be a 24-bit current address register that was initialized from the 8237A-compatible base address registers and from the corresponding DMA page register. This register had no trouble crossing a 64K boundary, because it worked as a single 24-bit register rather than separate 8 + 16 bit registers.
The scheme I observed should be backward compatible with the IBM implementation and any DMA buffers not crossing a 64K boundary will work just fine. The inverse is of course not true.
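The compatibility claim can be checked with a small Python model that compares the two addressing schemes side by side. This is a sketch under stated assumptions: both function names are hypothetical, only incrementing transfers are modeled, and nothing here talks to a real controller.

```python
def addresses_split(page: int, offset: int, length: int) -> list[int]:
    """IBM scheme: fixed 8-bit page plus a 16-bit address that wraps."""
    return [((page & 0xFF) << 16) | ((offset + i) & 0xFFFF)
            for i in range(length)]


def addresses_linear(page: int, offset: int, length: int) -> list[int]:
    """Observed scheme: a single 24-bit register that carries into
    the upper eight bits."""
    start = ((page & 0xFF) << 16) | (offset & 0xFFFF)
    return [(start + i) & 0xFFFFFF for i in range(length)]


if __name__ == "__main__":
    # Buffer entirely inside one 64K window: both schemes agree.
    print(addresses_split(2, 0x8000, 16) == addresses_linear(2, 0x8000, 16))
    # Buffer straddling a 64K boundary: the schemes diverge.
    print([f"{a:06X}" for a in addresses_split(2, 0xFFFF, 2)])
    print([f"{a:06X}" for a in addresses_linear(2, 0xFFFF, 2)])
```

Software written for the split scheme never sets up a boundary-crossing buffer, so it cannot tell the two apart; software that relies on the linear scheme breaks on the split one.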
It should be apparent that for the scheme to work, it must load the internal current address register whenever the DMA page register is written, because it cannot assume that the page register is always written before the base address registers. And that’s precisely where it does not behave like the IBM implementation, because writing the DMA page register may change the current address visible through the 8237A registers, something that can’t happen on a PC/AT.
It would appear that I completely by accident stumbled on a way to detect this kind of DMA controller behavior, a way that does not require attempting to execute a DMA transfer that crosses a 64K boundary. But it’s only marginally useful, because it still requires some DMA transfer.
Why does it need that? Because the 8237A design is… ancient. It’s from a time when chip designers thought write-only registers were cool. That probably made sense in the days when a single person controlled the entire software stack running on the machine (why would you need to read back a register value that you had written yourself?), but that was maybe true in the 1970s. In any general-purpose environment, it is essential to have the ability to save and restore hardware state because you can never be sure who assumes what.
Anyway, the 8237A, being an ancient design, has two sets of address and count registers, called “base” and “current”. The base registers are write only and do not change, but writing them also sets the current registers, which do change, and their contents can be read. What that means is that after writing the base address or count register, its “current” counterpart will always show the same value (with one exception noted below).
In order to see anything interesting, it is necessary to execute even the tiniest dummy DMA transfer, so that the current registers no longer match the base ones. Only then might writing the page register have any discernible effect, if that in fact copies the DMA base address to the current address.
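The detection sequence described above can be sketched as a Python simulation. Everything here is a model rather than driver code: the class, its `reload_on_page_write` flag, and the helper function are hypothetical names, and only the 16 bits visible through the 8237A-compatible registers are tracked.

```python
class DmaChannelModel:
    """Toy model of one DMA channel's address registers."""

    def __init__(self, reload_on_page_write: bool):
        # True models the behavior observed in the article;
        # False models a PC/AT with separate page registers.
        self.reload_on_page_write = reload_on_page_write
        self.base = 0
        self.current = 0
        self.page = 0

    def write_base_address(self, value: int) -> None:
        # Writing the base register also loads the current register.
        self.base = value & 0xFFFF
        self.current = self.base

    def write_page(self, value: int) -> None:
        self.page = value & 0xFF
        if self.reload_on_page_write:
            self.current = self.base

    def read_current_address(self) -> int:
        return self.current

    def dummy_transfer(self, nbytes: int) -> None:
        # A transfer advances the current address but not the base.
        self.current = (self.current + nbytes) & 0xFFFF


def page_write_reloads_current(channel: DmaChannelModel) -> bool:
    """Run the test sequence: program, transfer, rewrite the page, compare."""
    saved_page = channel.page
    channel.write_base_address(0x1000)
    channel.dummy_transfer(1)            # current no longer matches base
    before = channel.read_current_address()
    channel.write_page(saved_page)       # restore the same page value
    after = channel.read_current_address()
    return before != after


if __name__ == "__main__":
    print(page_write_reloads_current(DmaChannelModel(False)))
    print(page_write_reloads_current(DmaChannelModel(True)))
```

Without the `dummy_transfer` call the two models are indistinguishable, which is why the test needs at least a one-byte transfer.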
Bug or Feature?
The 82C499 datasheet says nothing about the DMA behavior at all and implicitly refers to the 82C206 datasheet. Said datasheet makes no mention of the 64K boundary limitation, but it certainly gives the impression that when 8-bit DMA transfers are in progress, the top eight bits of the 24-bit address are determined solely by the contents of the DMA page registers.
So… is the datasheet inaccurate, or am I missing something obvious?
Have I Seen This Before?
This whole thing gave me a strong sense of deja vu. I already came across this a long time ago, unfortunately so long ago that I don’t remember the exact details. It must have been around 1995 and I was either writing custom PC floppy controller code to read 3.5″ floppies written by the Commodore 1581 drive, or attempting Sound Blaster programming. Either way, the code used DMA, and I found that my code worked on one machine but not another. Eventually I realized that I wasn’t making sure that the DMA buffer couldn’t cross a 64K boundary.
And then it finally hit me. Somehow in my recently used test environment I had EMM386 installed, and I had forgotten about it. It was EMM386 doing that!
Once I removed EMM386, I observed that with the OPTi 82C499 chipset, DMA indeed cannot cross a 64K boundary and wraps around to the start of the 64K block instead. I verified that it’s the case on at least the OPTi 82C499 and Intel 440BX + PIIX4E chipsets.
So why does EMM386 do this? Any properly written code needs to watch out for crossing a 64K boundary with DMA, so why is EMM386 trying to “fix” broken code like that?
Actually, my best guess right now is that EMM386 may not be doing this quite intentionally. Because EMM386 may remap memory pages in interesting ways, such that the “physical” addresses software is working with aren’t necessarily contiguous or generally where software thinks they are, EMM386 must contain logic to transparently emulate DMA transfers that would cross a physical 64K boundary even when software is not aware of that.
I suspect the remapping logic is simply kicking in even when it perhaps shouldn’t, and creates an interesting incompatibility for “broken” software.
I did not attempt to do any kind of comprehensive comparison but I noticed other differences between EMM386 and bare 8237A hardware. For example, if only one byte (either low or high) of the base address or count register is written, EMM386 behaves as if the entire corresponding 16-bit current address or count register were also updated.
On the other hand, the OPTi 82C499/82C206 (which I assume is a very close facsimile of the original Intel 8237A) only updates the low or high byte of the current register, not both. In a normal situation where both low and high byte are written, there will of course be no difference. But in edge cases there will be, and the difference is visible to software. This is in fact the one case where even after a (partial) write to a base register, its current counterpart may not hold the same value.
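The partial-write difference can be illustrated with another small Python model. It is a sketch, not a description of either implementation's internals: the class name and the `copy_whole_register` flag are invented for illustration, and the byte selection follows the 8237A's first/last flip-flop, which alternates between the low and high byte on successive accesses.

```python
class AddressRegisterPair:
    """Toy model of one base/current 16-bit register pair."""

    def __init__(self, copy_whole_register: bool):
        # True: a byte write copies the full 16-bit base into current
        #       (the behavior the article reports under EMM386).
        # False: only the written byte of current changes
        #        (the behavior the article reports on the OPTi hardware).
        self.copy_whole_register = copy_whole_register
        self.base = 0
        self.current = 0
        self.high_byte_next = False  # first/last flip-flop state

    def clear_flip_flop(self) -> None:
        self.high_byte_next = False

    def write_byte(self, value: int) -> None:
        shift = 8 if self.high_byte_next else 0
        mask = 0xFF << shift
        byte = (value & 0xFF) << shift
        self.base = (self.base & ~mask & 0xFFFF) | byte
        if self.copy_whole_register:
            self.current = self.base
        else:
            self.current = (self.current & ~mask & 0xFFFF) | byte
        self.high_byte_next = not self.high_byte_next

    def advance(self, nbytes: int) -> None:
        # Stand-in for a DMA transfer moving the current address.
        self.current = (self.current + nbytes) & 0xFFFF


def current_after_partial_write(copy_whole_register: bool) -> int:
    reg = AddressRegisterPair(copy_whole_register)
    reg.clear_flip_flop()
    reg.write_byte(0x00)   # low byte
    reg.write_byte(0x10)   # high byte: base and current are now 0x1000
    reg.advance(0x123)     # current moves to 0x1123
    reg.clear_flip_flop()
    reg.write_byte(0x80)   # rewrite the low byte only
    return reg.current


if __name__ == "__main__":
    print(f"{current_after_partial_write(False):04X}")
    print(f"{current_after_partial_write(True):04X}")
```

In the byte-wise model the current register keeps its old high byte, so it ends up different from the base register even though the base was just written.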
I clearly need to pay more attention to little details. Here’s what my testcase printed when DMA was behaving oddly:
And here’s what it showed after I removed EMM386:
My own code was telling me right there—“Machine in V86 mode.”
More generally, when the machine is in V86 mode (EMM386 or another memory manager, running inside a DOS box, whatever), chances are pretty high that the DMA controller functionality is at least partially emulated, and when it’s emulated, it may not behave like real hardware, for both good and less good reasons. Be careful not to draw too many conclusions from the behavior of emulated DMA.
Outside of the IBM PC architecture, the PC-9801 (which uses the 8237) has a mode where the page registers can increment when the DMA reaches a page boundary. The FM Towns uses the NEC µPD71071, which is an 8237 superset with 24-bit addressing.
If the PC-9801 used plain 8237, do you know how they handled the page boundary crossing? Some kind of additional circuitry that watched the low 16 bits going from FFFFh to 0000h?
The EISA PC platforms even supported 32-bit DMA addressing, and the Intel 82374EB/82374SB also got rid of the 64K boundary restriction. But I don’t think it was ever used much. Legacy hardware had to be able to deal with PCs and PC/ATs and their limitations, and newer hardware tended to use bus mastering and skip the 8237 nonsense altogether, even though the EISA DMA controller could run significantly faster than the PC/AT one.
Looking back, I was a bit mistaken: it’s the later 286+ models that support that. I know the earlier models have a real 8237 and the later ones have it integrated (all have only one), but I can’t find any pics of the middle-era models.