A few days ago I came across an article about the 8237 DMA controller in an old German computing magazine (DOS Extra, issue 1 ’87/88, page 123, Schnelle Speicherverwaltung mit dem DMA-Controller, or Fast memory management with the DMA controller). While skimming through the article, I began to suspect that although the author did a good job reading the 8237 datasheet, he had only a rather vague idea of how the controller was actually wired up in the IBM PC.
On closer reading of the article, my suspicion was confirmed. While there is some PC-centric information in the article (which I/O ports the 8237 is mapped at, or the fact that the DMA controller is used for memory refresh in the IBM PC), absolutely crucial IBM-specific information is missing.
Sometimes the article contradicts itself slightly, or at least does not address the full implications: It claims that DMA transfers allow the CPU to do other things in the meantime, but elsewhere also points out that DMA transfers block the CPU. It is, of course, both true—the CPU can do other things while DMA transfers are running, but when active DMA transfers own the bus, the CPU can’t access any data or read instructions, which is especially an issue in the original cache-less designs. The faster the DMA runs, the less time there’s left for the CPU to access the bus.
But now to the gaping holes in the article. It correctly notes that the 8237 can address and transfer up to 64KB. It does not appear to have occurred to the article’s author that since the PC’s address space is 1MB (and 16MB in the PC/AT, but that is not truly considered in the article), there’s a bit of a problem there. And thus the article says absolutely nothing about the PC’s DMA page registers, which determine the 64KB region of memory a given DMA channel will address. Naturally there’s also no mention of the fact that in the IBM PC and many (but not all) clones, the DMA page registers contain strictly bits 19:16 (on PC and XT; bits 23:16 on the AT) of the physical address and the 8237 supplies bits 15:0. Which has the rather annoying implication that DMA transfers can’t cross a 64K physical address boundary (in some chipsets, the page register and DMA address are added and the limitation does not exist; in the original IBMs they’re only logically ORed).
To be fair, the IBM PC Technical Reference does not do a very good job of explaining the DMA page registers either, but it does mention them, the BIOS listings provide rather solid clues as to how the page registers really work, and the board schematics also clearly show how the DMA page registers are wired. (The DOS Extra article does not provide any references so it’s anyone’s guess what it’s based on besides the 8237 datasheet.) The IBM PC/AT Technical Reference is much more informative on the subject.
The other giant hole in the article has to do with memory-to-memory DMA transfers. The article correctly mentions that the 8237 supports memory-to-memory transfers using DMA channels 0 and 1 (and only those). It then says things like this: “Think for example of moving blocks of text within a larger area of text, which could be significantly accelerated through the 8237.” Sounds cool, right? Not so fast.
Elsewhere the article also mentions that the IBM PC uses the DMA controller for memory refresh. What it omits to mention is that DMA channel 0 is the one used for memory refresh in the IBM PC, which makes memory-to-memory DMA quite problematic, given that channel 0 would also be required for memory-to-memory transfers. You can do memory-to-memory transfers, or you can keep the DRAM refreshed, but not both.
It’s actually even worse than that. Looking at the IBM PC system board schematics (page D-5 in the original August ’81 Tech Ref), it is apparent that DMA channels 0 and 1 effectively share the same DMA page register; the way the page registers are wired is rather non-obvious but it is explained here for the PC/AT. In the original IBM PC, the DACK2 and DACK3 pins of the 8237 were connected to the RB and RA inputs of an LS670 register file, respectively. Because the PC uses active-low DACK signals, DMA channel 2 uses the page register accessible at I/O address 81h, DMA channel 3 uses port 82h, but DMA channel 1 uses port 83h: when neither DACK2 nor DACK3 is active, the RA and RB inputs of the LS670 are both high, selecting register 3. To put it differently, if DACK0, DACK1, or no DACK signal at all is active, register 3 in the LS670 is selected. Memory-to-memory transfers would therefore be constrained to a single 64K window, severely limiting their usefulness (and since memory-to-memory transfers don’t use DACK signals at all, it would not be easy to select the right page register anyway).
Another issue is that memory-to-memory transfers would have to program the DMA controller for block mode (rather than the typically used single or demand modes). It is questionable how much the CPU could actually do while a block DMA transfer was running and hogging the bus bandwidth.
As an aside, using the DMA controller for memory refresh has an implication for IBM PC initialization. Although the PC has no complicated memory controller and no memory timings to set, immediately after power-up RAM can’t be used because it’s not being refreshed yet. During POST, the BIOS sets up the DMA controller (and the timer, which is also involved) to perform the refresh function, but until then, there’s no RAM, and therefore notably also no stack.
The DOS Extra article nicely demonstrates why reading component datasheets is not sufficient for understanding and programming the IBM PC and compatibles. Numerous ICs are wired in such a way that some of their theoretical capabilities cannot be used. And some of the board wiring is practically “secret sauce” (not truly secret back when the schematics were published) which may be surprisingly difficult to find adequately documented, precisely because the information isn’t covered by any datasheet.
The biggest reason why the PC had a DMA controller may well have been the floppy controller. The FDC does not produce or consume data at any kind of prodigious rate, but it is very sensitive to latency. The CPU should theoretically be able to service the FDC in PIO mode, but it would not be able to do so with interrupts enabled, and that would be a problem especially when transferring several sectors of data at once. The DMA controller can react with low latency, avoiding underruns or overruns.
P.S.: The IBM PC/AT does not use DMA channel 0 for memory refresh. It is probably still impossible and certainly highly impractical to do memory-to-memory DMA. One practical problem is the inability to use separate page registers for the source and destination since memory-to-memory transfers don’t use DACK. A much worse problem is likely the fact that the 8237 would have to hold the bus during the entire transfer, entirely blocking the CPU, and possibly interfering with DRAM refresh. Finally, memory-to-memory DMA is almost guaranteed to be slower than CPU transfers on a PC/AT, since it is limited to fairly slow 8-bit transfers, while the CPU can use faster 16-bit cycles.
Late Update: By happenstance, I came across DOS Extra Nr. 7 1989, which rehashed the hardware article from the first ’87/88 issue. On page 37, an article titled simply Der DMA-Controller 8237 (The 8237 DMA Controller) presents much of the same information that the older article did, but with notable improvements (and some new problems).
The new article clearly states that memory-to-memory DMA is not possible on the IBM PC because DMA channel 0 is used for DRAM refresh. It also states, without explaining why, that on the AT, memory-to-memory DMA is still not practical even though the 8237 is no longer used for memory refresh (“Leider ist beim AT keine sinnvolle Speicher-zu-Speicher-Übertragung programmierbar […]”, or “Unfortunately, no meaningful memory-to-memory transfer can be programmed on the AT […]”). The article also notes that DMA transfers lock out the CPU, which is a problem with longer block transfers.
At times the new article is incomplete, such as when it claims that the DMA controller is “up to ten times faster” than the CPU. That may have been true for the original PC, but certainly isn’t true for any PC/AT or newer system.
At times the new article is misleading, such as when it says that the DMA controller can process up to four transfers concurrently (“Weil der 8237 vier getrennte Kanäle besitzt, ist er zudem in der Lage, vier solche Datenübertragungen gleichzeitig durchzuführen.”, or “Because the 8237 has four separate channels, it is moreover capable of carrying out four such data transfers simultaneously.”). That gives a distinct impression that four data streams could actually be transferred at the same time, but of course only one DMA channel can own the bus at any one time. Four DMA transfers can all be “running”, but they still have to transfer data one at a time.
Unlike the original 1987 article, the 1989 update explains the DMA page registers (although it omits their port addresses), and it also briefly describes the cascaded DMA controllers in the PC/AT (without presenting the not-so-obvious details of 16-bit DMA transfers). Overall a clear improvement over the original article.