More Fun with ISA DMA

A reader comment on a previous post on ISA DMA pointed out that UMBPCI (or rather the DMACHK utility distributed with it) does something unusual with regard to ISA DMA. There was a suspicion that it somehow accomplished the mythical memory-to-memory DMA transfers; that proved to be unfounded, at least in the UMBPCI case, but what the utility does is nevertheless quite interesting.

Dual NEC D8237AC-5 DMA controllers in a PC/AT clone board.
Classic cascaded 8237A DMA controller chips in a PC/AT clone

First some background about what DMACHK does and why it exists in the first place. UMBs are generally prone to causing difficulty with DMA, and UMBPCI is no exception. The way UMBPCI works is that it enables memory between 640KB and 1MB for use as UMBs. Such memory is normally only intended for ROM shadowing, and in some chipsets it is not accessible via DMA (whereas EMM386/QEMM/386MAX use paging to remap normal memory into the UMB range, causing physical addresses to differ from linear ones).

Because DMA may be necessary to effectively use UMBs (booting from floppy or a bus-mastering hard disk controller), it is rather useful to know whether DMA works in a given UMB range. But how does one test whether ISA DMA can access UMBs without requiring a DMA-capable device? The method DMACHK uses is quite ingenious.

To check a given UMB range for DMA accessibility, DMACHK performs roughly the following sequence of steps:

  • Fill a 1KB buffer in the tested memory region with a 55AAh pattern
  • Unmask DMA channel 1 and program it to target the 1KB buffer
  • Set DMA channel 1 for writing in block transfer mode
  • Set the 8237’s DMA Request register bit for channel 1
  • Read the DMA Status register in a loop which terminates when either the T/C bit for DMA channel 1 becomes set or a given timeout period elapses
  • Check whether the 1KB range is still filled with the 55AAh pattern
  • If it is not, DMA is working
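
The register traffic behind these steps can be sketched as a list of port writes. This is only an illustration, not DMACHK's actual code: the port numbers and bit layouts are the standard ones for the first 8237 in a PC/AT, but the function names, the exact order of the writes, and the choice to mask the channel while reprogramming it are assumptions. Nothing here touches real hardware; the function merely returns the (port, value) pairs one would send.

```python
# Standard PC/AT ports for the first 8237 (8-bit channels 0-3).
DMA1_CH1_ADDR = 0x02   # channel 1 address, low byte then high byte
DMA1_CH1_COUNT = 0x03  # channel 1 count, low byte then high byte
DMA1_STATUS = 0x08     # read: bits 0-3 = terminal count reached, channels 0-3
DMA1_REQUEST = 0x09    # write: software DMA request
DMA1_MASK = 0x0A       # write: single-channel mask
DMA1_MODE = 0x0B       # write: mode register
DMA1_CLEAR_FF = 0x0C   # write: reset the low/high byte flip-flop
DMA_PAGE_CH1 = 0x83    # page register for channel 1 (address bits 16-23)

CH = 1
MODE_BLOCK_WRITE_CH1 = 0x80 | 0x04 | CH  # block mode, write to memory, channel 1
TC_CH1 = 1 << CH                         # terminal count bit for channel 1


def dmachk_style_writes(phys_addr, length=1024):
    """Return hypothetical (port, value) writes for a software-triggered
    block write on DMA channel 1 targeting [phys_addr, phys_addr + length)."""
    if not 1 <= length <= 0x10000:
        raise ValueError("length must be 1..65536 bytes")
    last = phys_addr + length - 1
    if last >= 1 << 24:
        raise ValueError("ISA DMA can only reach the low 16MB")
    if (phys_addr >> 16) != (last >> 16):
        raise ValueError("buffer must not cross a 64KB boundary")
    offset = phys_addr & 0xFFFF
    count = length - 1  # the 8237 transfers count + 1 bytes
    return [
        (DMA1_MASK, 0x04 | CH),          # mask channel 1 while reprogramming
        (DMA1_CLEAR_FF, 0x00),           # next address/count byte is the low one
        (DMA1_MODE, MODE_BLOCK_WRITE_CH1),
        (DMA1_CH1_ADDR, offset & 0xFF),
        (DMA1_CH1_ADDR, offset >> 8),
        (DMA_PAGE_CH1, phys_addr >> 16),
        (DMA1_CH1_COUNT, count & 0xFF),
        (DMA1_CH1_COUNT, count >> 8),
        (DMA1_MASK, CH),                 # unmask channel 1
        (DMA1_REQUEST, 0x04 | CH),       # set the software request bit
    ]


def transfer_finished(status_byte):
    """True if a value read from the status port shows T/C for channel 1."""
    return bool(status_byte & TC_CH1)
```

For a buffer at D0000h, the page register receives 0Dh, the address bytes are both zero, and the count bytes are FFh and 03h (1024 minus 1). A real implementation would follow the last write with a polling loop on the status port and a timeout, as the list above describes.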

How does this work and why? The block transfer mode is not normally used at all because it locks out any other bus accesses. However, it has the interesting property that it can be triggered by software (through the 8237’s Request register), which is probably related to the fact that block transfers are the only type that can be used for memory-to-memory DMA.

Now, in a typical PC, chances are there’s no device using DMA channel 1, or perhaps there might be a sound card using it. But DMACHK does not program any hardware beyond the 8237 DMA controller.

The catch is that if a block write DMA transfer is triggered by software, the 8237 will drive the bus, and something will get written to the memory. If no device provides data on the bus, most likely a sequence of 0FFh bytes will be written. It might be something else, but if DMA can target the memory, the memory will be written to.

That is exactly what DMACHK takes advantage of. It does not care what gets written to the memory; it only cares whether the 55AAh pattern changed. Now, since DMA channel 1 is 8-bit, the memory might theoretically get written with 55h bytes or with 0AAh bytes, but it is very difficult to imagine how it would be written with a sequence of alternating 0AAh and 55h bytes. Therefore if the 55AAh pattern remains undisturbed, it is safe to conclude that DMA is not working, but if the 55AAh pattern is not intact, it must mean DMA works (or the RAM is bad, but then DMA is the least of one’s problems).
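
The decision logic is simple enough to sketch in a few lines. The helper names are hypothetical, and the in-memory byte order (AAh followed by 55h, i.e. the word 55AAh stored little-endian) is an assumption; the argument holds either way, because any constant fill byte destroys an alternating pattern.

```python
PATTERN = bytes([0xAA, 0x55])  # assumed byte order of the 55AAh word in memory


def fill_pattern(size=1024):
    """Return a buffer of the given (even) size filled with the test pattern."""
    if size % 2:
        raise ValueError("size must be even")
    return bytearray(PATTERN * (size // 2))


def pattern_intact(buf):
    """True if the buffer still holds exactly the alternating test pattern."""
    return bytes(buf) == PATTERN * (len(buf) // 2)


def dma_reached_buffer(buf):
    """The check described above: any change at all means DMA wrote to the memory."""
    return not pattern_intact(buf)


# Simulated outcomes: an untouched buffer versus buffers overwritten
# with a single repeated byte, as an 8-bit DMA write would produce.
untouched = fill_pattern()
filled_ff = bytearray(b"\xff" * 1024)
filled_55 = bytearray(b"\x55" * 1024)
filled_aa = bytearray(b"\xaa" * 1024)
```

Here `dma_reached_buffer(untouched)` is false, while all three constant-filled buffers are reported as written, including the 55h and 0AAh cases that each match only half of the pattern.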

It has been experimentally confirmed that on at least one system (Alaris Cougar board, OPTi 82C499 chipset, 486 CPU), this kind of DMA transfer fills the buffer memory with 0FFh bytes. That is precisely the expected result on a typical system where reads from an “empty” data bus return with all bits set.

Not So Fast!

As mentioned in the previous post, a characteristic of 8237 DMA transfers is that when the DMA controller is accessing the bus, the CPU is locked out, together with any other possible bus agents. When DMA transfers are running, the CPU is slowed down because it has to share bandwidth with DMA transfers, or wait for their completion in the case of block DMA transfers.

This “bug” can be turned into a feature if the goal is to slow down the CPU. It was done at least twice with DMASLO (1988) and DMAKO (1992). Both utilities use the same principle described above: When DMA is running, the CPU isn’t, especially on older machines with no cache or less sophisticated cache controllers.

What the utilities do in detail is quite different, suggesting that they were developed entirely independently.

The DMASLO utility runs on XTs as well as ATs and was developed to allow old games to work on newer PCs, especially games with joystick support. These were often CPU speed dependent and failed on faster PCs. DMASLO can be tuned to slow down newer systems (“newer” as of the late 1980s) to perform like old XTs.

DMASLO can use any of the four channels on the first (and possibly only) DMA controller, with the obvious caveat that reprogramming DMA channel 0 on a PC/XT is likely fatal, since that channel performs DRAM refresh there. DMASLO programs the channel for a verify transfer, which is interesting because it goes through all the motions of a normal DMA transfer except actually transferring any data. DMASLO also uses block transfers, which entirely lock out any other bus access, and the length of the transfers can be set by the user. By default it’s one, and of course the DMA channel is set to auto-init. When the DMA transfer is (automatically) restarted, the CPU has a little bit of time to do something. Thanks to this, DMASLO is a very powerful handbrake and can slow down a system to a crawl.

DMASLO uses an interesting method of triggering DMA transfers. It changes the DREQ polarity in the 8237, which means that a previously inactive DREQ signal will instantly turn into an active one.
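
To make the register settings concrete, here is a small sketch of how the 8237 mode byte for such a channel is composed, and of the DREQ polarity flip. The bit positions come from the 8237 register layout; the function names are made up for illustration, and this is not taken from DMASLO itself. Note that the command register cannot be read back on a real 8237 (the same port returns the status register on reads), so the sketch assumes the caller already knows its current value.

```python
TRANSFER_BITS = {"verify": 0x00, "write": 0x04, "read": 0x08}  # mode bits 2-3
MODE_BITS = {"demand": 0x00, "single": 0x40, "block": 0x80, "cascade": 0xC0}  # bits 6-7
AUTO_INIT = 0x10        # mode bit 4
DREQ_ACTIVE_LOW = 0x40  # command register bit 6: DREQ sense


def mode_byte(channel, transfer, mode, auto_init=False):
    """Compose an 8237 mode register value for channel 0-3 of one controller."""
    if not 0 <= channel <= 3:
        raise ValueError("channel must be 0..3")
    value = MODE_BITS[mode] | TRANSFER_BITS[transfer] | channel
    if auto_init:
        value |= AUTO_INIT
    return value


def flip_dreq_sense(known_command):
    """Return the command register value with the DREQ polarity inverted.

    known_command is the value last written to the command register;
    it is an input here because the register is write-only.
    """
    return known_command ^ DREQ_ACTIVE_LOW


# A verify, block-mode, auto-init setup on channel 1, as described above.
dmaslo_like_mode = mode_byte(1, "verify", "block", auto_init=True)
```

With these definitions `dmaslo_like_mode` is 91h. Starting from a command register value of 00h, `flip_dreq_sense` yields 40h, at which point an idle DREQ line is interpreted as an active request.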

DMAKO (as opposed to DMAOK, one would think) is hardcoded to use DMA channel 3, 7, or both. It has no tuning parameters and simply sets up the required DMA channel(s) for single mode read transfers. That means there will be a read from memory, and almost certainly no device on the other end that would care about it.

DMAKO offers little control and does not slow down the system nearly as much as DMASLO can, but it does make a clear difference on the test system (the same Alaris Cougar 486 board mentioned above) and noticeably reduces the measured CPU speed.

However, the DMASLO utility is only intended to (drastically) slow down games and the author warns that it likely interferes with the PC’s operation. The DMAKO utility is designed to slow things down but not break anything; it uses DMA channel 3 (or 7) because it has the lowest priority, meaning that floppy access (using DMA channel 2, higher priority than 3) still works.
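
The priority argument can be illustrated with a toy arbiter. It assumes the controllers run in fixed-priority mode (the 8237 also offers rotating priority) and uses the PC/AT arrangement in which channels 0 to 3 are cascaded through channel 4 of the second controller, so they all rank above channels 5 to 7. The names are hypothetical.

```python
# Fixed priority order on a PC/AT, highest first; channel 4 is the cascade input.
AT_FIXED_PRIORITY = (0, 1, 2, 3, 5, 6, 7)


def next_serviced(pending):
    """Return the pending channel that wins arbitration, or None if idle."""
    for channel in AT_FIXED_PRIORITY:
        if channel in pending:
            return channel
    return None


# Floppy (channel 2) competing with a slowdown transfer on channel 3 or 7.
floppy_vs_ch3 = next_serviced({2, 3})
floppy_vs_ch7 = next_serviced({2, 7})
```

In both cases channel 2 wins, which is why a slowdown transfer parked on channel 3 or 7 does not starve floppy DMA in this model.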


8 Responses to More Fun with ISA DMA

  1. Sean McDonough says:

    The name “DMAKO” isn’t necessarily a play on “DMA OK”; it could stand for “DMA Knockout”, in that it slows down the computer, like how knocking someone out in a fight tends to slow them down at least momentarily.

  2. Michal Necasek says:

    Yes. The source code has two sets of routines, dmako() to slow things down and dmaok() to put things back to normal.

  3. OBattler says:

    I did just find one PC-compatible chipset that does support memory to memory transfer: OPTi 82c895 + 82c206. The MR 2.02 BIOS (available on Vogons in the 486 BIOS thread) actually uses that to copy over the CMOS Setup data.

  4. Michal Necasek says:

    Memory to memory DMA should actually be usable on more or less all PC/AT compatibles. It just has nasty restrictions so bad that it’s almost always worse than the alternatives.

    What exactly does MR BIOS do? Where does it copy the CMOS setup data from and to? Just between two system memory locations?

  5. OBattler says:

    Yes, it copies it between two system memory locations, and then jumps to the target location to run the code.

  6. Nils S. says:

    Have a look at the 186 DMA controller. Really simple and everything fits into its overall design, no hacks…

    The last weeks I am playing with a Am186 running at 40MHz. I will soon try some mem2mem DMA and measure time, as this is on my todo list anyway.

    How would I test what the CPU actually could do while DMAing?

    I really like AMD doc #21069 – Am186ES/EM User Manual for its clearness and overall style, but the Intel manual 80c186 (Order Nr 270788-001) is a beauty too…

    Too bad that it differs from my chip 😀

    The basic stuff is the same, but it is not totally compatible. I already noticed a few things. Thank god the UM, Datasheet and Monitor Sources have somehow survived (at least here – I think I should upload everything to or something like that)

  7. Nils S. says:

    Aaah, reading a bit further clarifies a few things and throws up more questions…

    The DMA of the Am186ES can work with the on chip serial ports.
    It can do mem2mem, io2io and io2mem, so no matter where PCB (peripheral control block) is mapped.
    Put into DMA destination register either RX or TX register, no addr increment.
    Put somewhere into source, configure increment/decrement and TC and etc. the way you like and go.

    The UARTs can issue RX and TX interrupts. For example sending (long) string could be done like this:
    configure UART (baudrate, TX interrupts, …)
    place string in RAM
    configure DMA

    Now it sends for example a long string at 9600 baud. This takes a while byte by byte. There should be plenty of time in between for the CPU to have the bus. It saves a few bytes of program code in tx ISR I assume.

  8. Michal Necasek says:

    Uploading docs to is a great idea 🙂 These things have a nasty tendency to vanish.

    As far as I can tell, the 80186’s biggest problem is that it wasn’t PC compatible (that is, the embedded chips weren’t). It came out at exactly the wrong time, basically. That said I must have at least a dozen of 186/188 variants embedded on various SCSI adapters, network controllers, hard disks, and similar gear.

    DMA could be really nice for faster serial speeds as it should be much more resistant to drop-outs than interrupts, even with a slow CPU. The same reason why the one device requiring DMA in the original PC was the floppy; even though the FDC was capable of interrupt-driven operation, it just wasn’t really practical because the CPU couldn’t keep up. DMA took care of it and worked quite reliably.
