Learn Something Old Every Day, Part VI: Backward Buffer Overwrite

A few days ago I spent far too much time debugging a largish piece of 16-bit Windows code written in assembler. I found a scenario where (fortunately fairly reproducibly) Windows crashed because the internal state of a DLL got corrupted.

I could see that the state made no sense and a supposedly small rectangle was suddenly tens of thousands of pixels large, causing segment overruns when it was copied. The internal state of the DLL was corrupted, and it very much looked like a buffer overflow.

I added size checks to make sure nearby buffers weren’t being overwritten, but the checks never fired. Or rather they only fired when the state was already corrupted.

Then I tried reshuffling the data so that the buffer which I suspected of overflowing was at the very end of the data segment, hoping that it would cause a protection fault when the buffer overrun happened. But the fault never happened, and the state was still getting corrupted.

And then it finally hit me. The DLL’s internal state was getting corrupted not because a data copy ran past the end of a buffer, but because it was copying data in the wrong direction. In some situations, a REP MOVSB instruction could be executed with the correct source and destination address and the correct count, but with the CPU direction flag set (rather of clear), causing the copy to decrement addresses instead of incrementing.

This is a situation programmers nowadays just don’t think of. Backward copying is generally avoided because CPUs are not good at it and it’s therefore slow. In a language like C, buffer overflows can happen, but only running past the end of a buffer, not overwriting data located in memory before the start of a buffer (it’s possible to decrement an array index so that it points below the first element, but that is rare). The cause of an overflow is invariably a size problem, with the amount of data copied being larger than the size of the destination buffer. That’s what I was looking for, but the bug was something different.

ABIs generally assume that the CPU direction flag must be clear, and if some code sets it, then it must clear the flag again before returning to the caller or before calling other functions. String move instructions thus always work in the “up” direction unless someone explicitly sets the direction flag (which implies assembler code), and such buffer “underflows” normally can’t happen. But the CPU is still capable of executing them and with old and sketchy code, it is something to be aware of.

This entry was posted in Bugs, Development, Windows. Bookmark the permalink.

24 Responses to Learn Something Old Every Day, Part VI: Backward Buffer Overwrite

  1. zeurkous says:


    One could argue, though, that backwards operations are as natural as
    forwards ones. It’s just that wetware doesn’t generally think that way
    (and then some wetware smells “optimization!” and makes backwards
    operations needlessly more expensive…).

  2. Richard Cranium says:

    In 6502 assembly, it’s extremely common to run loops “backwards”, since it’s more efficient. For example, a simple (only supporting copies of up to a page), forward memcpy() would look like this in 6502 (apologies if this is wrong, my 6502 is rusty) :

    LDX #$00
    CPX Length
    BEQ Exit
    LDA Source,X
    STA Dest,X
    JMP Loop

    But by running it backwards (I’m almost certain this is wrong in some way, don’t come crying to me if your C64 locks up running this :D) :

    LDX Length
    BEQ Exit
    LDA Source,X
    STA Dest,X
    JMP Loop

    You can save a few bytes and a few processor cycles because the branch instructions run off the zero flag, which is cleared by a lot of instructions – so it behaves like there’s an implicit CPX #$00 before the BEQ. I’ve done similar things to this occasionally on x86, in extremely constrained environments where every byte counts (e.g. boot sectors, binary patches, etc).

  3. Michal Necasek says:

    Prefetching, read-ahead, etc. has to “guess” the direction it’s going to use. You can either work with it, or against it. One of those will be a lot faster. That does not make the “wrong direction” needlessly slower, it just makes the “right direction” a lot faster.

  4. Michal Necasek says:

    That makes sense. I guess on x86 where CX is used as a separate counter there’s no inherent advantage in going in one direction or another.

  5. John Elliott says:

    Having the direction of copying controlled by a separate flag (rather than making the direction part of the opcode as, for example, the Z80 does) feels like forcing the programmer / compiler to use global state to solve a local problem.

  6. Michal Necasek says:

    Yes, in retrospect, having separate instructions or prefixes for backward/forward direction would have been soooo much cleaner. The direction flag is pretty much an accident waiting to happen.

  7. Richard Wells says:

    There are a lot of instructions that use the direction flag. Adding a second version for wrong way usage would consume a lot of opcodes. Having two otherwise identical long routines for copying or comparison except every instruction indicates direction would make code that much more complex.

  8. zeurkous says:

    This all would seem to be a downside of automagic indexing.

    Even so, me’d argue that wasting a bit of opcode space for directional
    generality is not that great an issue these days. (Unless the design is
    wasteful in other ways, of course.)

    This should make it obvious that there *are* limits to the scalability
    of an architecture: cramp the opcode space to sacrifice orthogonality
    (and thereby shove complexity towards the software end of things), or
    go for a clean, orthogonal (one might even say “agnostic”) instruction
    set that takes up, say, an “extra” nibble per instruction?

    Me can’t think of a universal answer. The specific answer is that in the
    past, we didn’t have quite as much of a choice in the matter, as we
    simply couldn’t make dense enough circuits yet.

    In that light, having an external direction flag could indeed very well
    make sense, but only for limited (to our current standards) designs.

  9. zeurkous says:

    Either way, essentially fixing the direction in hardware (“go forwards,
    or suffer)” is not necessarily justified.

  10. Tux2000 says:

    A similar problem existed in Windows 95 and NT 4.0 when PeekMessage32() was invoked with the direction flag set, causing General Protection Faults. This was documented in a short article in the German c’t magazine, issue 9/1997, page 272. Unfortunately, this article is paywalled (https://www.heise.de/select/ct/archiv/1997/9/seite-272).

  11. Michal Necasek says:

    Interesting. Strictly speaking, I don’t think that is a bug in Windows. It is a bug in an application that violates Windows calling conventions.

  12. GromBeestje says:

    A violation of Windows calling conventions?
    This makes me wonder about differences on Windows versions, as Windows 3.x was co-operative multi-tasking, while NT/9x are pre-emtive multi-tasking.
    When running on a co-operative multi-tasking OS, the application gives control back to the OS, and has to make sure the flag is set to the correct direction prior to doing so.

    What happens when such application runs on Windows 9x or NT. I know NT runs 16 bit application in a Virtual Machine (NTVDM), but I am not sure what happens on 9x.

  13. Michal Necasek says:

    It’s not that simple. Even in Windows 3.1, where applications are not really separated from each other in a meaningful way, they don’t just pass control back and forth. Windows still handles the task switching, and it is entirely plausible that some central piece of code always clears the direction flag before handing control over to the next task. The problem I debugged was related to hardware interrupts, which is a somewhat different case. Though I did not expect that the direction flag would just be left as it happened to be at the moment the interrupt occurred.

    On NT, processes are fully separated and the direction flag is strictly local to a process. Win9x I’m not entirely sure about, but I suspect 32-bit and 16-bit applications might behave differently. Win95 is in some ways extremely complex because it’s a weird mish-mash of everything.

  14. Victor Khimenko says:

    Note that it’s not actually true that copying in forward directions is more efficient and thus backward copying is not used.

    There was huge uproar when memcpy in glibc on Linux started copying bytes going from the end to the beginning: https://lwn.net/Articles/416821/

    This broke Flash, but, well… it’s wrong to say that “backward copying is generally avoided because CPUs are not good at it and it’s therefore slow” when in reality it’s employed to make things faster.

  15. Michal Necasek says:

    There are certainly cases where forward copying is noticeably faster: https://stackoverflow.com/questions/57137074/movsd-performance-depends-on-arguments

    If you end up copying entire cache lines then yes, it really shouldn’t matter which way the copying goes.

  16. zeurkous says:

    Me supposes that a generic copy routine would detect the correct
    direction to go in.

    But yes, breaking the specification of an established routine such as
    memcpy(3) just leads to a world of trouble.

  17. Michal Necasek says:

    That glibc thing is an interesting philosphical discussion. The fundamentalists will always say that software not written according to the spec deserves to broken, no matter how much trouble it might cause. More practically oriented people will argue that breaking things that used to work is not progress and should be avoided.

  18. zeurkous says:

    This fundamentalist believes that when an interface specification has
    turned out to be sufficiently unclear, that interface should be
    deprecated (not necessarily removed!) in favour of an (likely new)
    unambigious interface.

    But that’s me.

  19. Victor Khimenko says:

    You are missing one important fact: not only memcpy was never supposed to be used with objects that overlap, C also included memmove, evil cousin which is supposed to be used when they do, in fact, overlap! And it was there since the very beginning!

    Thus it’s not problem of “sufficiently unclear” specification, but rather an attempt to use memcpy in a way which is very clearly unsupported!

    This being said the final solution was kinda sane: since GLibC supported versioning since version 2.0 they simply made memcpy synonym for the memmove when used with old programs (but kept new implementation for new, recompiled, programs).

    That way old Flash binaries were working again but new programs should be written sensibly to work.

  20. zeurkous says:

    C doesn’t include any routine. You must be confusing it with the
    standard library (which, despite often being called ‘libc’, is not

    “Unsupported”. Me’ll offer another one: “must be used instead”. Nice ‘n
    vague, don’t you think? Manual pages that hint at “wrong” use, but don’t
    just specify what the damn routine actually does (especially in case of
    simple operations), should be rewritten. In the same vein, me’d very
    much argue that memmove(3) itself has an ambigious and somewhat
    unexpected name: one might reasonably expect data moves to be
    necessarily destructive to the original (while w/ memmove(3), that is
    only the case for the overlapping area, if any).

    (Tune in another time for me latest developing pet peeve, “transfer” for

    Versioned symbols embedded in versioned monolithic libraries (as opposed
    to, say, a ‘file per object’ scheme) are a special hell that the
    inventor should forever be locked up in.

    BTW, the OpenBSD memmove(3) manual page says that it appeared in
    4.3BSD-Reno. Do you have contrary information…? (Or just a liberal
    idea of “the very beginning” :?)

  21. Michal Necasek says:

    From the C89 standard: “The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.”

    Also from the C89 standard: “The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.”

    Once again, some people understand undefined behavior as “must be broken at any cost” and others as “if it works that way, leave it be”. As I said, it is a philosophical question.

  22. zeurkous says:

    Indeed, the C… `standards’ specify many things not really part of C
    (and often at odds with the very premise of C; this started at least as
    early as [0]).

    But your point is well-taken 🙂

    Leaves me to note that even defined behaviour sometimes changes,
    especially when there’s no observable change to the caller. OpenBSD
    memmove(3), while in itself somewhat convoluted, certainly does *not*
    use an intermediate array; instead, it follows the approach of
    determining the non-destructive direction to copy in.

    [0] https://www.lysator.liu.se/c/dmr-on-noalias.html

  23. Michal Necasek says:

    The “as if” language is not meant to suggest how things should be implemented, it’s just a shorthand for defining the expected results.

  24. zeurkous says:

    You’re right — me plainly read over that.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.