Win16 Memory Management

Posted on June 5, 2026 by Michal Necasek

This is a kind of knowledge base article which resulted from attempts to understand exactly how memory management works in 16-bit Windows. It is not exactly undocumented, but it is also not well documented; even before Windows 3.0 appeared, the assumption was that essentially all application developers were going to use a high-level language and their development tools would take care of the low-level details.

Furthermore, nearly all materials for beginning Windows developers focused on the more visible aspects of Windows programming, i.e. windows, icons, menus, and so on. Memory management was glossed over, even though it was absolutely critical to writing a solid Windows application any more complex than a Hello World program.

The memory management details and mechanisms are rooted in the 8086 real mode history of Windows 1.x and 2.x, and much of the complexity persisted even when Windows only ran in protected mode starting with Windows 3.1.

Unless noted otherwise, in this article “Windows” refers to the 16-bit line of Microsoft products, not Windows NT.

Introduction to Windows Memory Management

The key to understanding Windows memory management is that from the very beginning, Windows was among other things a fancy overlay manager. For many years, Windows was too big for typical PCs of the time and needed some way to keep only the most active memory segments in physical RAM, with some mechanism to discard and reload less frequently needed segments on demand. Paging was obviously not used because there was no support for it in 8086 and 80286 systems (and before Windows 3.0, those were very nearly the entirety of the installed base).

In the simplest case of an application with one code segment and one data segment, the movable nature of Windows segments is almost entirely transparent. When the application is running, the CS (code) segment register points to the code segment and the DS (data) and SS (stack) segment registers point to the data segment. As long as the application only uses near calls/jumps within its code segment and near pointers to the data/stack segment, it does not care at all where exactly the segments are in memory, i.e. the actual values loaded into CS/DS/SS registers. Windows can move the segments around and everything will work fine.

But even beginning Windows programmers working through a Hello World style example very quickly start suspecting that life is not so simple in the land of 16-bit Windows. The window procedure must be declared as FAR PASCAL, which is fair enough given that it needs to conform to Windows calling conventions. But it also has to be exported from the application’s executable, otherwise the program won’t work properly. That is a concept entirely unfamiliar to non-Windows developers.

To help implement its memory management scheme, Windows adopted and extended the “New Executable” (NE) format first used by “DOS 4”, better known as Multitasking DOS 4.0 and significantly different from PC DOS and MS-DOS 4.0/4.01. Unlike the DOS MZ executable format where an application is effectively a single binary blob, the NE format is segment oriented and each segment is stored on disk separately. That gives Windows the ability to load (or reload) individual segments and move them around in memory.

The NE format also supports imports and exports. Imports are used when an application needs to call external code, such as the OS itself. Exports are used for application code which is externally called.

A window procedure is one such externally called piece of code. It needs to be exported so that Windows can perform its magic on it. Said magic lets Windows fix up the window procedure prolog (entry sequence) so that it loads the application’s own data segment into the DS register.

Shifting Memory

Everything in Windows memory management revolves around segments, contiguous blocks of memory up to 64KB in size. In normal 8086 programming, each segment is identified by its segment address, which directly corresponds to its address in physical memory. Because most segments in Windows can be moved or discarded, they are instead identified by handles. A handle is a 16-bit value which should be considered opaque, even if it might actually be a simple index into some table.
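To make the real-mode side concrete: on the 8086, the physical address is the segment address shifted left by four bits plus a 16-bit offset, and a far pointer simply pairs the two. The following C sketch shows the arithmetic. The function names are made up for illustration and the values used with them are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode 8086 address translation: physical = segment * 16 + offset.
 * The result is wrapped to 20 bits, matching the 8086's 20 address lines. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;
}

/* Pack a segment and an offset into a 32-bit 16:16 far pointer value
 * (segment in the high word, offset in the low word). */
uint32_t make_far_pointer(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 16) | offset;
}
```

Note that many different segment:offset pairs map to the same physical address, e.g. 5BC1:1840 and 5D45:0000 both refer to physical address 5D450h. A handle, by contrast, says nothing about where the memory is.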

For programmers familiar with x86 protected mode, a Windows segment handle is a lot like a protected-mode selector: It is a 16-bit value which uniquely identifies a memory segment, but it is independent of the segment’s location in system memory. The similarity is not coincidental. Steve Wood, the designer of Windows 1.0 memory management, used the Intel 286 protected mode as inspiration¹ for the Windows memory manager (the 286 came out in 1982 and work on Windows started in 1983).

A handle refers to a memory segment regardless of where it is in memory, i.e. regardless of what its 8086 segment address is. The GlobalAlloc API allocates contiguous memory from the global heap (possibly more than 64K) and returns a segment handle.

Since the 8086 does not support protected mode, approximating protected-mode functionality takes quite a bit of extra work and discipline. Given that a handle is not a segment address, it can’t be used as the segment portion of a far 16:16 pointer. To address anything in another segment, an application needs to form a far pointer.

To that end, the application needs to call the GlobalLock API which returns a segment address and locks the segment in memory (increments its lock count). While locked, the segment won’t be moved and its segment address will stay valid.

Once it is done accessing memory in the segment, the application calls GlobalUnlock. That decrements the segment’s lock count and once the count drops to zero, the segment may be moved again.

Needless to say, after calling GlobalUnlock, the segment address returned by GlobalLock must be considered invalid. Note that this is a possible source of sneaky bugs—after calling GlobalUnlock, the segment most likely won’t move immediately. An application might erroneously access a previously locked segment after unlocking it and not cause any obvious harm.

Indeed Windows won’t move or discard a segment unless it has to, because it may well be used again. However, once segments are unlocked, Windows may move them around or discard them at any moment.
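The lock-count discipline can be modeled in a few lines of portable C. This is only a toy sketch: the toy_* names and the table layout are hypothetical and do not reflect how KERNEL actually implements GlobalAlloc, GlobalLock, or GlobalUnlock. It merely shows why an address obtained while a segment is locked goes stale once the segment is unlocked and moved.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_SEGMENTS 8

/* One entry per handle: where the segment currently lives and how many
 * outstanding locks pin it there. Hypothetical layout, for illustration. */
typedef struct {
    uint16_t segment_address; /* current real-mode segment address */
    uint16_t lock_count;      /* > 0 means the segment must not move */
    int      in_use;
} ToyEntry;

static ToyEntry toy_table[TOY_MAX_SEGMENTS];

/* Allocate an entry. The handle is the table index plus one, so that
 * 0 can signal failure. */
uint16_t toy_alloc(uint16_t initial_segment)
{
    for (uint16_t i = 0; i < TOY_MAX_SEGMENTS; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].segment_address = initial_segment;
            toy_table[i].lock_count = 0;
            toy_table[i].in_use = 1;
            return (uint16_t)(i + 1);
        }
    }
    return 0;
}

static ToyEntry *toy_entry(uint16_t handle)
{
    assert(handle >= 1 && handle <= TOY_MAX_SEGMENTS);
    assert(toy_table[handle - 1].in_use);
    return &toy_table[handle - 1];
}

/* In the spirit of GlobalLock: pin the segment, return its current address. */
uint16_t toy_lock(uint16_t handle)
{
    ToyEntry *e = toy_entry(handle);
    e->lock_count++;
    return e->segment_address;
}

/* In the spirit of GlobalUnlock: drop one lock. */
void toy_unlock(uint16_t handle)
{
    ToyEntry *e = toy_entry(handle);
    assert(e->lock_count > 0);
    e->lock_count--;
}

/* The memory manager tries to relocate a segment. Returns 1 if it moved,
 * 0 if the segment is locked and had to stay where it is. */
int toy_try_move(uint16_t handle, uint16_t new_segment)
{
    ToyEntry *e = toy_entry(handle);
    if (e->lock_count > 0)
        return 0;
    e->segment_address = new_segment;
    return 1;
}
```

A caller that keeps using the value returned by toy_lock after calling toy_unlock works fine until toy_try_move succeeds, which is exactly the kind of bug that stays hidden when there is no memory pressure.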

Now let’s take a closer look at the possible segment types.

Segment Flags

Windows segments have several important attributes which determine how they’re treated by the Windows memory manager.

Segments can be fixed or movable. The names are clear enough; movable segments can be shuffled around by Windows as long as they’re not locked, while fixed segments stay in place. For example, segments which hold interrupt handler routines must be fixed so that interrupt vectors stay valid. Ideally most of an application’s code and data segments would be movable, giving Windows an opportunity to efficiently manage memory. The ability to move segments is necessary because freeing or discarding segments creates “holes” in memory, potentially quickly fragmenting memory. Windows needs to be able to compact segments by moving them in order to consolidate free memory into one or more larger chunks.
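The effect of compaction can be sketched with a toy heap in C. Everything here is an assumption made for illustration (the Block structure, the toy_* functions, the idea of sliding movable blocks toward low addresses); the real Windows global heap is organized differently. The sketch only shows that movable blocks let free holes be merged, while a fixed block acts as a barrier that free space cannot cross.

```c
#include <assert.h>

typedef enum { BLOCK_FREE, BLOCK_FIXED, BLOCK_MOVABLE } BlockKind;

typedef struct {
    unsigned  start; /* offset of the block within the toy heap */
    unsigned  size;
    BlockKind kind;
} Block;

/* Size of the largest single free block, i.e. the largest allocation
 * that could succeed without compaction. */
unsigned toy_largest_free(const Block *blocks, unsigned count)
{
    unsigned largest = 0;
    for (unsigned i = 0; i < count; i++)
        if (blocks[i].kind == BLOCK_FREE && blocks[i].size > largest)
            largest = blocks[i].size;
    return largest;
}

/* Compact in place: slide movable blocks down over free holes and merge the
 * holes. The input must be ordered and contiguous. Returns the new count. */
unsigned toy_compact(Block *blocks, unsigned count)
{
    unsigned out = 0;          /* number of blocks written back so far */
    unsigned pending_free = 0; /* free space gathered since the last barrier */
    unsigned in_cursor = (count > 0) ? blocks[0].start : 0;
    unsigned out_cursor = in_cursor; /* end of the last block written back */

    for (unsigned i = 0; i < count; i++) {
        Block b = blocks[i];          /* copy first: slot may be overwritten */
        assert(b.start == in_cursor); /* ordered, contiguous input */
        in_cursor += b.size;

        if (b.kind == BLOCK_FREE) {
            pending_free += b.size;
            continue;
        }
        if (b.kind == BLOCK_FIXED && pending_free > 0) {
            /* A fixed block cannot move: emit the gathered hole below it. */
            Block hole = { out_cursor, pending_free, BLOCK_FREE };
            blocks[out++] = hole;
            out_cursor += pending_free;
            pending_free = 0;
        }
        b.start = out_cursor; /* unchanged for fixed, a slide for movable */
        blocks[out++] = b;
        out_cursor += b.size;
    }
    if (pending_free > 0) {
        Block hole = { out_cursor, pending_free, BLOCK_FREE };
        blocks[out++] = hole;
    }
    return out;
}
```

With a heap of movable, free, movable, free, fixed, free, movable, free blocks, the two holes below the fixed block merge into one and the two holes above it merge into another, but the two merged holes cannot be joined because the fixed block sits between them.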

Segments can also be discardable or nondiscardable. Code segments are typically discardable because they aren’t writable. If an unused code segment is removed and later needed again, Windows can easily reload it from the original executable. The same is true of resources which are also read-only. Data segments, on the other hand, tend to be non-discardable because they’re usually writable and once they’re modified, they cannot just be reloaded from disk. That said, applications might allow writable data segments to be discardable if they are willing to re-create their contents in case the segment is needed again after having been discarded.

DLLs

Dynamic linking was not yet a widespread technique in the mid-1980s and Microsoft Windows was one of the first systems with support for dynamically linked libraries (DLLs), also called shared libraries. While some larger systems had used dynamic linking since the 1970s, UNIX systems only started introducing shared libraries in the mid to late 1980s.

Windows DLLs are NE format images just like Windows applications, but DLLs are not applications. DLLs cannot be executed directly, only loaded and called into by other processes (tasks in Windows parlance). The bulk of Windows was in fact implemented as DLLs (KERNEL, USER, GDI).

DLLs export routines (entry points) that are callable by applications. Applications can be linked against DLLs at link time, with imports referring to DLL names and entry points. DLLs can also be loaded entirely dynamically, and their entry points can be queried by ordinal (number) or by name.

Note that unlike UNIX systems, Windows never had a global name space for dynamic symbol resolution. Symbols from DLLs were always imported first by module name and then by name or ordinal. The two-level name space takes slightly more effort to manage but avoids name collisions, such that if two DLLs export a symbol named Alloc, there is no confusion as to which one is needed because the module name distinguishes between the two. And of course without the two-level name space, imports by ordinal (which are slightly faster and consume less memory) would have been completely impractical.
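The two-level lookup can be illustrated with a small C sketch. The module names HEAPLIB and POOLLIB, the structures, and the toy_resolve_* functions are all hypothetical; this is not the NE loader's data layout, only a model of resolving first by module and then by name or ordinal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*ToyEntryPoint)(int);

typedef struct {
    const char   *name;
    unsigned      ordinal;
    ToyEntryPoint entry;
} ToyExport;

typedef struct {
    const char      *module_name;
    const ToyExport *exports;
    size_t           export_count;
} ToyModule;

/* Two unrelated libraries that both export a symbol named Alloc. */
static int heap_alloc(int n) { return n + 1000; }
static int pool_alloc(int n) { return n + 2000; }

static const ToyExport heaplib_exports[] = { { "Alloc", 1, heap_alloc } };
static const ToyExport poollib_exports[] = { { "Alloc", 1, pool_alloc } };

static const ToyModule toy_modules[] = {
    { "HEAPLIB", heaplib_exports, 1 },
    { "POOLLIB", poollib_exports, 1 },
};

/* First level: find the module by name. */
static const ToyModule *toy_find_module(const char *module_name)
{
    assert(module_name != NULL);
    for (size_t i = 0; i < sizeof toy_modules / sizeof toy_modules[0]; i++)
        if (strcmp(toy_modules[i].module_name, module_name) == 0)
            return &toy_modules[i];
    return NULL;
}

/* Second level, variant 1: look the symbol up by name within the module. */
ToyEntryPoint toy_resolve_by_name(const char *module_name, const char *symbol)
{
    const ToyModule *m = toy_find_module(module_name);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < m->export_count; i++)
        if (strcmp(m->exports[i].name, symbol) == 0)
            return m->exports[i].entry;
    return NULL;
}

/* Second level, variant 2: look the symbol up by ordinal; no string
 * comparison is needed once the module is known. */
ToyEntryPoint toy_resolve_by_ordinal(const char *module_name, unsigned ordinal)
{
    const ToyModule *m = toy_find_module(module_name);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < m->export_count; i++)
        if (m->exports[i].ordinal == ordinal)
            return m->exports[i].entry;
    return NULL;
}
```

Both modules export Alloc as ordinal 1, yet each lookup is unambiguous because the module name is always part of the reference. An ordinal on its own would be meaningless in a flat name space.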

One key difference between applications and DLLs that is relevant to Windows programming is that DLLs have no stack of their own and always run with the stack of their caller. Although DLLs almost always have their own data segment, it is different from the stack segment, i.e. SS != DS.

This difference means that DLLs must be built differently from applications. The compiler must be told to generate code for DLLs, or more specifically, told that it cannot assume DS and SS registers address the same memory.
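Why the SS == DS assumption matters can be shown with a toy model in C. The ToyCpu structure and the toy_* functions are invented for this sketch; two small arrays stand in for segments, and a "near pointer" is just a 16-bit offset that the callee interprets relative to DS.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SIZE 64

typedef struct {
    uint8_t  memory[2][SEG_SIZE]; /* two toy segments */
    unsigned ds;                  /* index of the segment DS selects */
    unsigned ss;                  /* index of the segment SS selects */
} ToyCpu;

/* A routine taking a near data pointer dereferences it through DS. */
uint8_t toy_read_near(const ToyCpu *cpu, uint16_t near_ptr)
{
    assert(near_ptr < SEG_SIZE);
    return cpu->memory[cpu->ds][near_ptr];
}

/* A caller stores a local variable on its stack (addressed through SS)
 * and passes only the 16-bit offset, as code assuming SS == DS would. */
uint8_t toy_pass_stack_local(ToyCpu *cpu, uint16_t stack_offset, uint8_t value)
{
    assert(stack_offset < SEG_SIZE);
    cpu->memory[cpu->ss][stack_offset] = value;
    return toy_read_near(cpu, stack_offset);
}
```

When ds and ss select the same segment (the application case), the callee reads back the value the caller stored. When they differ (the DLL case), the callee silently reads whatever happens to be at that offset in the DLL's data segment, which is the class of bug the compiler must avoid once it is told that SS != DS.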

In the early days of Windows, the prolog and epilog for DLL entry points were the same as the application prolog/epilog. Compiler writers eventually figured out that the prolog for applications can be simplified, because SS equals DS. But that is not the case for DLLs, and DLLs still need to use the old style “fat” prologs that the Windows module loader needs to patch up.

Secret Switches

Microsoft C supported Windows development from its earliest days, i.e. version 3.0 (earlier Microsoft C versions were rebranded third-party products; Microsoft C 3.0 was the first C compiler developed by Microsoft, initially for XENIX and DOS).

However, for many years, this support was almost secret. The Windows specific switches were completely omitted from compiler documentation, or they were listed but users were referred to the Windows SDK. That was the case up to and including Microsoft C 5.1, which documents the fact that the /Gw and /Aw switches exist, but does not explain what they do and how to use them, instead referring to the Windows SDK documentation. This perhaps neatly illustrates the somewhat incestuous relationship between the Windows development group and the Microsoft languages group.

Since Microsoft C 3.0 (1985), the compilers had the /Aw and /Gw switches (and also the /Au switch).

The /Aw switch is a memory model modifier and specifies that SS != DS, but DS should not be reloaded at function entry (because Windows takes care of that). The /Aw switch is meant to be used when generating DLLs.

The /Gw switch generates Windows prologs and epilogs for far functions. It is required for exported functions located in both applications and DLLs, and it is very much a Windows specialty.

Windows Prologs and Epilogs

So what exactly do those Windows specific function prologs and epilogs look like? Everything is spelled out in the CMACROS.INC file shipped with the Windows SDK. Unfortunately CMACROS.INC is a jumble of MASM conditionals, nearly impossible for humans to read. It’s much easier to see what code the C compiler produces, or what exactly assembly code using CMACROS.INC turns into.

Here’s what Microsoft C 3.0 generates, as shown by a listing file the compiler produces, with added comments:

	PUBLIC	Proc
Proc	PROC FAR
	*** 000	1e 		push	ds        ; almost
	*** 001	58 		pop	ax        ; no-op
	*** 002	90 		xchg	ax,ax     ; NOP
	*** 003	45 		inc	bp        ; marker
	*** 004	55 		push	bp        ; save BP
	*** 005	8b ec 		mov	bp,sp
	*** 007	1e 		push	ds
	*** 008	8e d8 		mov	ds,ax     ; reload DS
; Line 4
	*** 00a	8b 46 06 	mov	ax,[bp+6]
	*** 00d	03 46 08 	add	ax,[bp+8]
	*** 010	83 ed 02 	sub	bp,2
	*** 013	8b e5 		mov	sp,bp
	*** 015	1f 		pop	ds
	*** 016	5d 		pop	bp    ; restore BP
	*** 017	4d 		dec	bp    ; recover value
	*** 018	cb 		ret	
Proc	ENDP

First of all, note that the prolog seemingly spends a lot of instructions on doing very little real work. It copies DS into AX (by pushing DS and popping AX), saves DS on the stack, and then loads DS from AX, i.e. with the value it already had. It also increments BP before pushing it on the stack, and decrements it again after popping.

All in all, seemingly a lot of effort for nothing. But that’s actually the point: The Windows prolog and epilog code is meant to be harmless when it is not needed.

If the function is in fact exported from a Windows NE module, the Windows loader will patch the first three bytes to load the module’s default data segment into AX. Here’s what it looks like in SYMDEB, taken from a random GDI function:

_TEXT:SELECTOBJECT:
5BC1:1840 B80591     MOV AX,9105
5BC1:1843 45         INC BP
5BC1:1844 55         PUSH BP
5BC1:1845 8BEC       MOV BP,SP
5BC1:1847 1E         PUSH DS
5BC1:1848 8ED8       MOV DS,AX
5BC1:184A 83EC04     SUB SP,+04

In the above case, 5BC1h is the GDI module’s _TEXT code segment, and 9105h is the default data segment of the GDI module.

The Windows memory manager keeps the prolog updated such that if the data segment moves, the exported functions that refer to it get fixed up again to point to the new address.
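The patching step itself is simple enough to sketch in portable C operating on a byte buffer. The function name toy_patch_prolog is hypothetical, and the sketch assumes the caller already knows where the exported function starts; how the real loader locates and tracks entry points is not modeled here. It recognizes only the push ds / pop ax / nop sequence shown in the compiler listing, plus an already patched mov ax,imm16 that needs refreshing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrite the first three bytes of an exported function's prolog so that
 * they load `data_segment` into AX. Returns 1 if the bytes were patched,
 * 0 if the code does not start with a recognized prolog. */
int toy_patch_prolog(uint8_t *code, size_t length, uint16_t data_segment)
{
    assert(code != NULL);
    if (length < 3)
        return 0;

    /* 1E 58 90 = push ds / pop ax / nop (unpatched compiler output) */
    int unpatched = (code[0] == 0x1E && code[1] == 0x58 && code[2] == 0x90);
    /* B8 xx xx = mov ax, imm16 (patched earlier; the segment may have moved) */
    int patched = (code[0] == 0xB8);

    if (!unpatched && !patched)
        return 0;

    code[0] = 0xB8;                              /* mov ax, imm16 */
    code[1] = (uint8_t)(data_segment & 0xFF);    /* low byte first */
    code[2] = (uint8_t)(data_segment >> 8);
    return 1;
}
```

Applied to the first bytes of the Microsoft C 3.0 listing above with a data segment of 9105h, the sketch yields B8 05 91, the same bytes SYMDEB shows at the start of SELECTOBJECT. Calling it again with a different segment value models the fix-up after the data segment moves.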

Note that the NODATA keyword in a Windows .DEF file tells Windows not to patch the function prolog. This is necessary in situations where e.g. an exported entry point simply jumps to another exported function, or if the function has no need to access the data segment.

Now, what about that BP incrementing and decrementing? Windows depends on being able to walk the stack, and therefore applications and libraries must keep the stack frames in a format that Windows will understand.

When the Windows memory manager moves around segments, it must know whether they are referenced in stack frames that are already pushed on the stack. For example, if Windows tries to move a code segment that directly or indirectly called into the currently executing code, it has to either detect the situation and not move the segment, or move it and adjust the stack. What Windows cannot do is move the segment and leave the stack as is. The same is true for default data segments.

Non-default data segments are not a problem because they are either locked and cannot move, or are unlocked and therefore correctly written Windows applications do not keep any pointers into such segments.

Incrementing BP before pushing serves an important purpose: It tells Windows that the BP value was pushed by a far function, i.e. there will be both an offset and a segment on the stack. Obviously, for this scheme to work, stacks must always be word-aligned. Fortunately Windows ensures that they are aligned initially, and it takes some effort to misalign them (because there’s no easy way to push an odd number of bytes on the stack).
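A stack walk that relies on the odd-BP marker can be sketched in C over a simulated stack. The ToyStack type and toy_walk_stack function are hypothetical, and the sketch assumes that the BP chain ends with a zero value; it is a model of the idea, not of the actual KERNEL code. Given the prolog shown earlier, the saved BP sits at [BP], the return offset at [BP+2], and for a far frame the return segment at [BP+4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_BYTES 64

/* The stack segment as an array of 16-bit words; byte address / 2 = index. */
typedef struct {
    uint16_t words[TOY_STACK_BYTES / 2];
} ToyStack;

static uint16_t stack_word(const ToyStack *s, uint16_t addr)
{
    assert((addr & 1) == 0);               /* stack must be word-aligned */
    assert(addr + 1 < TOY_STACK_BYTES);
    return s->words[addr / 2];
}

/* Walk the BP chain starting at `bp` and collect the return segment of every
 * far frame. Returns the number of segments stored (at most `max`). */
size_t toy_walk_stack(const ToyStack *s, uint16_t bp,
                      uint16_t *far_segments, size_t max)
{
    size_t found = 0;

    while (bp != 0) {
        uint16_t saved_bp = stack_word(s, bp);

        if (saved_bp & 1) {
            /* Odd saved BP: pushed by a far function, so a segment follows
             * the return offset. */
            if (found < max)
                far_segments[found++] = stack_word(s, (uint16_t)(bp + 4));
        }
        /* Clear the marker bit to recover the caller's real BP. */
        uint16_t next = (uint16_t)(saved_bp & 0xFFFE);
        assert(next == 0 || next > bp);    /* callers live at higher addresses */
        bp = next;
    }
    return found;
}
```

Near frames are skipped over because their saved BP is even and only a return offset follows. The collected segment values are the ones a memory manager would have to check, or adjust, before moving a code segment.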

Comparison with OS/2

It is instructive to compare 16-bit Windows with 16-bit OS/2. The two systems were in many ways very close relatives. Both used the same executable format (NE) with only minor differences. Both used segment-based memory management. Both used the same development tools from Microsoft.

By virtue of using protected mode, OS/2 required less cooperation from the programmer. In protected mode, a segment selector was at the same time the equivalent of a Windows handle and a segment address. Programmers therefore did not need to bother with carefully locking and unlocking segments.

OS/2 applications also did not require any special prolog and epilog code for externally callable functions, and there was no need to explicitly export window procedures etc. from the NE module; there was also no equivalent of (and no need for) MakeProcInstance. In other words, the OS did not need to unwind application stacks, and it didn’t need to patch entry points.

Thanks to the 80286 memory management hardware, segments could be moved, discarded, and reloaded entirely behind an application’s back. There was no need for GlobalLock/GlobalUnlock, eliminating a source of programming errors.

Like Windows DLLs, OS/2 DLL entry points did need a special prolog to set the DS register to the DLL’s data segment, but on OS/2 no special support from the OS was needed. And of course OS/2 DLLs likewise had to be built with the /Aw switch or equivalent, indicating that SS != DS.

Overall, the 286 hardware did a lot of the heavy lifting, and memory management was less work (with less room for bugs) for both the OS and the programmer.

Testing

The Windows SDK provided tools designed to stress the Windows memory management. For example, errors related to incorrect segment locking/unlocking will not show up if there is no memory pressure and the mismanaged segment stays in place. Such bugs can remain hidden and in the worst case, only manifest under difficult-to-reproduce scenarios.

The SHAKER tool in the Windows 1.0 SDK was used to “shake” memory and force segments to be discarded and moved around. This was intended to stress the memory management and reveal memory management bugs which would remain dormant under typical conditions.

Another tool was HEAPWALK, primarily a diagnostic utility capable of displaying the currently allocated segments and their owners. However, HEAPWALK was also able to allocate all available memory and free it up in 1K increments, simulating low memory conditions.

Shaker and HeapWalker were still shipped with the Windows 3.0 SDK, not least because Windows 3.0 running in Real mode was minimally different from Windows 1.0 as far as memory management was concerned.

These tools were necessary because although the memory management in Windows was sophisticated, the hardware to back it was lacking (certainly before Windows 3.0 running in protected mode). Instead of letting the hardware catch errors like attempts to access unallocated memory, programmers had to use specialized tools to try and induce errors and hope that bugs would manifest in visible ways. This was not an exact science because in the 8086 architecture, every memory address was valid, and reads and writes always succeeded.

The Windows 3.1 SDK replaced the Shaker tool with Stress, a new utility which was designed to test application behavior under low-resource conditions — limited memory in various Windows internal heaps, running out of disk space, running out of file handles, etc.

Since Windows 3.1 only ran in protected mode, some of the earlier memory management issues were no longer applicable, but low-resource conditions were as relevant as ever.

Summary

16-bit Windows introduced a fairly sophisticated memory management system. Due to lack of hardware support, significant discipline was required on the part of application programmers. If the wrong compiler switches were used, or functions weren’t properly exported, or segments were not correctly locked and unlocked… all bets were off.

References

1. Peter Norton’s Windows 3.0 Power Programming Techniques, Peter Norton and Paul Yao, 1990, page 613.


71 Responses to Win16 Memory Management

Lars says:

June 28, 2026 at 12:50 pm

I can’t recall if I’ve seen it here, but Adrian King has in his “Inside Windows 95” the following anecdote: “During the development of the first version of Windows, signs proclaiming SS != DS were popular in many programmers’ offices. The signs were intended to be a constant reminder to the developers. They hoped the signs would lead to fewer bugs.”

I was sure it was Raymond Chen, but apparently not.
Michal Necasek says:

June 29, 2026 at 12:56 pm

Here’s the thing. Microsoft’s compilers (certainly the C compiler) have always assumed that SS == DS. Always. If they didn’t, it would be impossible to write small model programs because you could not take the address of an automatic variable and pass it to a basic function like fread() or scanf(). Yet for something so basic to work, SS must equal DS.
Victor Khimenko says:

July 1, 2026 at 11:24 am

You are mixing two entirely unrelated things. Of course Microsoft compilers (and all other compilers, too) assumed that SS is equal to DS in small model programs. Indeed it would be entirely stupid to do anything else. And small model (and tiny model, which squeezed everything into one segment) was important for compatibility with CP/M.

However that was only one model out of six (or five? not sure if Microsoft C compiler had separate huge model) available and it was, certainly, NOT the model used for Windows.

https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=104012
Michal Necasek says:

July 1, 2026 at 2:06 pm

Actually now I’m not sure what you are talking about. SS being equal (or not) to DS is unrelated to the memory model. By default, SS == DS in large model too. Quoting the Microsoft C 5.0 User’s Guide, page 154: “In compact-, large-, and huge-model programs, initialized global and static data are placed in the default data segment. The address of this segment is stored in the DS and SS registers.” Granted, in large model it does not make that much difference but the relationship is still there.

Or are you trying to say that the “SS != DS” sign was meant for the people writing Windows itself, not Windows applications? That would make sense.

For reference, MSC 5.0 supported five models (small, medium, compact, large, huge). MSC 6.0 added a sixth (tiny) although I am not entirely sure how much it differed from the small model in practice. The original MSC 3.0 only had three models (small, middle, large).
jakethompson1 says:

July 2, 2026 at 6:08 am

I got from the context that the sign would be for Windows developers, but the message is more broad than that, since anyone working on a DLL needs to know it, which would include plenty of application developers.

But isn’t SS!=DS at least adjacent to memory models? I’d consider it a specialization on the memory model used for kernels, drivers/TSRs, and 16-bit Windows DLLs. There is even an equivalent of this concept in today’s gcc, where kernel programming is “special,” which has -mcmodel=kernel (indicating “negative” 32-bit addresses) vs. the default -mcmodel=small (“positive” 32-bit addresses). It has to do with allowing assumptions to efficiently use sign extension to encode addresses/offsets when generating instructions.
jakethompson1 says:

July 2, 2026 at 6:32 am

Also it’s funny to think that multithreaded programming in C, had it ever made it to the 8086 and friends, would’ve also inevitably led to SS!=DS.
Michal Necasek says:

July 2, 2026 at 11:42 am

Well it did exist on the 286 with OS/2… and it didn’t have to mean SS != DS. As long as the thread stacks were all in the default data segment (possible with 2-4 threads and small stacks), the SS == DS relationship could still hold. With a larger number of threads, yes, the thread stacks had to be in different segments. The multi-threaded examples in the OS/2 1.0 programming all use SS == DS and put thread stacks in the default data segment.
Michal Necasek says:

July 2, 2026 at 11:44 am

Microsoft certainly thought SS<>DS was memory model adjacent, since it is specified through the same /Axxx switch to the compiler. That said, it is orthogonal to the memory model.

And yes, DLL writers always had to work with SS != DS.
Stefan Ring says:

July 2, 2026 at 3:51 pm

> That said, it is orthogonal to the memory model.

I’m not sure I fully agree. When pointers are only 16 bit, as in your small memory model scenario earlier, it is almost inevitable that there exists only a single data segment, which in turn makes it reasonable to load it into all segment registers.

I admit that the program could allocate additional memory elsewhere and set up its own stack there, which would break that assumption, but would make dealing with pointers ridiculously painful. A pointer would no longer be just a pointer; you would manually need to distinguish between “stack pointers” and “data pointers”. Maybe this is what you had in mind.
Michal Necasek says:

July 2, 2026 at 7:34 pm

Look at Microsoft’s compiler documentation. The concept of a default data segment exists in every memory model.

The way large data model works is that static objects larger than some threshold are placed into separate (far) data segments. But small items still go into DGROUP, to be addressable via DS/SS.

Correction: It’s a bit more complicated, so I’ll just quote the MS C 5.0 manual here:

By default, the compiler allocates all static and global data items within the default data segment in the small and medium memory models. In compact-, large-, and huge-model programs, only initialized static and global data items are assigned to the default data segment. The /Gt option causes all data items whose size is greater than or equal to number bytes to be allocated to a new data segment. When number is specified, it must follow the /Gt option immediately, with no intervening spaces. When number is omitted, the default threshold value is 256. When the /Gt option is omitted, the default threshold value is 32,767.

You can use the /Gt option only with compact-, large-, and huge-model programs, since small- and medium-model programs have only one data segment. The option is particularly useful with programs that have more than 64K of initialized static and global data in small data items.
Kevin k says:

July 5, 2026 at 1:50 am

This all reminds me why I was lucky that my professional career, which overlaps this, was luckily 32 bit programming or later. Though ranging from where 16mb was a decently powered computer to working on more modern systems. Though generally always had to consider CPU and memory requirements since couldn’t assume “unlimited” memory.

And why for personal purposes, though I did some Windows programming in the 16 bit world, I early bailed on it to 32 bit OS/2 when it became available. But fully aware of juggling different DOS memory models depending on the needs of the program.

As for the tiny memory model, it appears that it is probably the oldest of the memory models, since it is the model that supports .COM programs. But I can see where, by the time C compilers were produced for DOS, they probably targeted the .EXE as being fully supported, and providing more capability than just trying to fit everything into 64kb of address space with no benefit I can see vs the small model which could give you 3 times the memory while still using 16 bit pointers. Just split between data/code/stack.
Michal Necasek says:

July 6, 2026 at 7:43 pm

Yes, the tiny model was the oldest, basically 8080 compatible (as if there were no segment registers). It also made a lot of sense back when 64K RAM meant a beefy 8086 machine.

I believe small model was next, essentially splitting code and data but keeping all pointers near. Note that small model must have stack and data in the same segment, otherwise addresses of variables on the stack could not be passed to routines which take near data pointers.

Large model is also old; it made all pointers far, which made everything bigger and slower, but was fairly easy to implement in a compiler. Then came the medium and compact models, and I believe the last one was the huge model, with support for > 64K data objects.

Memory models and near/far pointers definitely add a lot of complexity to programming, but they did solve a real problem.
jakethompson1 says:

July 7, 2026 at 4:00 am

Refreshing on the details https://web.archive.org/web/20260508122238/http://man.cat-v.org/unix_8th/5/a.out I believe tiny and small are analogous to OMAGIC and NMAGIC a.out format, so they should’ve been intuitive to programmers of the time.

Programming Windows advises not to use large. Although it is supported, all the data segments will be fixed in memory instead of moveable. Big read-only data values (which I suggested earlier could be stashed in the CS: that uses them, but C compilers don’t necessarily make that easy) should be resources and GlobalAlloced when needed; big read-write data should use GlobalAlloc as well. The book calls out PageMaker (98 code segments and one data) implying if a program of such complexity can be medium, so can yours.
jakethompson1 says:

July 7, 2026 at 4:06 am

I’m not sure that huge is really a separate model from large, rather just large plus an agreement between the compiler/linker and runtime loader as to what happens when a single object is too big to fit into a segment. In real mode, the segment might be incremented by 1000h on spillover, while in protected mode (an important detail for Windows/286 programs obviously), perhaps by 8h. It only seems to apply to select functions like memcpy. I presume this segment incrementing logic was confined to libc so that the correct implementation could be substituted for real vs. protected.
Michal Necasek says:

July 7, 2026 at 3:29 pm

I think it’s fair to say that the huge model is a superset of large model. But it was a lot more than the runtime, the compiler itself could generate code to e.g. work with arrays > 64K.

I believe Microsoft’s runtime had two variables, __AHINCR and __AHSHIFT, which allowed the code to do the right thing at runtime. This was particularly interesting for Family API programs which ran in real mode on DOS and in protected mode on OS/2.

Windows/286 did not run in protected mode by the way.
Josh Rodd says:

July 8, 2026 at 4:20 pm

Huge memory was a rarely-used thing on an 8086 (or 286 in real mode, etc.) anyway, since there just isn’t that much memory to go around. Most programs would manage data over 64K in a special manner anyway, including tricks so that it could swap in/out of disk, or use EMS, etc.

In 16-bit OS/2 programs, huge memory was a real convenience since you could now just comfortably have larger memory objects without having to effectively write your own paging system, but it came with a performance penalty. Windows programmers didn’t seem to do this very often.

In the case you did need to use over 64K of memory (for example, an image editor), you’d just allocate multiple 64K blocks and then figure out how to piece together the access yourself. Using 32-bit far pointers also made memory access slower.

In practical terms, DOS applications, TSRs, drivers, etc. tended to use a segment address as the equivalent of a far pointer, but that’s only a 16-bit pointer. You just load a segment register and go.

Most of the 8086’s limitations would have been much easier to deal with if segment addresses had had an 8 bit offset instead of 4 bit, so that the 8086 could have natively addressed 16MB instead of 1MB. I have a little “thought experiment” brewing with an 8086 CPU emulator that implements this and then fiddling with the BIOS and DOS so that they stop doing segment arithmetic so that this works.
Michal Necasek says:

July 8, 2026 at 5:32 pm

I’m pretty sure OS/2 1.x supported > 64K bitmaps using huge addressing.

There were in fact some almost-8086-compatible embedded CPUs that used a different segment shift, with the ability to address 16MB of memory in real mode. But now I cannot find what exactly that was.

I assume Intel didn’t do that because when the 8086 was designed, 1MB was already vastly more than a typical target system would have. And the 286 did add the ability to address 16MB… Intel just never could have imagined that people would be stuck writing real-mode code for so long.
Richard Wells says:

July 8, 2026 at 11:18 pm

The NEC V33A had the ability to access 16MB through the extended/expanded address mode. Datasheet says expanded; users guide says extended. The other exotic 186 derivative (HP Hornet) could handle 4MB but I haven’t checked the documentation for special instructions.

Huge addressing wasn’t much needed for DOS programs. Just stitch together a couple of segments and watch the offsets to keep track of where one was. Could do the same under Windows 2 but needed either a runtime or large page frame so nothing else would be inconvenienced. Having half of the conventional memory in one immovable block would make everything else really slow as lots of code would need to be discarded to condense enough memory to create a block large enough to load another segment.

Just a side note: CP/M-86 manuals contemplated the use of both the stack segment and the extra segment to address additional memory. I don’t know of any program that was designed that way. I suspect the small size of most stacks and the conversion of existing 8080 code which expected data and stack to be adjacent into multiple segments made the dedicated stack segment redundant.
Michal Necasek says:

July 9, 2026 at 9:45 am

As I mentioned before, small data model very much depended on SS == DS, because anything else would require far data pointers to be passed around. That said, having separate DS and SS segments was extremely useful, because it allowed programmers to temporarily change both DS and ES registers and still have a functioning stack for handling interrupts and such.
Alex Czarnowski says:

July 12, 2026 at 4:53 pm

Whenever I read about 16bit Windows my mind tends to fluctuate towards BigWin and other failed Windows Extenders. It would be awesome if somebody could find copies of those short-lived Windows evolution era.

Great writeup Michal – thank you – reading it was a real treat. It reminded me how much time I’ve spent disassembling parts of Windows and how poor the early Windows debuggers were. Then Periscope for Windows came finally, and later SoftICE.
Michal Necasek says:

July 13, 2026 at 12:23 pm

BigWin cost something like five thousand dollars I believe, and unfortunately it’s probably gone. At least I have not ever seen a copy.

The only surviving Windows extender is probably Watcom Win386. I don’t know how much it was “failed” versus obsoleted by Win32s, when it finally became usable.

And yes, SYMDEB was probably okay back in 1985, but WDEB386 was still just improved SYMDEB, and in 1990 it was not that great. Even in Win9x times Microsoft did not have a really good kernel debugger, unlike the NT side where WinDbg had been around since forever. Which is why people bought SoftICE of course.