IBM XENIX 1.0 Incompatibility Details

Some time ago I wrote about IBM PC XENIX 1.0 and why it won’t work on 386 and later processors. Thanks to a kind reader, I’ve been able to analyze the object files used to link the kernel, and I believe I now understand exactly how XENIX uses the reserved space in LDT descriptors and why my patched kernel doesn’t fully work. This problem is likely common to all early editions of Microsoft XENIX 3.0, not just the IBM-branded release.

What XENIX stores in the reserved word is the size of the corresponding segment on disk; the value for a given descriptor is returned by a kernel-internal function called dscrsw(). The value is used when an executable file is loaded and also when a process is swapped out. The latter is likely the reason why the on-disk segment size is stored in the descriptor table at all.

Incidentally, why is the troublemaking routine called dscrsw anyway? The answer to that riddle can be found in /usr/include/sys/relsym86.h: The last word of the descriptor table structure is defined as d_sw, or “software defined word, unused”. If only.
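To make the layout concrete, here is a minimal sketch of an 8-byte 286 segment descriptor with the on-disk size tucked into the last word. Only the name d_sw comes from relsym86.h; the other field names, the helper functions, and the sample values are my own illustration, not the XENIX source.

```python
import struct

# 80286 segment descriptor, 8 bytes, little-endian:
#   word 0: limit (16 bits)
#   word 1: base bits 15..0
#   byte 4: base bits 23..16
#   byte 5: access rights
#   word 3: reserved on the 286; XENIX calls it d_sw
# Only d_sw is a name from the article; the rest is illustrative.
DESC_286 = struct.Struct("<HHBBH")

def pack_desc(limit, base, access, d_sw=0):
    """Build a 286 descriptor; base is a 24-bit physical address."""
    return DESC_286.pack(limit & 0xFFFF,
                         base & 0xFFFF,
                         (base >> 16) & 0xFF,
                         access & 0xFF,
                         d_sw & 0xFFFF)

def dscrsw(desc):
    """Hypothetical stand-in for the kernel routine: return the last word."""
    return DESC_286.unpack(desc)[4]

# Made-up example: a writable data segment with a 0x7FFF limit,
# of which 0x1A00 bytes are present in the executable file.
desc = pack_desc(limit=0x7FFF, base=0x012000, access=0xF2, d_sw=0x1A00)
print(desc.hex(), hex(dscrsw(desc)))
```

On a 286 the last two bytes are simply ignored by the CPU, which is why the trick worked at all.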

My patch made dscrsw() return the segment limit rather than the physical (on-disk) size. That happens to work well for executables which have a single segment not followed by any other data… which happens to be the vast majority of XENIX 3.0 executables. In that case, the OS ends up reading exactly as much as it should (typically less than what it was asked for).

For files with symbols, XENIX ends up reading more than it should and possibly causes some corruption. Similar trouble occurs with multi-segment executables, and those may not work at all. There are remarkably few of those shipped with IBM XENIX. In the base package, it’s only vi and vsh (Visual Shell). While vsh is easy to live without, vi is much more critical.

On the whole, the patching was about a 90% success, but in the meantime a much better solution turned up; more about that later.

But Why?

The descriptor mess made me wonder again why XENIX was written that way. The oldest relevant documentation available to me, the Intel iAPX 286 Operating Systems Writer’s Guide (order no. 121960-001) from 1983, speaks quite clearly: The last word of any descriptor type is “reserved for iAPX 386, must be zero”.

That documentation was available well before IBM XENIX 1.0 was released (late 1984), so there was at least some sloppiness involved. But there is another, more charitable explanation.

It is possible that in earlier versions of 286 documentation, the last descriptor word was merely marked as reserved or unused, and the implication was that it would be used on a 386 if some not-yet-defined bit marked the segment as a “386” segment. But in the end, that’s not how it worked, and IBM XENIX 1.0 is living proof—the 386 uses the last word always, regardless of descriptor type. In other words, there was a silent change in behavior where existing 286 binaries simply behave differently on a 386.

And to be fair, that difference is spelled out quite clearly in the Intel 80386 Programmer’s Reference Manual (order no. 230985-001): “Because the 80386 uses the contents of the reserved word (last word) of every descriptor, 80286 programs that place values in this word may not execute correctly on the 80386.” That indicates the problem was known to Intel… in 1986.
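A rough sketch of what that means in practice: the same eight bytes decoded the 286 way and the 386 way. On the 386, byte 6 holds limit bits 19..16 plus the flag bits (G is bit 7, D/B is bit 6) and byte 7 holds base bits 31..24. The function names and the sample descriptor are made up for illustration.

```python
import struct

def decode_as_286(desc):
    """286 view: 24-bit base, 16-bit limit, last word ignored."""
    limit, base_lo, base_hi, _access, _reserved = struct.unpack("<HHBBH", desc)
    return {"base": base_lo | (base_hi << 16), "limit": limit}

def decode_as_386(desc):
    """386 view: the last word supplies base 31..24, limit 19..16 and flags."""
    limit_lo, base_lo, base_mid, _access, flags_limit, base_top = \
        struct.unpack("<HHBBBB", desc)
    limit = limit_lo | ((flags_limit & 0x0F) << 16)
    if flags_limit & 0x80:              # G bit: limit counts 4 KB pages
        limit = (limit << 12) | 0xFFF
    return {"base": base_lo | (base_mid << 16) | (base_top << 24),
            "limit": limit,
            "big": bool(flags_limit & 0x40)}   # D/B bit

# Made-up descriptor: base 0x012000, limit 0x7FFF, and an on-disk
# size of 0x1A00 stored in the last word.
desc = struct.pack("<HHBBH", 0x7FFF, 0x2000, 0x01, 0xF2, 0x1A00)
print(hex(decode_as_286(desc)["base"]))   # 0x12000
print(hex(decode_as_386(desc)["base"]))   # 0x1a012000
```

In this example the 386 would place the segment at a linear address far above anything the 286 kernel intended, purely because of the size value sitting in the last word.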

But Microsoft’s XENIX 3.0 (aka IBM XENIX 1.0) for the 286 was unlucky enough that it was released before any 386 hardware whatsoever was available, so no one could notice that it doesn’t work at all on the newer CPU. Fortunately, CPU vendors learned from such mistakes and consistently enforce reserved bits in newer designs (i.e. attempts to set reserved bits cause faults).


18 Responses to IBM XENIX 1.0 Incompatibility Details

  1. Jeff says:

    Does this mean that if an 80386 emulator were written to ignore the last descriptor word in 286-type descriptors, problem solved?

  2. Michal Necasek says:

    Yes, although I’m not sure it would still qualify as a 386 emulator 🙂 A 286 emulator on the other hand would fit the bill exactly.

  3. crazyc says:

    How would you tell a 286 type descriptor though? Win3.1 and Win95 (presumably…) place 16bit segments above 24MB so just being a 16bit segment isn’t enough.

  4. Michal Necasek says:

    You can’t, that’s exactly the problem. 32-bit (for some definition of 32-bit) OSes can and do place purely 16-bit segments anywhere in the 32-bit linear address space. OS/2 does that too.

    (BTW I think you mean above 16MB, or more than 24 address bits.)

  5. Richard Wells says:

    Even if the documentation had clearly shown that Xenix 286 code would not run on a 386, the same methodology might still have been followed.

    1) Unix based OSes expected to recompile all code with every new CPU and most designers believed that 5 year old executables could be thrown away.

    2) The price tag for 1MB of RAM when Xenix 286 was being developed was over $1,000, and the expansion card needed added several hundred dollars more. Not using the reserved bits wastes about $30 for each LDT. It is also very difficult to make memory access faster than reading memory that is already being read.

    Better performing code now always beat out inferior code that might work on future hardware without changes.

  6. crazyc says:

    >(BTW I think you mean above 16MB, or more than 24 address bits.)

    Bits, bytes what’s the difference? 🙂

    I’ve looked at Intel Xenix 286 3.0 for the System 310 (which boots in MAME) and I don’t see any segment descriptors that store anything in the top 16 bits. Maybe Intel cleaned that up on their own or maybe it’s a newer release (the copyright is 1984 though). The disk images are at bitsavers.

  7. Michal Necasek says:

    Different/wrong mindset — you’re probably right about that. They didn’t understand or have experience with what it meant to write shrink-wrap software.

    Memory and performance… I don’t buy that argument at all. Someone worried about RAM prices or speedy execution would not touch Xenix with a 10-foot pole in the first place.

  8. Michal Necasek says:

    The copyright may be 1984, but the files could be newer. Hmm, looking at the timestamps, it looks like around October 30th, 1984. The IBM XENIX 1.0 core files are from around October 10th, 1984, more or less the same vintage.

    Fortunately the object files constituting the kernel are available for both Intel and IBM 286 XENIX. And it’s interesting. For both IBM and Intel, the object files are dated October 7/8. They look like they were compiled from very similar source, but are not identical — either the compiler used was not quite the same, or the compile options were different. The IBM object files are better optimized than the Intel ones.

    In both cases, mch.o contains a _dscrsw routine. It’s almost identical, but in the IBM version it accesses segment 0x8 and in the Intel version segment 0x140. For IBM, that’s a mapping of the GDT; for Intel I’m almost certain it’s the same, the descriptors are just organized a bit differently.

    Both kernels definitely use the _dscrsw routine, the calling code isn’t quite identical; there’s some extra complexity in the Intel kernel. Without seeing the Intel one running, I can’t quite tell if it’s just minor variation or a deliberate change to get that extra word out of the descriptor tables.

    In April 1984, Intel already put out a “product preview” for the iAPX 386 (as they called it then). It’s highly likely that existing 286 Xenix would be one of the first things they’d try running on prototype 386 hardware. So I think it’s possible that by October 1984, Intel noticed the incompatibility and fixed it.

  9. crazyc says:

    Looks like I was too hasty. I ran vi which you said was problematic and I see it happening now.

  10. Michal Necasek says:

    Ah, okay. The IBM XENIX does it even during boot, it won’t get anywhere on a 386. Sounds like the Intel package is a bit different but still would have trouble on a 386.

  11. Super says:

    Looking into it a bit, Intel’s February 1982 “Introduction to the iAPX286” (210308-001) documentation doesn’t mention that the unused space in the LDT is reserved for the 386 (in fact, it doesn’t mention the 386 at all). It even outright states that the descriptors are 48-bit structures.

    Seems to be a decent indication that pre-1983 documentation on the 286 wasn’t at all clear on this point. If XENIX was a very early port to the 286 then it’s likely they were working from draft documentation. Who knows, maybe some draft documents on the 286 even referred to the space as a “software defined word”?

  12. Michal Necasek says:

    There is indication that the XENIX port to 286 happened in 1983. IBM XENIX 1.0 was released quite a bit later, after the 386 was announced and in fact after pre-production 386 silicon was available. But of course the question is what documentation Microsoft used when doing the port.

    Nice find there. The Introduction to the iAPX286 is not super detailed, but it does say that the descriptors contain 3 words (48 bits) and implies that they are spaced every 4 words, presumably to keep the indexing into descriptor tables straightforward.

    So yes, it’s possible that the documentation available to Microsoft when they ported XENIX to the 286 said nothing about the 4th descriptor word and implied that it’s just padding. By the time XENIX 286 was released, the 286 documentation was already changed to reserve the word. But it took some time before Intel identified the incompatibility and Microsoft fixed Xenix.

  13. Richard Wells says:

    It wasn’t just Xenix that had problems with 286 code on the 386; I have read magazine articles pointing to similar problems with iRMX-286 on a 386. The very early Intel sample 286 protected mode code is not something I have been able to track down. K286 was $27,000, which resulted in a lack of available copies, and protected mode iRMX saved online starts with iRMX-II from 1988. That whole fun 1982-1983 period when Intel and everyone else was learning how to do protected mode design has largely vanished from the record.

    It has been interesting, in retrospect, seeing the philosophy on memory usage on microcomputers change during the 80s from saving every bit possible like with CP/M’s use of the high-order bits on file extensions to flagrantly leaving large blocks unused.

  14. Since you cycled back around to XENIX, I was wondering if you made any headway on the XENIX 386 2.2.3 voodoo. I’m sorta tempted to try a crack at it as a part time project. Assuming it’s just not data corruption, part of me makes me think we’re looking at an extremely arcane hack to allow a stripped down kernel to load the filesystem from floppy. Since you got disk one and the kernel to bootstrap, I’m guessing those were one of the images that weren’t unusual.

    On XENIX 286, part of me suspects you found a way to trap read/writes on the LDT to make the 80386 happy, and put the data xenix wants somewhere else. Maybe that’s madness. I’m curious at your solution.

  15. Michal Necasek says:

    Haven’t really had a chance to look at the Xenix 386 2.2.3 disks again. If you want to have a go at it, the best would be probably looking at the TeleDisk images and figuring out if some of the sectors with duplicate numbers look more plausible than others.

    Re 286 XENIX, my solution is much more work and yet much more straightforward, and I’m going to publish a post about it within the next few weeks 🙂 What you suggest would be probably technically possible but a fairly major surgery (which the patient might not survive).

  16. Michal: Can you post the SHA1SUMs for what you have for Xenix 386? I found some reference notes to the teledisk format, but part of me is sorta tempted to try and simply see if I can even extract useful binaries out of it, and perhaps build N2 with the components from that.

    I have noted that Googling seems to show a complete version of the Xenix 386 2.3.1 release. Might be interesting to see if that can mount your teledisk written 2.2.1 disks. See if the problem is corruption, or actually a case of utter madness in the N2 disk.

  17. Michal Necasek says:

    So, somehow I have raw disk images of SCO 386 Xenix 2.2.3, and I think they’re the same thing as the weird TeleDisk images. The X and B disks are version 2.2.2c, but the N (boot) disks are version 2.2.3c. I think that was somewhat typical for SCO. My images are dated 12/25/1996 and I’m sorry to say I have no idea where they had come from.

    Let me post MD5 sums of those TeleDisk images…

  18. Michal Necasek says:

    My TeleDisk images:

    4EB56ED320F483B63EC4F6024F3F2898 BASIC2.TD0
    695CF3C5A742262C45CDACBC23FBBD81 EU1.TD0
    A85532D6B53B18FE96FC652B05259B49 EU2.TD0
    4096FB247F4BFF455E006D09027A683A EU3.TD0
    F44EED1390A18816AA2C03EF034C45DA EU4.TD0
    D2B645981AEF7908FB7B99899FB91E14 EU5.TD0
    0E1D05ABAAED186F529968E3E7DD1647 OPS1.TD0
    494644DF319357C6ED64E5278C3BC64D OPS2.TD0
    321FEAE661CC05462A36C0FF1A3E930C OPS3.TD0
    68570892828E6255ED12B88F2298E958 OPS4.TD0
    367AF4F3C679DA6F045039B89DFE8705 OPS5.TD0
    4E8E56849F3DB7B949BD4FF3DB66F90B OPS6.TD0

    And what I think are the corresponding raw images:

    69CA22B49DBA8F8420E973C81D013342 B1.img
    332A8E4BB36EAE835E4DE24A81DFBF3D B2.img
    84BF21AD662B8ED76AAFE3C1FF066279 X1.img
    3B4A30D69628D5C7D46F607BC652DCCF X2.img
    EAF4E885A2B65D7CE76D50D79DDF930B X3.img
    8DFCBF3FE113D8A3AF202145C448B197 X4.img
    D3D3BB3570F36348005DB59FA3046074 X5.img
    F0EFDE471D724C7801E8FCC3EE03FE95 N1.img
    33BF7FD8FC2FE0DCC83774CA2224705D N2.img
    4929D1658709805802F308399D6498B7 N3.img
    09F70C58FED4909FEC082788AA75A47F N4.img
    E04526EC63F299BCAD2C7B0D863301CA N5.img
    BEEC8AB53647A93C1E9E649299FD68E8 N6.img

    Again, this is Xenix 2.2.3c although the base and extended utilities are version 2.2.2c.
