So after some furious disassembling, assembling, and linking, things got this far:
It took longer than it ought to have because although IDA is great, I couldn’t figure out how to make it work with GW-BASIC’s bizarre segment usage. The problem is that the BASIC data segment is tacked onto the end of the code, and during initialization, CS equals DS and variables have high offsets in the segment. But after initialization, CS no longer equals DS and DS points at the beginning of the data, which means variables are at low offsets within DS. I failed to convince IDA that the same data is accessed through completely different offsets at different points in the code. After a lot of trying and failing, I’m still unsure if I’m doing something wrong or if it’s really a situation that IDA can’t adequately handle.
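The segment arrangement described above can be sketched with a little real-mode address arithmetic. This is only an illustration in Python: the segment value, the position of the data area, and the variable offset are made-up numbers, not values taken from any BASICA.EXE. The point is simply that one physical byte can be reached through a high offset while DS equals CS and through a low offset once DS has been moved to the start of the data.

```python
def linear(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical layout, chosen only for illustration.
CODE_SEG = 0x1000      # segment where the code is loaded
DATA_START = 0xC000    # distance from start of code to start of data (paragraph aligned)
VAR_IN_DATA = 0x0012   # offset of one variable inside the data area

# During initialization DS == CS, so the variable sits at a high offset.
init_ds = CODE_SEG
init_off = DATA_START + VAR_IN_DATA

# After initialization DS points at the start of the data, so the offset is low.
run_ds = CODE_SEG + (DATA_START >> 4)
run_off = VAR_IN_DATA

# Both segment:offset pairs name the same physical byte.
same_byte = linear(init_ds, init_off) == linear(run_ds, run_off)
print(hex(init_off), hex(run_off), same_byte)
```

A disassembler that assumes a single base for the data segment will only resolve one of the two forms to the right variable, which is presumably the core of the difficulty.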
At any rate, I ended up with a bit over four thousand lines of a GW-BASIC OEM module, lovingly borrowed from Compaq’s BASICA.EXE version 1.13.
The original most likely had the OEM code split into several modules, which is evidenced by small gaps in the code. Those would be caused by paragraph alignment of the segments.
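As a rough illustration of how paragraph alignment produces such gaps, the following Python sketch places a few modules back to back, rounding each start address up to a 16-byte boundary. The module sizes are hypothetical and not measured from the Compaq executable.

```python
PARAGRAPH = 16  # bytes in an 8086 paragraph

def align_up(address, boundary=PARAGRAPH):
    """Round address up to the next multiple of boundary."""
    return (address + boundary - 1) // boundary * boundary

def layout(sizes, start=0):
    """Place paragraph-aligned modules in order.

    Returns a list of (begin, end, gap_before) tuples, where gap_before
    is the number of padding bytes inserted ahead of the module.
    """
    placed = []
    cursor = start
    for size in sizes:
        begin = align_up(cursor)
        placed.append((begin, begin + size, begin - cursor))
        cursor = begin + size
    return placed

# Hypothetical module sizes in bytes.
module_sizes = [0x123, 0x40, 0x7F9]
for begin, end, gap in layout(module_sizes):
    print(f"{begin:#06x}-{end:#06x} gap {gap}")
```

Each gap is at most 15 bytes, and a gap appears only where the preceding module does not end on a paragraph boundary, which fits the pattern of small, irregular holes between pieces of code.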
Using the previously disclosed information, the source modules were all assembled with MASM 1.06 (no errors) and linked with LINK V2.01 (Large), executable dated 4-01-83. The order of the modules definitely matters, at least to some extent. The linker warns that there is no stack segment (true; the original Compaq executable doesn’t have one either) and shows one error about an offset exceeding the field width in the MATH module. While that error is also legit, the code is just written that way. Despite those issues, the linker still produces a working executable. Here’s the map file in case anyone is curious.
The linker error turned out to be somewhat significant. The offending code in MATH1.ASM is this:
SIN30: MOV AL,LOW OFFSET $FAC
The linker correctly complains that it can’t shove the 16-bit $FAC offset into an 8-bit register. What the authors probably actually intended is this:
SIN30: MOV AL,BYTE PTR $FAC
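The difference between the two instructions can be modeled in a few lines of Python. This is a sketch under stated assumptions: the offset of $FAC and the byte stored there are invented for the example, and the two functions only mimic what ends up in AL, not how MASM or LINK encode the instruction.

```python
# Hypothetical values, for illustration only.
FAC_OFFSET = 0x0B62          # assumed offset of $FAC in the data segment
memory = bytearray(0x10000)  # one 64 KB data segment
memory[FAC_OFFSET] = 0x81    # assumed byte currently stored at $FAC

def mov_al_low_offset(offset):
    """Model of MOV AL,LOW OFFSET sym: AL receives the low byte of the address."""
    return offset & 0xFF

def mov_al_byte_ptr(mem, offset):
    """Model of MOV AL,BYTE PTR sym: AL receives the byte stored at the address."""
    return mem[offset]

buggy = mov_al_low_offset(FAC_OFFSET)        # a piece of the address itself
fixed = mov_al_byte_ptr(memory, FAC_OFFSET)  # the contents of the variable
print(hex(buggy), hex(fixed))
```

In the first form AL depends only on where $FAC happens to be placed at link time, not on its contents, so the value changes whenever the data layout shifts.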
Now, where it gets really interesting is that out of the four GW-BASIC 1.x executables examined (Compaq BASICA.EXE 1.13 and 1.14, Eagle GWBASIC 1.10, Corona GWBASIC 1.12.03), three in fact use the latter, corrected version of the code. Only Compaq’s BASIC 1.13, clearly the oldest of the bunch, uses the buggy code sequence which matches the source code released by Microsoft.
As previously mentioned, not one of those four binaries is an exact match for the released source. Compaq’s version 1.13 is the closest, but not identical. At this point it is anyone’s guess if any OEM ever released a GW-BASIC version that was built from the exact source code that Microsoft recently published.
Note: The OEM module is not yet in a shape suitable for publishing so don’t ask.