When I learned that Microsoft released the GW-BASIC source code, I was mildly curious to find out what is or isn’t there. The short answer is that there’s a whole lot, but a lot is also missing. Spelling note: Both “GW-BASIC” and “GW BASIC” can be found in the source code. The hyphenated spelling will be used here for consistency.
The first question is: When is the source code from? Microsoft marked the source files February 10, 1983, but that’s almost guaranteed to be wrong. The date comes from comments in the code: “This translation created 10-Feb-83 by Version 4.3”. That reflects running some sort of master BASIC source code through a translator generating 8086 code. The source code was almost certainly modified after that date.
My current best guess is that the source code is roughly from mid-1983. But that’s only a guess.
Assembling the Source
The next order of business was figuring out how to assemble the source code. The Microsoft source release provides absolutely no clues on this front. There is no makefile (although perhaps it’s too old for one), no batch file, no build notes, nothing.
The GW-BASIC source code makes several mentions of Intel’s ASM86, but the source uses far too many MASM specifics. It is likely that some older version used ASM86, but not the released source.
Armed with a collection of MASM versions, I tried assembling the source. It did not go well. Nothing could be assembled. MASM 5.1 seemed to get the furthest, which was odd because it’s really far too new (1988); moreover, MASM 5.1 has a built-in INSTR operator which clashes with an INSTR symbol in the GW-BASIC source code.
It turned out that MASM 5.1 was merely more tolerant of UNIX line endings. Old MASM versions require DOS-style (CR/LF) line endings and get very upset otherwise, spitting out confusing errors.
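For anyone wanting to repeat the experiment, the line-ending conversion is easy to script. The following is only a minimal Python sketch, not necessarily how the files were actually converted; the function names to_dos_line_endings and convert_file are made up for illustration.

```python
def to_dos_line_endings(data: bytes) -> bytes:
    """Return data with every line ending normalized to CR/LF."""
    # Collapse existing CR/LF pairs to bare LF first so they are not doubled,
    # then expand every LF to CR/LF.
    unix = data.replace(b"\r\n", b"\n")
    return unix.replace(b"\n", b"\r\n")


def convert_file(path: str) -> None:
    """Rewrite one source file in place with CR/LF line endings."""
    with open(path, "rb") as f:
        original = f.read()
    converted = to_dos_line_endings(original)
    if converted != original:
        with open(path, "wb") as f:
            f.write(converted)
```

Working on bytes rather than text sidesteps any guessing about the character encoding of the old source files.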
After massaging the source files to make them more palatable to MASM, things got more interesting. Long story short, almost all the files can be assembled with Microsoft MASM 1.00 or 1.10, as well as IBM MASM 1.0. There are known problems with the very old MASM versions that can be avoided by reducing conventional memory size to 512 KB.
Most files cannot be assembled with Microsoft MASM 1.12 or later, or IBM MASM 2.0. The problem is generally better diagnostics in newer MASM versions which refuse questionable constructs in the GW-BASIC source code.
These are the kinds of statements that MASM 1.12 and later refuse:
MOV DX, OFFSET 256*100+OPCNT
MOVS ?CSLAB,WORD PTR ?CSLAB
The exception is the GWMAIN module. MASM 1.x versions fail to assemble it because they run out of memory. The module can be successfully assembled with IBM MASM 2.0 or Microsoft MASM 3.0. No amount of pleading convinced MASM 1.x to work.
This raises some question marks. IBM MASM 2.0/MS MASM 3.0 are really too new (1984) for the GW-BASIC source code. It is possible that Microsoft used development versions of MASM; it is known (see page 337) that Microsoft shipped the bulk of GW-BASIC to OEMs in object code form and OEMs needed to supply glue code required for GW-BASIC to interface with their platform. It is thus possible that the code could not be actually assembled with a generally available off-the-shelf tool.
There is also some possibility that Microsoft did use MASM 1.0 or 1.1 but not hosted on DOS. At any rate, IBM MASM 1.0 plus IBM MASM 2.0 can be used to assemble the source code, and so can Microsoft MASM 1.10 plus MASM 3.0.
There was also an easily resolved mystery related to the GW-BASIC math package. There are two source files, MATH1.ASM and MATH2.ASM. Neither can be assembled. But if they are merged together, e.g. by including both from a master source file, assembly succeeds. The MATH module may have been split because the source code is almost 180KB and certainly would not fit on a 160KB floppy.
Update: Shortly after writing the above, I hit paydirt. MASM 1.06, ostensibly from 1982, can cleanly assemble all of the GW-BASIC source files, with no syntax errors and no running out of memory. A copy can be found here (as MACRO86.EXE) and here; the two executables have different date stamps but are in fact bit for bit identical. Why both older and newer MASM versions run out of memory on GWMAIN.ASM remains a mystery for now, but we now know that there was at least one MASM version that could assemble everything on a PC.
Comparing with a Binary
The next todo item was finding a GW-BASIC binary that’s close to the released source code. It quickly turned out that most GW-BASIC binaries are either older or newer. The right ones show
(C) Copyright Microsoft 1982
but may display various version numbers. They may or may not mention GW-BASIC. In the end I zeroed in on two binaries. One was GWBASIC.EXE dated Nov 11, 1983, file size 56,832 bytes, showing the following:
EAGLE GWBASIC Version 1.20 11/11/83
(C) Copyright Microsoft 1982
The other was BASICA.EXE dated May 13, 1983, file size 54,272 bytes. The sign-on message was:
The COMPAQ Personal Computer BASIC
(C) Copyright COMPAQ Computer Corp. 1983
(C) Copyright Microsoft 1982
Both of these are a very good but not perfect match for the released source code. I am almost certain that the Compaq version is slightly older than the source code (because there are a few bits missing), while the Eagle version is slightly newer (because there are a few extra bits). That implies the released source code is older than November 1983 but possibly newer than May ’83.
Mapping Out the Binary
I concentrated on the Eagle Computers GWBASIC.EXE since it seemed to be a slightly better match for the source code. I was able to match all of the source code with the binary and arrived at the following sequence of source modules (note that BI stands for BASIC Interpreter):
GWDATA.ASM
GWMAIN.ASM
OEM.ASM
GWEVAL.ASM
GWLIST.ASM
IBMRES.ASM
BIMISC.ASM
DSKCOM.ASM
BIPTRG.ASM
BIPRTU.ASM
BISTRS.ASM
FIVEO.ASM
GENGRP.ASM
ADVGRP.ASM
MACLNG.ASM
GWSTS.ASM
GIO86.ASM
GIODSK.ASM
GIOKYB.ASM
GIOSCN.ASM
GIOLPT.ASM
GIOCOM.ASM
GIOCON.ASM
GIOTBL.ASM
SCNEDT.ASM
SCNDRV.ASM
CALL86.ASM
NEXT86.ASM
MATH.ASM (MATH1.ASM + MATH2.ASM)
KANJ86.ASM
GIOCAS.ASM
ITSA86.ASM
GWRAM.ASM
GWINIT.ASM
BIBOOT.ASM
OEM.ASM is a hypothesized OEM-supplied module which is not part of the GW-BASIC source code distribution. It is not a trivial piece of code and accounts for over 6,000 bytes of object code in the Eagle GWBASIC.EXE (more than 10% of the total).
It is likely that other GW-BASIC implementations order the modules differently, although the order of some of the modules at the beginning and end may be fixed (for example GWDATA.ASM needs to be first).
Reading the source code is fascinating. The code clearly has a long history:
--------- ---- -- ---- ----- --- ---- -----
COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
--------- ---- -- ---- ----- --- ---- -----
ORIGINALLY WRITTEN ON THE PDP-10 FROM
FEBRUARY 9 TO APRIL 9 1975
BILL GATES WROTE A LOT OF STUFF.
PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.
MONTE DAVIDOFF WROTE THE MATH PACKAGE (F4I.MAC).
Paul Allen was clearly involved for a while:
FIVEO 5.0 Features -WHILE/WEND, CALL, CHAIN, WRITE /P. Allen
There is no indication that Bill Gates or Paul Allen were involved by the time the product became GW-BASIC.
The source code is written, as was then common, in ALL CAPS (although not completely).
One of the most jarring things is that, as was also common in the bad old days, identifiers are limited to six characters. That leads to ugly, cramped, and hard-to-decipher identifiers like FRMQNT or SKPMRF or LEVFRE or XCESDS. The 6-character limitation is also applied to file names.
The code is generally quite unstructured and very hard to follow. The PROC keyword is not used at all. Procedures are used, but rather loosely. Code very frequently jumps into the middle of another routine or returns from a routine by using a JMP rather than RET. As a consequence, there are only minimal attempts to keep values in registers and almost all data is kept in memory. The jumpy programming style also makes it impossible to use local variables on the stack. No doubt the code is written that way because it was originally targeting the Intel 8080.
The code contains a nice collection of “what not to do” Intel recommendations. To be fair, those recommendations don’t really apply to the 8086. The style violations include mixing of code and data and jumping into the middle of an instruction.
For example, calls to the SYNCHR routine are followed by one byte of data (excerpt from FIVEO.ASM):
CALL SYNCHR
DB OFFSET 54O ;Must be comma
CMP AL,LOW 54O ;Ommit line # (Use ALL for instance)
The byte is not code, it is data. SYNCHR pops the return address off the stack, processes the data and increments the address, then pushes it back.
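The return-address trick is perhaps easier to see in a small model. The Python sketch below is not the actual SYNCHR routine; it only mimics the mechanism described above. The memory layout, the placeholder CALL displacement, and the assumption that "processing" means comparing the inline byte against the current input character are all mine.

```python
# Toy memory image: a 3-byte near CALL (E8 + 16-bit displacement, placeholder
# zeros here), the inline data byte 54O (a comma), then CMP AL,54O (3C 2C).
MEMORY = bytes([0xE8, 0x00, 0x00, 0o54, 0x3C, 0o54])


def synchr(memory: bytes, stack: list, current_char: int) -> None:
    """Model of a routine whose argument is the byte following the CALL."""
    ret = stack.pop()            # pop the return address off the stack
    expected = memory[ret]       # it points at the data byte, not at code
    if current_char != expected: # assumed processing: compare with the input
        raise ValueError("syntax error: expected %r" % chr(expected))
    stack.append(ret + 1)        # push it back, now pointing past the data


# The CALL at offset 0 pushes the address of the byte that follows it (3).
stack = [3]
synchr(MEMORY, stack, ord(","))
resume_at = stack[-1]            # execution resumes here, at the CMP
```

After the call, the return address on the stack points at the CMP instruction rather than at the data byte, which is exactly why a naive disassembler that keeps decoding straight after the CALL gets confused.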
The other type of abuse is even more interesting (excerpt from GWMAIN.ASM):
PUBLIC SNERR
SNERR: MOV DL,LOW OFFSET ERRSN ;"SYNTAX ERROR"
DB 271O ; SKIP ;"LXI B," OVER THE NEXT 2
PUBLIC DV0ERR
DV0ERR: MOV DL,LOW OFFSET ERRDV0 ;DIVISION BY ZERO
DB 271O ; SKIP ;"LXI B," OVER THE NEXT 2
PUBLIC NFERR
NFERR: MOV DL,LOW OFFSET ERRNF ;"NEXT WITHOUT FOR" ERROR
DB 271O ; SKIP ;"LXI B," OVER THE NEXT TWO BYTES
Note that LXI is an 8080 instruction, clearly revealing where the idea had come from. On the 8086, 271 octal (B9 hex) is the opcode for MOV CX with a 16-bit immediate, so each DB 271O swallows the following two-byte MOV DL instruction as its operand. When the caller jumps to one of the labels, it will execute a MOV DL followed by a sequence of MOV CX instructions. The CX value is ignored and only the contents of DL are used.
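To convince oneself that the trick works, it helps to decode the bytes from each entry point. The Python sketch below understands only the two opcodes involved (B2 hex is MOV DL,imm8 and B9 hex, i.e. 271 octal, is MOV CX,imm16). The error code values are placeholders, since the excerpt does not show what ERRSN, ERRDV0 and ERRNF equal, and the final DB 271O is left out so that the toy byte stream ends cleanly.

```python
MOV_DL_IMM8 = 0xB2    # 8086 opcode: MOV DL,imm8 (2 bytes in total)
MOV_CX_IMM16 = 0o271  # 8086 opcode: MOV CX,imm16 (3 bytes in total), B9 hex

# Placeholder error codes; the real values are not given in the excerpt.
ERRSN, ERRDV0, ERRNF = 0x02, 0x0B, 0x01

CODE = bytes([
    MOV_DL_IMM8, ERRSN,    # offset 0: SNERR
    MOV_CX_IMM16,          # offset 2: DB 271O
    MOV_DL_IMM8, ERRDV0,   # offset 3: DV0ERR
    MOV_CX_IMM16,          # offset 5: DB 271O
    MOV_DL_IMM8, ERRNF,    # offset 6: NFERR
])
SNERR, DV0ERR, NFERR = 0, 3, 6


def run(code: bytes, entry: int):
    """Decode and 'execute' from entry; return final registers and a trace."""
    regs = {"DL": None, "CX": None}
    trace = []
    pc = entry
    while pc < len(code):
        opcode = code[pc]
        if opcode == MOV_DL_IMM8:
            regs["DL"] = code[pc + 1]
            trace.append("MOV DL,%02Xh" % regs["DL"])
            pc += 2
        elif opcode == MOV_CX_IMM16:
            # The 16-bit immediate is really the next MOV DL instruction.
            regs["CX"] = code[pc + 1] | (code[pc + 2] << 8)
            trace.append("MOV CX,%04Xh" % regs["CX"])
            pc += 3
        else:
            raise ValueError("opcode %02Xh not modeled" % opcode)
    return regs, trace
```

Calling run(CODE, SNERR), run(CODE, DV0ERR) and run(CODE, NFERR) leaves a different value in DL each time, because every later MOV DL is consumed as the operand of a MOV CX and never executes as an instruction in its own right.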
Both of these techniques make disassembly somewhat difficult and confusing, although only very slightly so when one is armed with the source code.
Understanding how GW-BASIC manages memory takes a bit of effort. As was common and necessary in the old days, GW-BASIC discards initialization code and uses the recovered memory for other purposes. The label CSEND indicates the end of resident code with the following comment: “All code loaded after this label is resident only until routine MAPINI initializes the new memory map.”
It should be noted that GW-BASIC effectively uses the small memory model. The CS segment register points to code and DS/ES/SS all have the same value pointing to the data segment. The data segment size is variable and depends on the available memory (but can’t be more than 64K). There is no attempt at exploiting the segmented nature of the 8086 architecture; that makes sense given the 8-bit heritage and the fact that early PCs did not have all that much RAM in the first place.
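The 64K ceiling follows directly from how the 8086 forms addresses in real mode: a 16-bit offset is added to the segment value shifted left by four bits. A quick Python sketch of the arithmetic, with made-up segment values standing in for whatever DOS would actually assign:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# Hypothetical register values, for illustration only.
CS = 0x1000            # code segment
DS = ES = SS = 0x2000  # one shared data segment, as in the small model

first_data_byte = physical_address(DS, 0x0000)
last_data_byte = physical_address(DS, 0xFFFF)
# A 16-bit offset cannot go past FFFFh; one more byte wraps around to 0.
wrapped = physical_address(DS, 0xFFFF + 1)
```

With DS, ES and SS all equal, every data pointer is a bare 16-bit offset, which is very close to how pointers looked on the 8080.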
Within the BASIC data segment, memory is subdivided into several areas. The basic layout is documented in the file GWINIT.ASM (see comment “Memory map for GW-BASIC”). There is stack overflow checking which is invoked for all larger memory allocations; as mentioned above, GW-BASIC does not use local stack variables, which means its stack usage is otherwise very minimal.
It would be handy to find an existing GW-BASIC executable which is an exact match for the released source code. So far I’ve not been successful and in fact the vast majority of Microsoft BASIC interpreters are either older (BASIC 5.x) or newer (GW-BASIC 3.x) versions.
It should also be possible to reverse engineer/disassemble/reconstruct the missing OEM source module (or modules) required to produce a complete GW-BASIC executable. That is likely to be a fair amount of work.
Another possibility is that Microsoft built it using MASM running under MS-DOS on an S-100 system which didn’t have the PC 640KB limit. I remember reading that prior to DOS extenders Microsoft used an S-100 system to link the linker because so much memory was required to do so.
No, that does not make sense. MASM makes no attempt to use all available conventional memory. Also, MASM 1.06 has no trouble assembling the GWMAIN module with ~130K free conventional memory (it seems to need a bit over 120K free).
>It should also be possible to reverse engineer/disassemble/reconstruct the missing
>OEM source module (or modules) required to produce a complete GW-BASIC
>executable. That is likely to be a fair amount of work.
Nevertheless, it’s already (mostly) done. See https://github.com/tkchia/GW-BASIC
Sure, if you just want to steal the code from an existing binary, it’s not that hard. In fact I’m surprised that part isn’t complete yet.
About Gates vs. Allen in “a lot of stuff”, the original 8080 4K source said:
00560 PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580 BILL GATES WROTE THE RUNTIME STUFF.
00600 MONTE DAVIDOFF WROTE THE MATH PACKAGE.
In a web comment that MAY be from the horse’s mouth, that commenter says “When it says Paul Allen wrote the non-runtime stuff that means the development environment which was an amazing piece of work he did on the PDP-10 that made development work very productive including simulation and symbolic debugging.”
The comment comes from a discussion about the easter egg in Commodore’s 6502 Basic at https://www.pagetable.com/?p=43. Pagetable also goes into the original 6502 source and the MACRO-10 language it’s written in: https://www.pagetable.com/?p=774
Excellent blog post, as usual. Thanks for the archeology — I was curious myself which version of GW-BASIC this source drop was supposed to be for. (I’ve also written a blog post about GW-BASIC this weekend, where I describe my ongoing effort to port it back to the Z80: https://tia.mat.br/posts/2020/06/21/converting-gwbasic-to-z80.html — so any missing puzzle pieces, like this blog post, are appreciated.)
I’m not even sure about the version number. My impression is that it’s “the first GW-BASIC”, which may have been called GW-BASIC 1.0, except that IBM and Compaq didn’t call it that. It’s definitely newer than MS BASIC 5.x and a superset of it. The only clue in the code is this:
FIVEO=1 ;GENERATE VERSION WITH RELEASE 5.0 FEATURES
GWLEV2=0 ;Version 2.0 of GW BASIC-86
GWLEV2=0 ;GW BASIC version 2.0 features
(The GWLEV2 define can be found in two different files with different comments.)
I take that to mean that it’s not GW-BASIC 2.0. Note that the above defines are not referenced anywhere in the source code; they appear to have been set for the mysterious translator which produced the 8086 source code.
From my research GW-BASIC 2.0 should have DOS directory support, and the released source does not; CHDIR, RMDIR, etc. are there but stubbed out.
I would think the OEM layer also had specific graphics routines as well. The Canon AS-100 computer was an 8086 non-IBM compatible computer that had some neat graphics for its day.
It came out with MS-DOS 1.1 so I think it was GWBASIC 1. The manual was done by Canon for A size pages and tries to be as helpful as possible. It was not as cool as the IBM documentation.
The core graphics logic was all generic, but yes, OEMs of course needed to supply code to set graphics modes, draw pixels, and the like.
I seem to have Compaq BASIC version 1.14, if you want to have a look. It’s from Compaq MS-DOS v1.12g. File is 54304 bytes and it is dated November 28th, 1983.
Definitely worth checking out. The older Compaq BASICA.EXE I’ve been looking at is in some areas disturbingly different from the published source. How can I get hold of the newer BASIC executable?
Cool, I wasn’t aware that some of the older 8-bit MS BASIC source code was out there. It’s definitely closely related.
The Compaq Personal Computer DOS 1.12 is now available at archive.org:
Have fun 🙂
Thanks! At first glance I’m skeptical, because it says “(C) Copyright Microsoft 1982, 1983″… but I will take a closer look.
I got these versions in my files,
if you need something just drop me an email, thanks for interesting reading.
basic compiler-5.31(ibm 1.00).rar
basic compiler-5.60(ibm 2.00).rar
gwbasic-3.20-monocrome graphics (mbasic).rar
Leandro Pereira, I’m afraid that, while educational, your effort may be wasted. If you want an 8080/Z80 version, the source to Microsoft Basic-80 5.2 has already been leaked and is not too hard to find.
I think the released GW-BASIC would have been an archived copy of the final MS-DOS 1.x compatible code base. March 83 has the release of the XT and the DOS 2 compatible BASIC which would be followed by a GW-BASIC offering to match. The BASIC code was too large to be kept universal; GW-BASIC needed multiple code segments while MSX BASIC needed to be able to page split ROMs.
This form of BASIC was nearing its end. GW-BASIC had only a few minor updates (Tandy sound and EGA) after DOS 2. Coco BASIC updates were done by Microware. The only other major revision of MS classic BASIC after 1983 was the offshoot of Handheld BASIC for 8088 portables which implemented slightly fewer features to fit in a very tight ROM budget. The GW-BASIC code would have needed extensive redesign to handle the concept of sub-functions as introduced in DOS 3.
Based on what I see with the disassemblies, the source code was not the final GW-BASIC 1.x. In fact most of the GW-BASIC 1.x binaries I looked at (Compaq 2x, Corona, Eagle) are newer than the published source (there’s slightly more code/functionality). In fact the oldest(?) of those executables, Compaq BASICA.EXE version 1.12, appears to be the closest match in several areas.