IBM Blue Lightning: World’s Fastest 386?

One of the OS/2 Museum’s vintage boards is a genuine Made in U.S.A. Alaris Cougar. These boards were produced by IBM for Alaris and are a bit unusual: There’s a small IBM DLC3 processor in a plastic package soldered to the board, and there’s also a Socket 2 which accepts regular 5-Volt 486DX/SX processors or a Pentium OverDrive. If a standard ceramic-packaged 486 or OverDrive processor is installed, the on-board DLC3 is disabled.

Alaris Cougar

The IBM DLC3, sometimes designated as BL3 and better known as Blue Lightning, has an aluminum heatsink glued on but requires no fan. After 20 years, the information about whether it’s the 75 MHz or 100 MHz variant has been lost, but the board is stable when the processor runs at 100 MHz (3 x 33 MHz). Incidentally, the OPTi chipset, and notably the Adaptec VL-bus IDE controller, are quite good performers, often doing better than newer PCI-based 486 systems.

The Blue Lightning CPU is an interesting beast. There is not a whole lot of information about what the processor really is, but it can be pieced together from various scraps of information. Around 1990, IBM needed low-power 32-bit processors with good performance for its portable systems, but no one offered such CPUs yet. IBM licensed the 386SX core from Intel and turned it into the IBM 386SLC processor (SLC reportedly stood for “Super Little Chip”).

Later on, IBM updated the processor to support 486 instructions. It is worth noting that SLC variants were still available: nominally a 486, but with a 16-bit bus.

The licensing conditions reportedly prevented IBM from selling the SLC processors on the open market. They were only available in IBM-built systems and always(?) as a QFP soldered to a board.

One of the more notable users of the 486SLC and SLC2 processors was IBM’s first ThinkPad laptop series, the 700C (25 MHz SLC, upgradable) and 720C (50 MHz SLC2) from 1992 and 1993, respectively. Blue Lightning processors were also used in some IBM PS/2 desktops.

IBM SLC2 Upgrade

The Cougar board of course sports a DLC3, i.e. a clock-tripled variant with a 32-bit bus. This processor is very interesting: It’s essentially a 386 core, updated to handle the 486 instructions (there weren’t too many new ones), and equipped with a whopping 16KB of write-back L1 cache.

The 386-ness of the Blue Lightning is most apparent with regard to FPU architecture. The CPU itself has no built-in coprocessor, and most software recognizes it as a 486SX. However, unlike a 486SX, the Blue Lightning can use a regular 387 coprocessor (with the accompanying poor performance relative to a 486DX).

The Cougar is equipped with a 387 socket right next to the soldered CPU. The board came with a Cyrix FasMath coprocessor… which sadly appears to be fried. When the FPU is inserted, the board doesn’t boot at all; without the coprocessor it works fine. Another FasMath in the OS/2 Museum’s collection has corroded(?) pins which have a tendency to fall off, but after finding a functioning FPU, the system does work and is usually recognized as a 486DX by software.

Performance

Characterizing the Blue Lightning’s performance is tricky, as it doesn’t much resemble the standard Intel or AMD 486s. The processor core still largely behaves as a 386, which means that performance per clock cycle isn’t great. The catch is that it’s a 386 which a) runs at up to 100 MHz, and b) is equipped with a superb L1 cache.

Once again, it’s 16KB of write-back L1 cache. Of the common 486s, only the late-model Intel DX4 processors and AMD’s Am5x86 CPUs had L1 cache that was both 16KB and write-back (there were Intel DX4s with 16KB write-through cache, and some AMD CPUs with 8KB write-back cache).

This impacts CPU performance in interesting ways. When comparing a 100 MHz IBM DLC3 to a typical Intel DX4 with write-through cache, two things are immediately apparent. First, the 486 core of a DX4 is noticeably faster at reading from the cache, achieving about 95 MB/s of bandwidth compared to approximately 63 MB/s on the DLC3. However, the DLC3 can also write at 63 MB/s, while the DX4 drops to just 31 MB/s. The cache behavior is strongly influenced by the fact that the 486 uses 16-byte cache lines while the DLC3 only uses 4-byte cache lines.
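
For the curious, the sort of loop behind those numbers can be sketched as a small DOS program. This is a minimal sketch rather than the benchmark actually used: the buffer size, pass count, and use of the BIOS tick counter are my choices, and REP string instructions are only a crude proxy for cache bandwidth (NASM syntax, assembled with nasm -f bin as a .COM):

    ; Time reads and writes of an 8KB buffer (small enough to stay in a
    ; 16KB L1 cache) against the 18.2 Hz BIOS tick counter.
            org   100h
    start:
            mov   ax, 40h
            mov   fs, ax            ; BIOS data area; tick count at 0040:006Ch

            call  sync_tick         ; align to a tick boundary
            mov   bp, 5000          ; 5000 passes x 8KB = ~40MB read
    .read:  mov   si, buffer
            mov   cx, 2048          ; 2048 dwords per pass
            cld
            rep   lodsd             ; sequential 32-bit reads, all L1 hits
            dec   bp
            jnz   .read
            call  print_ticks       ; elapsed ticks for the read loop

            call  sync_tick
            mov   bp, 5000          ; same amount written
    .write: mov   di, buffer
            mov   cx, 2048
            cld
            xor   eax, eax
            rep   stosd             ; sequential 32-bit writes
            dec   bp
            jnz   .write
            call  print_ticks       ; elapsed ticks for the write loop
            int   20h               ; exit

    sync_tick:                      ; wait for the tick count to change
            mov   ax, [fs:6Ch]
    .w:     cmp   ax, [fs:6Ch]
            je    .w
            mov   ax, [fs:6Ch]
            mov   [t0], ax          ; remember the start tick
            ret

    print_ticks:                    ; print elapsed ticks as 4 hex digits
            mov   ax, [fs:6Ch]
            sub   ax, [t0]
            mov   cx, 4
    .dig:   rol   ax, 4
            push  ax
            and   al, 0Fh
            add   al, '0'
            cmp   al, '9'
            jbe   .pr
            add   al, 7             ; adjust for 'A'-'F'
    .pr:    mov   dl, al
            mov   ah, 2
            int   21h               ; DOS: write character in DL
            pop   ax
            loop  .dig
            mov   dl, 13
            mov   ah, 2
            int   21h
            mov   dl, 10
            mov   ah, 2
            int   21h
            ret

    t0:     dw 0
    buffer: times 8192 db 0

On a write-through 486 the write loop should take noticeably more ticks than the read loop; with the DLC3’s write-back cache the two should come out close.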

The net result is that the DLC3’s performance varied depending on exactly what it was used for. In general, it was slower than a DX4 at the same clock speed, but in certain cases it could be faster. It certainly did achieve 486-class performance, and a 100 MHz Blue Lightning was comparable to or slightly better than a 66 MHz 486DX2.

Another confusing area is floating-point performance. When a 486DLC is compared to a 486SX, it does quite well. It is commonly known that a 486SX cannot be equipped with a stand-alone coprocessor; it can only be replaced by a 486DX with a built-in FPU (whether it’s called a 487SX or something else).

There is simply no 486DLC variant with a built-in FPU, but a regular 387 can be added. The downside is that the math performance is then similar to a 386+387, and therefore far below that of a 486DX.
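
Detecting whether a coprocessor is present at all works the same way on these chips as on any other DOS-era CPU. Here is a minimal sketch of the classic FNINIT/FNSTSW presence test (NASM syntax, assembled as a .COM; my own formulation, not lifted from any particular product):

    ; Without a coprocessor the FNSTSW store never happens, so the
    ; sentinel value survives; with one, FNINIT clears the status word.
            org   100h
    start:
            mov   word [status], 5A5Ah  ; sentinel
            fninit                      ; initialize the FPU, if any
            fnstsw [status]             ; no-wait store of the status word
            cmp   byte [status], 0      ; 0 only if an FPU responded
            jne   no_fpu
            mov   dx, msg_yes
            jmp   say
    no_fpu: mov   dx, msg_no
    say:    mov   ah, 9
            int   21h                   ; DOS: print $-terminated string
            int   20h

    status:  dw 0
    msg_yes: db 'Coprocessor present', 13, 10, '$'
    msg_no:  db 'No coprocessor', 13, 10, '$'

On the Cougar this should report a coprocessor only when a working FasMath (or other 387) sits in the socket.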

IBM intended the Blue Lightning for the typical desktop or portable user with minimal need for math computation. That covered the vast majority of users, but for math-heavy applications, the DLC3 simply wasn’t suitable.

Remarks

The SLC/DLC processors are not to be confused with IBM’s later 486 DX/DX2/DX4 processors, some of which may also have been marketed under the Blue Lightning brand and were commonly available in ceramic PGA packages. Those CPUs were built under a license from Cyrix and were more or less identical to the Cx486 processors available under the Cyrix, Texas Instruments, and ST brands.

The 486DLC chips had an interesting deficiency: Despite having a 32-bit address bus and being able to access more than 16MB of memory, the internal cache was limited to the first 16MB (presumably due to short cache line tags, designed for the address-space-limited SLC processors). The MSR specifying cacheable regions reserved only 8 bits for the number of cacheable 64K blocks above 1MB, and an 8-bit count of 64K blocks can cover at most 256 × 64KB = 16MB. This limitation probably had little practical impact at the time, as very few systems with Blue Lightning CPUs would have sported more than 16MB RAM. However, the effect can be observed on the above-mentioned Alaris Cougar board when equipped with 20MB RAM or more.
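
Out of curiosity, the MSR in question can be dumped from DOS. This is a hedged sketch: the MSR index 1001h follows the CTCHIP34 discussion in the comments below and is an assumption on my part, and RDMSR will fault on anything without the IBM-specific MSRs, so this is strictly for SLC/DLC machines (NASM syntax, assembled as a .COM):

    ; Dump the presumed cacheable-region MSR as EDX:EAX in hex.
            org   100h
    start:
            mov   ecx, 1001h        ; assumed IBM cacheable-region MSR
            rdmsr                   ; value returned in EDX:EAX
            mov   esi, eax          ; stash the low half
            mov   ebx, edx
            call  print_dword       ; high half first
            mov   dl, ':'
            mov   ah, 2
            int   21h
            mov   ebx, esi
            call  print_dword       ; then the low half
            int   20h               ; exit

    print_dword:                    ; print EBX as 8 hex digits
            mov   cx, 8
    .dig:   rol   ebx, 4
            mov   al, bl
            and   al, 0Fh
            add   al, '0'
            cmp   al, '9'
            jbe   .pr
            add   al, 7             ; adjust for 'A'-'F'
    .pr:    mov   dl, al
            mov   ah, 2
            int   21h               ; DOS: write character in DL
            loop  .dig
            ret

If the 8-bit interpretation is right, the cacheable-block count should show up in the low byte or two of the dumped value.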

The CPUTYPE utility from Undocumented PC claims that a 100 MHz 486DLC3 runs at 104-105 MHz. This is almost certainly caused by a misconception: the utility expects 486 timings for the DIV instruction, but with its 386 core, the DLC3 really uses 386 timings. Since the DIV instruction is in fact slightly faster on a 386 (38 vs. 40 clocks for a 32-bit register/register divide), CPUTYPE overestimates the CPU frequency by exactly that ratio: 100 MHz × 40/38 ≈ 105 MHz. Some other utilities have similar trouble measuring the clock speed; SYSINFO from Norton Utilities is not one of them.

The Blue Lightning is a very interesting example of an older CPU design implemented in a modern manufacturing process. When Intel initially released the 386 in 1985, they had significant difficulties producing chips that could reliably run at 16 MHz, and on-chip cache was deliberately left out because Intel could not manufacture a die with a cache large enough to make a real impact.

Several years later, IBM was able to add a significant cache and, with clock doubling and tripling, run the processors at frequencies almost ten times higher than those of the initial 12 MHz 386s. This brought the old 386 design to a point where it easily outperformed many 486s while keeping power consumption low.

Finally, it needs to be mentioned that the IBM 386SLC was designed to solve some of the same problems as Intel’s 386SL, although the 386SL was meant to be used in conjunction with the SL chipset, which IBM presumably wasn’t too interested in. But the Intel 386SL is a different story.

Documentation

No technical documentation of the 486SLC/DLC processors appears to be available. It did exist, but was likely distributed in printed form only. By the time electronic distribution of processor documentation became standard, the 486DLC was already obsolete. Any pointers to detailed Blue Lightning documentation are welcome.


62 Responses to IBM Blue Lightning: World’s Fastest 386?

  1. Octocontrabass says:

    Since the IBM DLC3 is based on Intel’s 386, I’m curious if it has the POPA bug. It should be pretty easy to test; just a handful of bytes in a DOS .COM executable:

    60 61 66 67 8D 00 CD 20    (pusha / popa / lea eax, [eax] / int 20h)

    If it hangs, it has the POPA bug.

  2. Michal Necasek says:

    The CPU bug test in NSSI says the DLC3 does not have the POPA/POPAD bug. I’ll try your opcode sequence soon.

    As far as I recall, all of Intel’s 386 have the POPA/D bug, even the ones made in the late 1990s, right? Intel probably figured that by the time they could fix it, it wasn’t going to solve any real problems. IBM OTOH would have been in the opposite situation and may have wanted to fix the bug so that software expecting a 486 isn’t tripped up by a 386-only bug.

  3. Octocontrabass says:

    They never fixed it in the 386DX, since you can find spec updates for the F stepping that still list it. I’m not sure about the 386SX. Some of the non-PC-compatible embedded variants appear to have fixed it, since they don’t list it in their spec updates.

    Since you’ve got it to run NSSI, I’m curious if the reported model/stepping looks more like a 386 or 486. (And while we’re at it, are the unimplemented bits of CR0/MSW set like a 386 or clear like a 486? AMI BIOSes use MSW to detect the presence of a L1 cache.)

  4. Michal Necasek says:

    When I run NSSI with EMM386 enabled, it won’t show the CPU stepping. When I disable EMM386 (boot with F5), NSSI hangs on “checking for IBM MSRs”. So… not very useful.

    However, the machine’s BIOS (MR BIOS) shows the “CPU Rev” as 8439h, and I believe that is the value returned in DX after reset. So that’s 486 but with an additional bit set (486SLC supposedly reports A439h). The 386ID utility confirms the 8439h signature.

    CR0 reads as 0000FFF1h.

    Your POPA bug test passes, i.e. does not hang. IBM must have fixed it.

  5. Octocontrabass says:

    You might be able to stop NSSI from hanging with the /SAFE parameter. It worked for me on a particularly awful 486 board. (But then it didn’t report the CPU signature…)

    That signature sounds more like it’s in the 386 format (model/family/stepping/stepping) instead of the 486+ format (type/family/model/stepping). It seems to line up pretty nicely considering that bit 13 is the only non-stepping-related difference between an Intel 386DX and 386SX signature.

    Your CR0 value shows the CPU is in protected mode, which means your memory manager could be faking it. Assuming the low 16 bits are accurate, AMI’s cache detection logic would fail to enable the cache during POST. I have to wonder if that’s intentional, since it was sold in so many 386 upgrade kits.

    In theory, one might be able to use those reserved CR0 bits to differentiate an IBM 486SLC/DLC from an Intel 486, without fiddling with MSRs. (But what about the 386SLC? Hmm.)

  6. Michal Necasek says:

    The CR0 value was read from protected mode (DOS extender) but no memory manager. In fact DOS/4GW runs the application in ring 0 anyway, so there’s no software between the user code and the CPU, memory manager or not.

    The signature looks pretty 486-ish to me actually, with a ‘4’ for the family indicating a 486. For the 386, it was “component” in DH and “revision” in DL, for example 0304 on a C0-step 386. In practice it’s not too different, because there’s not a lot that can be generalized from the model/stepping values.

  7. Octocontrabass says:

    Are there really only 8 bits for the cache region size? I disassembled part of your board’s BIOS and it seems to think the 486DLC and later revisions of the 486SLC extended the 8-bit cache limit field to 24 bits. It goes out of its way to ensure the additional bits are never set on the later-revision 486SLC.

    Unfortunately, the disassembled code seems to always set the cache region to 16MB regardless of installed RAM, so those extra bits could just as easily be intended for something else.

  8. Michal Necasek says:

    Without any databooks (which definitely did exist, but apparently haven’t been seen in this century), we can only guess. The 486DLC was intended for IBM’s low-end PCs which probably didn’t even get near 16 MB RAM. It’s possible the DLC was never reworked to cache more because it wasn’t considered a problem. It’s possible that the DLC was extended to cache more than 16 MB, but the BIOS didn’t set it up right. It’s possible the DLC was extended to cache more, but it didn’t work quite right so the BIOS made sure it wasn’t used.

  9. Technoid Mutant says:

    I had fits with these processors. The CPUs ran too hot, and the boards they were mated with were so cheaply made that they failed on the bench. I really WANTED them to work, but I couldn’t sell them.

  10. Michal Necasek says:

    Interesting. The CPU I have does run on the hotter side, but then again it’s 100 MHz with just a passive heatsink. I’ve had it for ~20 years and it’s been quite stable. Maybe I was lucky.

  11. Octocontrabass says:

    Well, I somehow managed to miss the part where CTCHIP34 describes the additional 16 bits for cache control above 16MB.

    Here’s a DOS .COM executable that might extend the cache to cover 20MB (if I’ve correctly guessed how those bits work). I stole the cache on/cache off code from the BIOS.

    fa                  ; cli
    66 b9 00 10 00 00   ; mov ecx, 1000h   (IBM cache control MSR)
    0f 32               ; rdmsr
    24 7f               ; and al, 7Fh      (clear the presumed cache enable bit)
    0f 30               ; wrmsr            (cache off)
    0f 09               ; wbinvd           (write back and invalidate)
    0f 08               ; invd
    41                  ; inc cx           (ecx = 1001h, cacheable-region MSR)
    0f 32               ; rdmsr
    b6 40               ; mov dh, 40h      (set a bit in EDX, the high half)
    0f 30               ; wrmsr
    49                  ; dec cx           (ecx = 1000h again)
    0f 08               ; invd
    0f 32               ; rdmsr
    0c 80               ; or al, 80h       (set the cache enable bit)
    0f 30               ; wrmsr            (cache back on)
    fb                  ; sti
    cd 20               ; int 20h          (exit)

  12. Pingback: Learn Something Old Every Day, Part VIII: RTFM | OS/2 Museum
