But I couldn’t get anywhere: the boot floppy (or bootable DVD) would just hang. And since SuSE defaults to a graphical boot, the boot disk would hang with a black screen and absolutely no information.
Soon enough I found out that disabling local APIC support through a Linux kernel argument (‘disableapic’) gets rid of the hang. And when it did hang, in text mode I could at least see how far it (Linux kernel 2.4.10-4GB) got:
The “host bus clock speed” is actually the APIC timer frequency. And obviously 0.0 MHz is not right, even though the CPU clock speed is spot on. So how did Linux arrive at the nonsensical number?
Looking at the source code initially didn’t provide any real clues. The APIC timer calibration code looked sane and it was hard to see how it could arrive at zero as its result.
When I started digging, I found that the APIC state did not look right, as if the APIC registers had never been programmed. Tracing the code, I found that Linux was writing to APIC registers at an address nowhere near the actual APIC physical address (FEE00000h).
That led me to a routine called init_apic_mappings() which calls detect_init_APIC(). And there I found code which, instead of checking whether the CPUID reports APIC presence, only uses a very incomplete heuristic starting with the comment “Workaround for us being called before identify_cpu()”. The gist of the function is that if the CPU is an AMD but not family 6, or an Intel but not family 5 (Pentium) or 6 (Pentium Pro/II/III), Linux decides that there is no local APIC.
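To make the failure mode concrete, here is a minimal sketch of the decision described above, next to the CPUID-based check it replaced. This is not the kernel's code: the enum, the function names, and the treatment of other vendors as "no APIC" are assumptions made for illustration. The only hard fact used is that CPUID leaf 1 reports the local APIC in EDX bit 9.

```c
#include <stdbool.h>

/* Hypothetical names; this mirrors the gist of the heuristic, not the
 * actual kernel source. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Vendor/family heuristic: only a few known families "have" an APIC. */
bool heuristic_says_apic(enum cpu_vendor vendor, unsigned family)
{
    switch (vendor) {
    case VENDOR_AMD:
        return family == 6;                 /* Athlon-era parts only */
    case VENDOR_INTEL:
        return family == 5 || family == 6;  /* Pentium, Pentium Pro/II/III */
    default:
        return false;                       /* assumption: unknown vendor, no APIC */
    }
}

/* The straightforward check: CPUID leaf 1, EDX bit 9 is the APIC flag. */
bool cpuid_says_apic(unsigned cpuid1_edx)
{
    return ((cpuid1_edx >> 9) & 1u) != 0;
}
```

With these definitions, heuristic_says_apic(VENDOR_AMD, 23) (a Ryzen, family 17h) and heuristic_says_apic(VENDOR_INTEL, 15) (a Pentium 4) both return false, even though either CPU would set the APIC bit in CPUID.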
OK, but then how is Linux trying to calibrate a non-existent APIC? Well, init_apic_mappings() explains that: “If no local APIC can be found then set up a fake all zeroes page to simulate the local APIC and another one for the IO-APIC.”
It’s not clear what problem that solves. It’s much clearer which problem it causes: The fake APIC naturally does not have a functioning timer, and when setup_APIC_timer() waits for the timer to roll over, it keeps waiting forever.
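The effect of the fake page can be sketched in a few lines. This is a simulation under stated assumptions, not the kernel's setup_APIC_timer(): the array, the helper names, and the poll limit are invented for illustration (the real wait has no limit, which is why it hangs). The register offset 390h for the timer's current count and the fact that the APIC timer counts down are taken from the APIC architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PAGE_WORDS 1024         /* one 4 KiB page of 32-bit words */
#define TMCCT_INDEX     (0x390 / 4)  /* timer current-count register, offset 390h */

/* Stand-in for the "fake all zeroes page": static storage starts out zeroed. */
volatile uint32_t fake_apic_page[FAKE_PAGE_WORDS];

uint32_t fake_apic_read(unsigned index)
{
    return fake_apic_page[index];
}

/* Poll the down-counter until it jumps back up (i.e. it reloaded).
 * The max_polls bound is an addition for demonstration only. */
bool wait_for_rollover(unsigned max_polls)
{
    uint32_t prev = fake_apic_read(TMCCT_INDEX);
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t cur = fake_apic_read(TMCCT_INDEX);
        if (cur > prev)
            return true;             /* counter wrapped around */
        prev = cur;
    }
    return false;                    /* never wrapped: plain memory does not tick */
}

/* Ticks that elapsed between two reads of a down-counting timer. */
uint32_t ticks_elapsed(uint32_t first, uint32_t second)
{
    return first - second;
}
```

Because the page is ordinary memory, every read returns zero: wait_for_rollover() never sees a wrap no matter how large the bound, and ticks_elapsed() over any interval is 0, which is consistent with the "clocks: 0" and 0.0000 MHz figures in the boot log.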
The code was obviously not exactly well thought out, but it probably worked on most machines available at the time. When Pentium 4 systems came out (family 15 rather than family 6), it failed the same way my VM did—Linux completely confused itself and hung trying to initialize a fake APIC timer. The problem was shrewdly blamed on bad MP tables, broken BIOS, and all kinds of nonsense, when the bug was in the Linux kernel all along (and was fixed by also accepting Intel family 15 as having an APIC).
Now here’s the fun part. Using the VirtualBox VM debugger, it’s possible to show the console buffer including the part that already scrolled off:
VBoxDbg> info vgatext
--------------- 80x25 (+19 before, +0 after) ---------------
Uncompressing Linux... Ok, booting the kernel.
Linux version 2.4.10-4GB ([email protected]) (gcc version 2.95.3 20010315 (Su
SE)) #1 Fri Sep 28 17:20:21 GMT 2001
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
BIOS-e820: 000000001fff0000 - 0000000020000000 (ACPI data)
BIOS-e820: 00000000df000000 - 00000000dfffffff (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
On node 0 totalpages: 131056
zone(0): 4096 pages.
zone(1): 126960 pages.
zone(2): 0 pages.
No local APIC present or hardware disabled
Kernel command line: linuxrc=auto2,yast2 initrd=initrd ramdisk_size=65536 enable
----------------------- screen start -----------------------
apic BOOT_IMAGE=linux SuSE=,1203200100,D04D9BB2.,41AB expert=1
Initializing CPU#0
Detected 3899.963 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 7785.67 BogoMIPS
Memory: 510176k/524224k available (1289k kernel code, 13660k reserved, 381k data
, 124k init, 0k highmem)
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 32K (64 bytes/line), D cache 32K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: AMD Ryzen 7 3800X 8-Core Processor stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 3900.1878 MHz.
..... host bus clock speed is 0.0000 MHz.
cpu: 0, clocks: 0, slice: 0
------------------------------------------------------------
The scrolled-off portion contains a big clue: “No local APIC present or hardware disabled”. That is exactly the problem: Linux decided (incorrectly) that there is no APIC, and then (incorrectly) tried to initialize it anyway.
There are two ways to work around the problem in VirtualBox. One is, as mentioned above, the ‘disableapic’ kernel argument. That tells Linux not to even look for an APIC and sidesteps the problem. The other option is to change the CPU profile to a CPU which has an APIC and is one of the CPU families known to the old kernel, such as this:
VBoxManage modifyvm OldLinux --cpu-profile "Intel Core2 X6800 2.93GHz"
That placates the buggy Linux kernel and convinces it to use the real APIC registers rather than the inadequate fakes. The APIC timer can then be successfully calibrated and the VM boots up. Note that pretending the CPU is a Core 2 X6800 works just fine on a Ryzen host CPU.