Last weekend I had the pleasure of debugging a curious case of older PCI configuration code (circa 2005) failing on newer (post-2010) hardware. The code was well tested on many 1990s and 2000s PCs (and some non-PCs) with PCI/AGP/PCIe and never showed problems. But at least on some newer machines (in my case a Sandy Bridge laptop) it just hung.
I quickly found out that code meant to walk over a PCI capability list got stuck in a loop. But the reason why it got stuck was unexpected. For reasons lost to the mists of time, the code reads the next capability pointer and the capability type (a total of 16 bits of information) as two bytes, not one word. That was not the problem though. The problem was that the code always read a full DWORD from the config space and discarded the upper 24 bits. On the problematic system, that worked as long as the read was DWORD-aligned, but not otherwise. The logic may seem unusual, but it worked perfectly well on so many older systems, so why not anymore? And who violated the PCI specification, the code or (unlikely) the hardware?
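To see which of the two reads is the misaligned one, recall that capability structures must be DWORD-aligned, so the ID byte sits at offset cap+0 and the next pointer at cap+1. The sketch below models only the addressing side of the old helper, assuming it issued a 32-bit IN from CONFIG_DATA + (offset & 3) and kept the low byte; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_DATA 0xCFCu

/* Hypothetical model of the old byte-read helper: the byte at 'offset'
 * is read via a DWORD IN at CONFIG_DATA + (offset & 3), and only the
 * low 8 bits are kept. This returns the port that IN would target. */
static uint16_t data_port_for(uint8_t offset)
{
    return (uint16_t)(CONFIG_DATA + (offset & 3u));
}

/* A 32-bit I/O access is naturally aligned only if the port is a
 * multiple of 4, i.e. only when it targets CFCh itself. */
static bool dword_read_is_aligned(uint16_t port)
{
    return (port & 3u) == 0;
}
```

For a capability at 40h, the ID read goes through CFCh (aligned), while the next-pointer read at 41h goes through CFDh, which is exactly the kind of access that misbehaved.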
Let’s start with the second question. Is such a misaligned access something that should work or not? A very strong case can be made on both sides. On the one hand, the code works perfectly fine on all PCs made prior to 2010 or so, and such fundamental behavior should not change. On the other hand, Intel presumably knows how PCI should work and wouldn’t get that wrong.
When one looks closer, a very unusual thing happens: the picture becomes fuzzier instead of clearer. It turns out that the PCI specification does not explicitly define this behavior.
Let’s quickly summarize how PCI configuration access works. So-called configuration mechanism #1 (there used to be mechanism #2, too) uses two I/O ports in the host CPU’s I/O space: CF8h aka CONFIG_ADDRESS, and CFCh aka CONFIG_DATA. The address port has some interesting behaviors, but they are irrelevant here. The key takeaway is that the address port selects a 32-bit DWORD in the PCI configuration space; it cannot address individual bytes or words. However, it is possible to read/write bytes or words using byte enables (that is general PCI behavior). And that’s where the seams between PCI and x86 CPUs start showing.
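As a minimal sketch of that DWORD granularity, this is how a CONFIG_ADDRESS value is composed under the standard mechanism #1 layout. The function name pci_config_address is hypothetical; real code would write the result to port CF8h with a 32-bit OUT before accessing CONFIG_DATA.

```c
#include <stdint.h>

/* Build a CONFIG_ADDRESS (CF8h) value for configuration mechanism #1.
 * Bit 31 enables config cycles, bits 23:16 select the bus, 15:11 the
 * device, 10:8 the function, and 7:2 the DWORD register. Bits 1:0 are
 * forced to zero: the address port cannot select bytes or words. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func,
                                   uint8_t offset)
{
    return UINT32_C(0x80000000)
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1Fu) << 11)
         | ((uint32_t)(func & 0x07u) << 8)
         | ((uint32_t)offset & 0xFCu);
}
```

Note that offsets 34h through 37h all produce the same address; which byte within that DWORD is wanted has to be conveyed some other way.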
From the CPU’s point of view, the CONFIG_DATA port actually responds not only at the CFCh address but also at CFDh, CFEh, and CFFh. A byte read from CFEh, for example, translates into a DWORD-sized PCI config space access with only the 3rd byte enabled. It works as everyone expects: the PCI bus sees a 32-bit access, but the CPU (and device) only deals with a single byte.
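The mapping from a CPU I/O access within CFCh to CFFh onto byte enables can be sketched as follows. This is an illustrative model, not chipset logic: the function name is made up, and the returned mask is active-high for readability, whereas the real PCI C/BE# signals are active-low.

```c
#include <stdint.h>

#define CONFIG_DATA 0xCFCu

/* Map a CPU I/O access of 'size' bytes at 'port' onto the byte enables
 * of the single DWORD config access it produces. Returns a 4-bit mask
 * (bit n set = config byte n enabled), or 0 if the access is not in
 * CFCh-CFFh or does not fit inside one DWORD. */
static unsigned config_byte_enables(uint16_t port, unsigned size)
{
    if (size != 1 && size != 2 && size != 4)
        return 0;
    if (port < CONFIG_DATA || port > CONFIG_DATA + 3)
        return 0;
    unsigned first = port - CONFIG_DATA;  /* starting byte lane, 0..3 */
    if (first + size > 4)
        return 0;                         /* spills past CFFh */
    return ((1u << size) - 1u) << first;
}
```

A byte read at CFEh yields mask 0100b (only the 3rd byte), while a DWORD read at CFEh has no single-DWORD encoding at all, which is the case the next paragraph deals with.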
Now what happens if you read not a byte from CFEh but a DWORD? Well… in the usual case, some logic in the host bridge/chipset will probably break it down to two word accesses, one at CFEh and the other at D00h. The data (when reading) will be glued back together and the CPU gets a DWORD back. Everyone is happy.
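The split-and-glue behavior can be modeled in a few lines. This is a toy sketch, assuming the chipset does exactly the two word reads described (CFEh and D00h) and that nothing of importance responds at D00h; the value returned from D00h is left as a parameter because it depends on the machine.

```c
#include <stdint.h>

/* Toy model of a legacy chipset handling a DWORD read at CFEh: a word
 * read at CFEh (the upper half of the selected config DWORD) plus a
 * word read at D00h, glued into one 32-bit result. */
static uint32_t legacy_inl_cfe(uint32_t config_dword, uint16_t d00_word)
{
    uint16_t low  = (uint16_t)(config_dword >> 16); /* ports CFEh-CFFh */
    uint16_t high = d00_word;                       /* ports D00h-D01h */
    return ((uint32_t)high << 16) | low;
}
```

With a config register value of 12345678h and D00h returning FFFFh, the glued result is FFFF1234h. Its low byte, 34h, is exactly config byte 2 regardless of what D00h returns, so code that truncates to a byte never notices the extra access.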
Except that with a modern CPU, there is not necessarily any chipset involved. There are PCI devices in the CPU itself and, well, these kinds of accesses now behave differently. In the case at hand, a DWORD read from within CONFIG_DATA that’s not DWORD-aligned will simply return FFFFFFFFh (computer speak for “there ain’t nothing here, dude”).
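A small simulation shows how the all-ones result turns into a hang. It assumes the walker masks each pointer with FCh, as the specification asks software to do; the config layout (capabilities at 40h and 50h) is invented, and the step limit exists only so the sketch terminates. Under the modern model, the next pointer read at 41h comes back as FFh, masks to FCh, and the read at FDh does the same, so the walk revisits FCh forever.

```c
#include <stdint.h>

/* Simulated 256-byte config space of one hypothetical function. */
static uint8_t cfg[256];

/* Invented layout: cap pointer at 34h -> 40h -> 50h -> end. */
static void setup_example(void)
{
    cfg[0x34] = 0x40;
    cfg[0x40] = 0x01; cfg[0x41] = 0x50;  /* ID 01h, next = 50h */
    cfg[0x50] = 0x05; cfg[0x51] = 0x00;  /* ID 05h, end of list */
}

/* Old-style byte read: a DWORD IN at CFCh + (off & 3), truncated.
 * 'modern' != 0: misaligned DWORD reads return FFFFFFFFh.
 * 'modern' == 0: the intended byte comes back, as on older chipsets. */
static uint8_t old_read8(uint8_t off, int modern)
{
    if (modern && (off & 3u) != 0)
        return (uint8_t)0xFFFFFFFFu;     /* all ones, truncated to FFh */
    return cfg[off];
}

/* Walk the capability list. Returns the number of capabilities, or -1
 * if more steps are taken than 40h-FFh could hold (a hang). */
static int count_caps(int modern)
{
    uint8_t ptr = old_read8(0x34, modern) & 0xFCu;
    int steps = 0;
    while (ptr != 0) {
        if (++steps > 48)                /* (100h - 40h) / 4 = 48 */
            return -1;
        ptr = old_read8((uint8_t)(ptr + 1), modern) & 0xFCu;
    }
    return steps;
}
```

After setup_example(), count_caps(0) finds both capabilities, while count_caps(1) trips the step limit.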
My conclusion is twofold. The code was always kind of wrong, because it probably read from I/O port D00h, at least in some cases. But that was almost certainly entirely harmless and the code worked as expected.
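One way to avoid the stray access altogether is to fetch the capability header with a single aligned 32-bit read at CFCh and pick the bytes out in software; a byte-sized IN from CFCh + (offset & 3) would work equally well. The helper below is a hypothetical sketch of the first approach and assumes 'dword' came from such an aligned read with CONFIG_ADDRESS pointing at offset & ~3.

```c
#include <stdint.h>

/* Extract the config byte at 'offset' from the aligned DWORD that
 * contains it. Config space is little-endian, so byte 0 of the DWORD
 * occupies bits 7:0 and byte 3 occupies bits 31:24. */
static uint8_t cfg_byte_from_dword(uint32_t dword, uint8_t offset)
{
    return (uint8_t)(dword >> ((offset & 3u) * 8u));
}
```

For a capability header at 40h holding ID 01h and next pointer 50h, the aligned DWORD reads as 00005001h, giving 01h for offset 40h and 50h for offset 41h without ever touching CFDh.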
The other conclusion is not so much a conclusion as a question: did Intel even realize that it was changing the existing semantics there? Perhaps not.