Video trouble with Solaris 10 GA and U1

The initial (GA) release of Solaris 10 from 2005 and the first update (Solaris 10 1/06) both have an interesting bug which causes trouble when installing those specific versions on VirtualBox and quite likely also on some mildly unusual hardware. The symptom is a user process crash when the installer is starting (core dumped twice), possibly followed by a black screen. The reason for the misbehavior is interesting to say the least.

Solaris 10 was released with a somewhat schizophrenic video subsystem. There’s Sun’s old X server, Xsun. There’s also the Xorg 6.8.0 X server. And just to make it more interesting, Xsun can also use many Xorg driver modules.

During installation, kdmconfig probes for installed video devices. It first checks whether a device supported by the Xsun server is available. To that end, it executes /usr/openwin/bin/Xsun -probe. Xsun in turn will execute the xf86Probe() routine which calls Sun-specific DoXsunProbe(). Eventually execution will end up in xf86MatchPciInstances() which contains a block of Solaris specific code not found in the standard Xorg 6.8 source tree. The purpose of that code is to print a message identifying the PCI display device which the Xorg code found, along with the matching Xorg driver name.

The code prints the name of the device and its vendor, derived from a compiled-in database of PCI vendor and device identifiers. It is assumed that the Xorg driver code might run on devices not explicitly supported and fall back to the VESA driver. Therefore, a provision is made for the case where the PCI vendor and/or device ID is unknown. In that case, the names are supposed to be set to “Unknown Vendor” and “Unknown Board”, respectively. So far so good.

Unfortunately, a subtle bug crept in. When the vendor name is not found, it is the device name that gets replaced with “Unknown Vendor”. That’s right, the vendor name is left alone (still NULL) and the device name is overwritten. That’s a bit of a problem because later, when the function tries to print the vendor name, it crashes in strlen() since a NULL pointer is passed to it.
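The mix-up is easy to picture in code. What follows is only an illustrative reconstruction, not Sun's actual source: the structure and all function names are made up for the sketch, and the only details taken from the description above are the two fallback strings and the fact that the wrong variable is assigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the result of the PCI database lookup.
 * A NULL pointer means "not found in the compiled-in database". */
struct pci_names {
    const char *vendor;
    const char *device;
};

/* The faulty fallback as described in the text: an unknown vendor
 * overwrites the device name and leaves the vendor name NULL. */
void fixup_buggy(struct pci_names *n)
{
    assert(n != NULL);
    if (n->vendor == NULL)
        n->device = "Unknown Vendor";   /* wrong field assigned */
    if (n->device == NULL)
        n->device = "Unknown Board";
}

/* What the code was presumably meant to do. */
void fixup_correct(struct pci_names *n)
{
    assert(n != NULL);
    if (n->vendor == NULL)
        n->vendor = "Unknown Vendor";
    if (n->device == NULL)
        n->device = "Unknown Board";
}

/* Returns 1 if both names can safely be handed to strlen() or
 * printed with "%s", 0 otherwise. */
int names_printable(const struct pci_names *n)
{
    assert(n != NULL);
    return n->vendor != NULL && n->device != NULL;
}

/* Combined length of both names, or 0 if either one is still NULL.
 * The real code had no such guard, hence the crash in strlen(). */
size_t names_length(const struct pci_names *n)
{
    if (!names_printable(n))
        return 0;
    return strlen(n->vendor) + strlen(n->device);
}
```

With both lookups failing, fixup_buggy() leaves the vendor pointer NULL and labels the board “Unknown Vendor”, while fixup_correct() yields two printable strings. The faulty branch only runs for a vendor ID missing from the database, which is why it could go unnoticed for so long.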

This is a bug from a category which is very difficult or impossible to find in normal product testing. When the code was written, the built-in PCI database presumably contained the names of all known PCI display device vendors. But a new hardware vendor (including virtual hardware vendors) is going to throw a monkey wrench into the works and trigger the code branch which incorrectly handles unknown PCI vendors, crashing the Xsun server.

When that happens, kdmconfig itself crashes too, likely due to another untested and unexpected corner case. To add insult to injury, the console may end up being black. That can happen because the X server reprograms the attribute controller color select register (AR14) in such a way that the VGA controller uses a section of the DAC color look-up table (LUT) which is not normally used for text modes and is likely to be zeroed (i.e. generating only black). The Solaris console driver never touches the AR14 register, which unfortunately also means that it won’t properly restore it even when the console text mode is re-set.
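To see why a stray AR14 value can blank the text console, it helps to look at how standard VGA hardware forms the 8-bit DAC index in text and 16-color modes. The sketch below models only that documented VGA behavior: the 6-bit output of the attribute controller palette registers is extended with bits from the color select register (AR14), and bit 7 of the mode control register (AR10) decides whether AR14 also replaces bits 5-4. The function name is made up for illustration, and the AR14 values in the usage note are examples, not what Xsun is known to write.

```c
#include <assert.h>
#include <stdint.h>

/* AR10 bit 7 (P54S): take DAC index bits 5-4 from AR14 bits 1-0
 * instead of from the palette register output. */
#define AR10_P54S 0x80u

/* Model of how a standard VGA forms the DAC LUT index in text and
 * 16-color modes. palette_out is the 6-bit value read from one of
 * the sixteen attribute controller palette registers. */
uint8_t vga_dac_index(uint8_t palette_out, uint8_t ar10, uint8_t ar14)
{
    uint8_t index;

    assert((palette_out & 0xC0u) == 0);  /* palette registers are 6 bits wide */
    index = palette_out & 0x3Fu;

    /* Optionally replace bits 5-4 with AR14 bits 1-0. */
    if (ar10 & AR10_P54S)
        index = (uint8_t)((index & 0x0Fu) | ((ar14 & 0x03u) << 4));

    /* Bits 7-6 always come from AR14 bits 3-2. */
    index = (uint8_t)(index | ((ar14 & 0x0Cu) << 4));
    return index;
}
```

With AR14 set to zero, light grey text (palette output 07h) uses DAC entry 07h as usual. If AR14 bits 3-2 are changed to 01b, the very same text is looked up at entry 47h, and if that part of the LUT holds zeros, everything on screen comes out black.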

Interestingly, if Solaris 10 is installed despite the obstacles (perhaps by using a serial console), it is possible to get full X11 functionality. When kdmconfig is run manually, it can be used to set the default X server to Xorg, avoiding the problems with Xsun. That will finally allow X11 to run.

For S10U1, an easy option is installing using the text mode installer from a console session, choosing the “Solaris Interactive Text (Console session)” selection early in the installer. That unfortunately does not work with S10 GA which will execute kdmconfig anyway, crash, and cause the console to be all black. Installing over a serial console avoids that problem.

It might also be possible to avoid the problem by setting up an RPC bootparams server on the network to supply X11 server configuration information to the Solaris 10 installer, but this was not tested.

The black screen problem no longer occurs in Solaris 10 6/06 (S10U2). The graphical installer functions normally when launched from the installation disc and kdmconfig runs without crashing.

Unfortunately, on some host systems, we’re not out of the woods yet. The first time the installed Solaris 10 OS is booted from disk, it will hang with the 64-bit kernel.

To be continued…
