For many years, software was delivered predominantly on floppies. This was especially true in the world of PCs, where by definition (almost) every system contained at least one floppy drive and, prior to the mass arrival of CD-ROMs in the mid-1990s, there was no other standardized distribution medium (compare e.g. with UNIX workstations, where software was typically delivered on tapes).
Since floppies, or indeed any storage medium, do not have an unlimited lifespan, there is a need to preserve the contents of floppy disks. As time marches on, this is becoming a more pressing issue, especially with 5¼” disks; ironically, finding functioning drives is becoming harder than locating error-free floppies. But even with error-free disks and working drives, the next question is how exactly to preserve the data. There are numerous options, each with advantages and drawbacks.
The most basic and least satisfactory method involves simply archiving the files on the disks. For some software (e.g. Windows 3.1 and much of Microsoft’s software of the era), this is adequate and the product can be fully installed. However, this method loses certain information such as disk volume labels or boot sectors.
In many cases, file archives are inadequate. One of the more common problematic situations is archiving bootable operating system disks; a file archive is more or less worthless. In some cases, installers rely on disk volume labels and archiving files is not sufficient. Then there are disks which do not necessarily contain a file system at all—this is common with many UNIX distribution disks which may contain raw dumps of tar or cpio archives.
There’s also a question of convenience. With a VM running some ancient operating system with no (or outdated) networking facilities, moving files around can be surprisingly challenging.
Raw disk dumps are the next level in floppy archival. Since they capture information on a sector rather than file level, they are suitable for archiving most disks, including bootable disks as well as disks with non-DOS filesystems.
Raw dumps should always cover the entire capacity of the storage medium. Interesting information (fragments of files or memory) can often be found in unallocated areas of the disk, especially with older software.
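A minimal Python sketch of such a whole-medium dump follows. It assumes 512-byte sectors and a source path that can be opened for reading (a device node such as /dev/fd0 on Linux, or any ordinary file). The names `dump_raw` and `SECTOR_SIZE` and the filler byte are illustrative choices, not part of any existing tool. Sectors that cannot be read are padded so the image keeps its full size, and their indexes are returned so they can be recorded separately.

```python
SECTOR_SIZE = 512  # assumption: standard PC floppy sector size


def dump_raw(src_path, dst_path, total_sectors, fill=b"\xf6"):
    """Copy total_sectors sectors from src_path into dst_path.

    Every sector of the medium is copied, allocated or not. Returns the
    list of sector indexes that could not be read; those are written as
    filler bytes so the image still covers the entire capacity.
    """
    bad = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        for index in range(total_sectors):
            try:
                src.seek(index * SECTOR_SIZE)
                data = src.read(SECTOR_SIZE)
            except OSError:
                data = b""  # treat an I/O error as an unreadable sector
            if len(data) != SECTOR_SIZE:
                bad.append(index)
                data = fill * SECTOR_SIZE
            dst.write(data)
    return bad
```

For a 1.44MB disk the call would be `dump_raw("/dev/fd0", "disk1.img", 2880)`. A non-empty return value means the image is incomplete and the disk deserves another attempt, possibly in a different drive.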
In most situations, raw dumps are the preferred archival format as they are easy to create, easy to use, and typically capture all necessary information. This is especially true of (roughly) post-1990 software which was mass produced and delivered for disk replication in the form of disk images. A raw disk image essentially re-captures the master image. Certain non-essential information may be lost, such as sector interleaving, but this is rarely significant.
There are ample tools for manipulating raw disk images, such as rawwrite or dd. Just as importantly, nearly every emulator/virtualizer can use such images. Unfortunately, raw dumps are not the ultimate solution either. Copy-protected disks in particular cannot be archived this way.
A notable drawback of raw images is that they cannot contain any meta-information such as checksums or descriptions. However, that isn’t an insurmountable obstacle, as raw images can be stored together with other files in checksum-protected archives.
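As one possible illustration, the sketch below uses Python's standard `zipfile` module to store a raw image next to a plain-text description. ZIP keeps a CRC-32 for every member, which `ZipFile.testzip()` re-checks. The function names and the `description.txt` member name are made up for this example.

```python
import zipfile
from pathlib import Path


def pack_image(image_path, archive_path, description):
    """Store a raw image and a plain-text description in one ZIP file."""
    image = Path(image_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(image, arcname=image.name)
        # Hypothetical member name; any agreed-upon name would do.
        archive.writestr("description.txt", description)


def first_corrupt_member(archive_path):
    """Return the name of the first member failing its CRC-32, or None."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.testzip()
```

CRC-32 only detects accidental corruption. A stronger hash could be written into the description file if that matters.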
A good example of a product that cannot be adequately archived using raw images is Lotus 1-2-3 R3.0 and several other Lotus products. The copy protection scheme used by Lotus does not rely on any kind of “out of spec” disk contents, just a sector with deliberately corrupted CRC. An unreadable sector cannot be represented in a raw image; yet if the sector is made readable, the Lotus product will refuse to install, correctly recognizing a modified disk.
Copy protection can be cracked, but that is not a satisfactory solution. First, it requires tampering with the product (then it is no longer an original), and second, the effort and skill required are far higher than the effort needed to archive a floppy.
Another case where raw images are inadequate is any non-standard, non-uniform disk format, such as disks with varying numbers of sectors per track or varying sector sizes, or generally any disk with “unusual” geometry. Since the raw image does not contain any geometry information, the user is forced to guess based on the image size; if the guess is incorrect, the image may not be usable.
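The guessing step can be sketched as a simple table lookup. The table below lists only the standard PC formats with 512-byte sectors; `Geometry` and `guess_geometry` are hypothetical names used for illustration. Any image whose size is not in the table yields no answer, which is exactly the situation where a raw image becomes hard to use.

```python
from typing import NamedTuple, Optional


class Geometry(NamedTuple):
    cylinders: int
    heads: int
    sectors_per_track: int
    sector_size: int = 512

    @property
    def image_size(self) -> int:
        return (self.cylinders * self.heads
                * self.sectors_per_track * self.sector_size)


# Standard PC floppy formats, from 160K up to 2.88M.
STANDARD_FORMATS = [
    Geometry(40, 1, 8),    # 160K
    Geometry(40, 1, 9),    # 180K
    Geometry(40, 2, 8),    # 320K
    Geometry(40, 2, 9),    # 360K
    Geometry(80, 2, 9),    # 720K
    Geometry(80, 2, 15),   # 1.2M
    Geometry(80, 2, 18),   # 1.44M
    Geometry(80, 2, 36),   # 2.88M
]

BY_SIZE = {g.image_size: g for g in STANDARD_FORMATS}


def guess_geometry(image_size: int) -> Optional[Geometry]:
    """Return the standard geometry matching image_size, or None."""
    return BY_SIZE.get(image_size)
```

Note that the lookup says nothing about whether the guess is right: a non-uniform disk whose total size happens to match a standard format would be silently misinterpreted.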
The classic solution to this problem is TeleDisk, originally developed by Sydex. The TeleDisk format to a large extent mirrors the capabilities of the PC floppy controller. TeleDisk stores each track and each sector separately, with sectors ordered in physical order of appearance. Thus TeleDisk can preserve disks with any kind of custom interleave (and any order and values of sector IDs, including duplicate sector IDs), any sector sizes, and any track layout, including layouts with missing tracks etc. Bad sector information is also preserved.
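To illustrate why a per-track, per-sector representation can hold what a raw image cannot, here is a hypothetical in-memory model in Python. It is not the TeleDisk file layout, only a sketch of the kind of information such a format keeps. The `raw_obstacles` function lists the reasons a disk described this way could not be flattened into a raw image, and `to_raw` performs the conversion when there are none.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sector:
    sector_id: int           # ID recorded in the sector header
    data: bytes
    crc_error: bool = False  # True if the data failed its CRC check


@dataclass
class Track:
    cylinder: int
    head: int
    sectors: List[Sector]    # kept in physical order of appearance


def raw_obstacles(tracks, cylinders, heads, sectors_per_track, sector_size=512):
    """Return a list of reasons the disk cannot become a raw image."""
    problems = []
    present = sorted((t.cylinder, t.head) for t in tracks)
    expected = [(c, h) for c in range(cylinders) for h in range(heads)]
    if present != expected:
        problems.append("missing, extra or duplicate tracks")
    wanted_ids = list(range(1, sectors_per_track + 1))
    for t in tracks:
        where = f"cylinder {t.cylinder} head {t.head}"
        if sorted(s.sector_id for s in t.sectors) != wanted_ids:
            problems.append(f"{where}: non-standard sector IDs")
        for s in t.sectors:
            if s.crc_error:
                problems.append(f"{where} sector {s.sector_id}: CRC error")
            if len(s.data) != sector_size:
                problems.append(f"{where} sector {s.sector_id}: odd sector size")
    return problems


def to_raw(tracks, cylinders, heads, sectors_per_track, sector_size=512):
    """Flatten to a raw image in logical order; interleave is discarded."""
    problems = raw_obstacles(tracks, cylinders, heads,
                             sectors_per_track, sector_size)
    if problems:
        raise ValueError("; ".join(problems))
    image = bytearray()
    for t in sorted(tracks, key=lambda t: (t.cylinder, t.head)):
        for s in sorted(t.sectors, key=lambda s: s.sector_id):
            image += s.data
    return bytes(image)
```

An interleaved but otherwise standard track converts cleanly, losing only the physical sector order. A track containing a sector with a deliberate CRC error, like the Lotus disks mentioned earlier, is rejected rather than silently "repaired".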
In short, TeleDisk captures and preserves much of the metadata which raw images cannot. There’s little TeleDisk cannot handle, at least not when considering PC floppies. In other words, if TeleDisk can’t deal with a disk, the PC floppy controller probably can’t either.
The major drawbacks of TeleDisk are that the software is not freely available, generally only runs on DOS, the file format is not fully documented, and most emulators/virtualizers cannot read the TeleDisk format. This makes TeleDisk use quite impractical.
An alternative to TeleDisk is IMD, which solves most of the same problems and also shares most of the drawbacks. One notable difference is that the IMD format is well documented and source code for IMD utilities is provided.
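When sorting through a pile of image files, a quick check of the first few bytes can help tell these formats apart. The sketch below assumes the commonly described signatures: TeleDisk files starting with the ASCII letters "TD" (or "td" for the variant with advanced compression), and IMD files starting with the ASCII text "IMD " followed by a version string. These are assumptions taken from third-party format descriptions and should be verified against real files; `sniff_format` is a hypothetical helper name.

```python
def sniff_format(path):
    """Guess the image container from its first bytes.

    Returns "teledisk", "imd", or None if neither signature matches
    (for example, a raw image has no signature at all).
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"IMD ":
        return "imd"
    if head[:2] in (b"TD", b"td"):
        return "teledisk"
    return None
```

A two-byte signature is weak evidence, so a match should be treated as a hint to try the corresponding decoder, not as proof.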
The ultimate in floppy archival is Kryoflux which works on the lowest level—magnetic flux transitions. Kryoflux can preserve any floppy in any format, and can handle non-standard or deliberately out of spec formats.
The major drawback of Kryoflux is that it requires special hardware. While Kryoflux is very useful for non-PC disks, there is very little need for it in the PC arena. The PC floppy controller is a relatively high-level device, which means that software authors could not get too creative when recording floppies. And especially with OS/2 and Windows software, copy protected or otherwise “creative” floppies were extremely rare (with a notable exception of IBM’s XDF disks).
Usage and Needs
File archives are often adequate, though barely, and should be avoided for preservation as too much metadata is needlessly lost; and in too many situations, a file dump is not good enough. Raw images cover the vast majority of needs, probably well above 99% of cases (especially for software sold around 1990 and later). Their ubiquity and ease of use make raw image dumps the preferred format, despite the arguably minor drawbacks.
In cases where a raw image is not sufficient, TeleDisk or similar often does the job. Unfortunately, working with these formats is considerably more difficult, which forces them to be used only when truly necessary. If one such format were to be widely supported by emulators/virtualizers, that would considerably change the equation, but that appears to be unlikely given the relative rarity of disks which would require it. Another problem with “advanced” formats is that most of them are designed for storing and transferring floppy images but not for emulating a read/write medium with random access.
And at least when dealing with PC floppies, the need for a hardware-based solution like Kryoflux is really minuscule.
Interestingly, in the world of CD-ROMs, the situation is quite similar. A raw image (so-called ISO image) is adequate in nearly all cases, even though it loses certain meta-information which other, less widely used formats can preserve. But that’s a different topic.
A secondary problem is conversion of existing floppy images in various more or less exotic formats to something usable by modern software, typically raw disk images. This often requires running DOS; perhaps the worst offenders are various self-extracting floppy images which require an actual (or at least emulated) floppy drive.
In many cases, the floppy images were created from standard disks and could be easily converted into raw images, if one had the appropriate tool, i.e. a tool that isn’t horribly cumbersome to use.
Now a question to readers… I wrote a simple image conversion utility for “internal use” at the OS/2 Museum. Currently the tool can take HDCP (HD-Copy), DIM, WCD, and many TeleDisk images and convert them into a raw image (obviously such conversion isn’t always possible!). I also have code to extract/convert compressed IBM .DSK images which could be incorporated.
The utility is not particularly polished and hasn’t been widely tested (I can only test with images I have, etc.). Would there be interest in such a utility, either in source or binary form, and would anyone be willing to help with development and testing? Note that the image backend is extensible and adding support for other formats would be easy, though decoding them may not be.