Preserving Floppies

For many years, software was delivered predominantly on floppies. This was true especially in the world of PCs where by definition (almost) every system contained at least one floppy drive and prior to the mid-1990s and mass arrival of CD-ROMs, there was no other standardized distribution medium (compare e.g. with UNIX workstations where software was typically delivered on tapes).

Since floppies, or indeed any storage medium, do not have unlimited lifespan, there is a need to preserve the contents of floppy disks. As time marches on, this is becoming a more pressing issue, especially with 5¼” disks; ironically, finding functioning drives is becoming harder than locating error-free floppies. But even with error-free disks and working drives, the next question is how exactly to preserve the data. There are numerous options, each with advantages and drawbacks.

File Archives

The most basic and least satisfactory method involves simply archiving the files on the disks. For some software (e.g. Windows 3.1 and much of Microsoft’s software of the era), this is adequate and the product can be fully installed. This method loses certain information such as disk volume labels or boot sectors.

In many cases, file archives are inadequate. One of the more common problematic situations is archiving bootable operating system disks; a file archive is more or less worthless. In some cases, installers rely on disk volume labels and archiving files is not sufficient. Then there are disks which do not necessarily contain a file system at all—this is common with many UNIX distribution disks which may contain raw dumps of tar or cpio archives.

There’s also a question of convenience. With a VM running some ancient operating system with no (or outdated) networking facilities, moving files around can be surprisingly challenging.

Raw Dumps

Raw disk dumps are the next level in floppy archival. Since they capture information on a sector rather than file level, they are suitable for archiving most disks, including bootable disks as well as disks with non-DOS filesystems.

Raw dumps should always cover the entire capacity of the storage medium. Interesting information (fragments of files or memory) can often be found in unallocated areas of the disk, especially with older software.

In most situations, raw dumps are the preferred archival format as they are easy to create, easy to use, and typically capture all necessary information. This is especially true of (roughly) post-1990 software which was mass produced and delivered for disk replication in the form of disk images. A raw disk image essentially re-captures the master image. Certain non-essential information may be lost, such as sector interleaving. This is rarely significant.
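
To illustrate what "sector level" means in practice, here is a minimal sketch of how a sector is located inside a raw image. It assumes the conventional PC layout (cylinders outermost, then heads, then sectors, with 512-byte sectors); the function names and the 1.44MB defaults are purely illustrative and not taken from any particular tool.

```python
SECTOR_SIZE = 512  # bytes per sector on standard PC floppies


def chs_to_offset(cyl, head, sector, heads=2, spt=18, sector_size=SECTOR_SIZE):
    """Byte offset of a sector in a raw image (sector numbers start at 1)."""
    if not (cyl >= 0 and 0 <= head < heads and 1 <= sector <= spt):
        raise ValueError("CHS address out of range")
    # Assumed ordering: all sectors of head 0, then head 1, then next cylinder.
    lba = (cyl * heads + head) * spt + (sector - 1)
    return lba * sector_size


def read_sector(image, cyl, head, sector, heads=2, spt=18):
    """Return one 512-byte sector from an in-memory raw image (bytes)."""
    off = chs_to_offset(cyl, head, sector, heads, spt)
    data = image[off:off + SECTOR_SIZE]
    if len(data) != SECTOR_SIZE:
        raise ValueError("image is too short for the requested sector")
    return data
```

Note that the ordering is a convention, not something the image itself records; a raw image written in a different track order would need a different formula.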

There are ample tools for manipulating raw disk images, such as rawwrite or dd. Just as importantly, nearly every emulator/virtualizer can use such images. Unfortunately, raw dumps are not the ultimate solution either. Copy-protected disks in particular cannot be archived this way.

A notable drawback of raw images is that they cannot contain any meta-information such as checksums or descriptions. However, that isn’t an insurmountable obstacle as raw images can be stored together with other files in checksum-protected archives.
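
One simple way to attach the missing metadata is a small sidecar file kept next to the image. The sketch below is only one possible approach: the record layout, the choice of SHA-256 and JSON, and the function names are all assumptions made for illustration, not an established convention.

```python
import hashlib
import json


def describe_image(data, description):
    """Build a metadata record for a raw image held in memory."""
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "description": description,
    }


def verify_image(data, record):
    """True if the image still matches its recorded size and checksum."""
    return (len(data) == record["size"]
            and hashlib.sha256(data).hexdigest() == record["sha256"])


def to_sidecar_text(record):
    """Serialize the record as JSON, e.g. to store next to the image file."""
    return json.dumps(record, indent=2, sort_keys=True)
```

The image itself stays a plain raw dump that any emulator can use, while the sidecar carries the description and lets the dump be verified later.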

Tricky Disks

A good example of a product that cannot be adequately archived using raw images is Lotus 1-2-3 R3.0 and several other Lotus products. The copy protection scheme used by Lotus does not rely on any kind of “out of spec” disk contents, just a sector with a deliberately corrupted CRC. An unreadable sector cannot be represented in a raw image; yet if the sector is made readable, the Lotus product will refuse to install, correctly recognizing a modified disk.

Copy protection can be cracked, but that is not a satisfactory solution. First, it requires tampering with the product (then it is no longer an original), and second, the effort and skill required is far higher than the effort needed to archive a floppy.

Another case where raw images are inadequate is any non-standard, non-uniform disk format, such as disks with varying numbers of sectors per track or varying sector sizes, or generally any disk with “unusual” geometry. Since the raw image does not contain any geometry information, the user is forced to guess based on the image size; if the guess is incorrect, the image may not be usable.
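
The guessing game typically looks something like the following sketch: a lookup from image size to the standard PC geometries (cylinders, heads, sectors per track, all with 512-byte sectors). The table only covers the common uniform formats, and the function name is made up for this example; anything not in the table simply cannot be identified by size alone.

```python
# Standard PC floppy formats: image size -> (cylinders, heads, sectors/track)
STANDARD_GEOMETRIES = {
    163840: (40, 1, 8),    # 160K 5.25" single-sided
    184320: (40, 1, 9),    # 180K 5.25" single-sided
    327680: (40, 2, 8),    # 320K 5.25"
    368640: (40, 2, 9),    # 360K 5.25"
    737280: (80, 2, 9),    # 720K 3.5"
    1228800: (80, 2, 15),  # 1.2M 5.25"
    1474560: (80, 2, 18),  # 1.44M 3.5"
    2949120: (80, 2, 36),  # 2.88M 3.5"
}


def guess_geometry(image_size):
    """Return (cylinders, heads, sectors_per_track), or None if unknown."""
    return STANDARD_GEOMETRIES.get(image_size)
```

An 800K image (80 tracks, 10 sectors per track), for instance, is not in the table, and a size can in principle correspond to more than one layout, which is exactly why missing geometry information is a problem.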

The classic solution to this problem is TeleDisk, originally developed by Sydex. The TeleDisk format to a large extent mirrors the capabilities of the PC floppy controller. TeleDisk stores each track and each sector separately, with sectors ordered in physical order of appearance. Thus TeleDisk can preserve disks with any kind of custom interleave (and any order and values of sector IDs, including duplicate sector IDs), any sector sizes, and any track layout, including layouts with missing tracks etc. Bad sector information is also preserved.

In short, TeleDisk captures and preserves much of the metadata which raw images cannot. There’s little TeleDisk cannot handle, at least not when considering PC floppies. In other words, if TeleDisk can’t deal with a disk, the PC floppy controller probably can’t either.
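
To make the difference concrete, here is a rough sketch of the kind of per-track, per-sector model such a format implies, together with a check for whether a track can be flattened into a raw image without losing anything essential. This is emphatically not the TeleDisk file format; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sector:
    cyl_id: int          # cylinder number recorded in the sector ID
    head_id: int         # head number recorded in the sector ID
    sector_id: int       # sector number recorded in the sector ID
    data: bytes
    crc_ok: bool = True  # False for a sector that read with a CRC error


@dataclass
class Track:
    cyl: int
    head: int
    sectors: List[Sector] = field(default_factory=list)  # physical order


def track_to_raw(track, spt, sector_size=512):
    """Flatten one track to raw bytes; None if a raw image cannot hold it."""
    ids = [s.sector_id for s in track.sectors]
    if sorted(ids) != list(range(1, spt + 1)):
        return None  # missing, duplicate, or unusual sector IDs
    if any(len(s.data) != sector_size or not s.crc_ok for s in track.sectors):
        return None  # odd sector size or bad CRC cannot be represented
    if any((s.cyl_id, s.head_id) != (track.cyl, track.head)
           for s in track.sectors):
        return None  # sector IDs do not match the physical position
    # Sorting by ID discards the interleave, which a raw image cannot keep.
    ordered = sorted(track.sectors, key=lambda s: s.sector_id)
    return b"".join(s.data for s in ordered)
```

A track with ordinary interleave converts fine (only the physical ordering is dropped), while a track with a bad-CRC sector, such as the Lotus disks mentioned above, is rejected rather than silently "repaired".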

The major drawbacks of TeleDisk are that the software is not freely available, generally only runs on DOS, the file format is not fully documented, and most emulators/virtualizers cannot read the TeleDisk format. This makes TeleDisk use quite impractical.

An alternative to TeleDisk is IMD, which solves most of the same problems and also shares most of the drawbacks. One notable difference is that the IMD format is well documented and source code for IMD utilities is provided.

Flux Preservation

The ultimate in floppy archival is Kryoflux which works on the lowest level—magnetic flux transitions. Kryoflux can preserve any floppy in any format, and can handle non-standard or deliberately out of spec formats.

The major drawback of Kryoflux is that it requires special hardware. While Kryoflux is very useful for non-PC disks, there is very little need for it in the PC arena. The PC floppy controller is a relatively high-level device, which means that software authors could not get too creative when recording floppies. And especially with OS/2 and Windows software, copy protected or otherwise “creative” floppies were extremely rare (with a notable exception of IBM’s XDF disks).

Usage and Needs

File archives are often adequate, though barely, and should be avoided for preservation as too much metadata is needlessly lost; and in too many situations, a file dump is not good enough. Raw images cover the vast majority of needs, probably well above 99% of cases (especially for software sold around 1990 and later). The ubiquity and ease of use makes raw image dumps the preferred format, despite the—arguably minor—drawbacks.

In cases where a raw image is not sufficient, TeleDisk or similar often does the job. Unfortunately, working with these formats is considerably more difficult, which forces them to be used only when truly necessary. If one such format were to be widely supported by emulators/virtualizers, that would considerably change the equation, but that appears to be unlikely given the relative rarity of disks which would require it. Another problem with “advanced” formats is that most of them are designed for storing and transferring floppy images but not for emulating a read/write medium with random access.

And at least when dealing with PC floppies, the need for a hardware-based solution like Kryoflux is really minuscule.

Interestingly, in the world of CD-ROMs, the situation is quite similar. A raw image (so-called ISO image) is adequate in nearly all cases, even though it loses certain meta-information which other, less widely used formats can preserve. But that’s a different topic.

Image Conversion

A secondary problem is conversion of existing floppy images in various more or less exotic formats to something usable by modern software, typically raw disk images. This often requires running DOS; perhaps the worst offenders are various self-extracting floppy images which require an actual (or at least emulated) floppy drive.

In many cases, the floppy images were created from standard disks and could be easily converted into raw images, if one had the appropriate tool, i.e. a tool that isn’t horribly cumbersome to use.

Now a question to readers… I wrote a simple image conversion utility for “internal use” at the OS/2 Museum. Currently the tool can take HDCP (HD-Copy), DIM, WCD, and many TeleDisk images and convert them into a raw image (obviously such conversion isn’t always possible!). I also have code to extract/convert compressed IBM .DSK images which could be incorporated.

The utility is not particularly polished and hasn’t been widely tested (I can only test with images I have, etc.). Would there be interest in such a utility, either in source or binary form, and would anyone be willing to help with development and testing? Note that the image backend is extensible and adding support for other formats would be easy, though decoding them may not be.


19 Responses to Preserving Floppies

  1. mrpijey says:

    Running a preservation society ourselves I think it would be highly beneficial to be able to try the conversion tool you mention, even when preserving a floppy with Teledisk or HD-Copy there would be a great need of a tool where you can simply convert it to raw IMG or whatever so you can extract and manipulate the file and image itself. And I am also an owner of a Kryoflux unit so it would be even better to be able to compare and evaluate several formats, and with a handy conversion tool it would be gold.

    Sign me up if you need a tester!

  2. Michal Necasek says:

    Let me review my notes… okay, the HDCP format is fairly useless. The only advantage it has compared to raw images is that it includes geometry information (nice), but it doesn’t even have an unambiguously recognizable header (boo!). You’d have to try very hard to find a floppy that HDCP can correctly represent but raw format can’t. HDCP also doesn’t seem to have any provision for imaging single-sided disks, and the RLE compression it uses is worthless nowadays. The format was useful 20 years ago, but times have changed.

    TeleDisk is of course a very different kettle of fish, as I wrote in the article 🙂

    I’ll get in touch with you privately about the rest.

  3. Rauli says:

    In Java?

  4. Michal Necasek says:

    Nah, just C. It should run on DOS, too 🙂

  5. John Elliott says:

    From my point of view, I’d welcome contributions of additional formats to LibDsk, which among its currently-supported formats handles TD0 (uncompressed), CopyQM, .CFI, raw, and various other formats used by emulators.

    One disadvantage of raw dumps is that they tend to assume a particular mapping of cylinder/head/sector to logical track. In the Spectrum preservation world, that’s given rise to two mutually incompatible raw formats for the ‘+D’ interface — one in cylinder/head/sector order, and one in filesystem order (head/cylinder/sector).

  6. William Barnett-Lewis says:

    Having fought recently with teledisk images & imd images to get them into a raw format Qemu could understand as part of my project to get together a complete Xenix 386 system, I can sympathize all too well. Many times the few archived disks I’d found were in TeleDisk format which I only learned much later could be understood by the IMD utilities and eventually converted to a raw format. In the end, I was able to get the images into a state that all could be read and loaded into Qemu so that custom could install them but it wasn’t as easy as it could be. For that matter, how much longer will these images survive if it’s this difficult now? How much has been lost and how much is in danger of being lost? I’ve gotten my work with Xenix out as a torrent recently, so hopefully it’ll spread out at least into the retro-computing community.

  7. Michal Necasek says:

    The thing about raw images is that in the PC arena, they’re a classic 99% solution. They are not ideal, but they can be easily created from and restored to physical disks (dd, rawrite), they’re understood by every emulator/virtualizer (often as the only floppy format!), and in the vast majority of cases, they do the job just fine. And changing that is somewhere between very hard and impossible.

    Nice job with LibDsk BTW 🙂

  8. Michal Necasek says:

    The funny thing is that Xenix really does not need any fancy format like TeleDisk. I’m not aware of any PC UNIX (or clone) which supported any sort of “strange” floppy format at all. Heck, some of those disks didn’t even have any filesystem on them, they were simply used as a raw block device.

    I suspect the reason there are so many TeleDisk images around is that the tools were really good. The authors clearly knew what they were doing and both TeleDisk and AnaDisk are about the best I’ve seen. Their electronic floppy distribution tools (self-extracting diskettes) were widely used as well.

    But… nowadays running DOS executables is getting to be a pain in the butt (not hard, but inconvenient) and actual floppy drives are just about extinct. The times have changed.

  9. I have quite a few floppies to preserve, as you might imagine. I can test image software, preferably running on eComStation. I like the idea of images that can be read from Virtual PC.

  10. Michal Necasek says:

    The one tool I most often used with OS/2 to create/restore floppy images was the one that came with the OS… XDFCOPY. It will handle all normal floppies as well as the special high-capacity XDF ones. The images are usable with Virtual PC, VirtualBox, etc.

    If I remember correctly, IBM’s DIUNPACK was also able to list the contents of images created that way (as long as they were standard FAT-format floppies of course), and extract the individual files.

  11. tahrey says:

    I know of a thing called “Pasti” for the Atari ST which works on the sort of medium level that you need … it basically saves the exact data that comes out of the floppy controller output when you read the disc, including CRCs and such, without having to go right down to the magnetic domains. Don’t know if there’s anything like it for the PC, but if you have any double-density 3.5″ discs, or 5.25″ ones, access to an Atari, and an ST-compatible drive for them (such things do exist), it should work the same way, as it doesn’t care what the data recorded on the disc is… and the two systems share quite a lot of cross compatibility when it comes to floppies.

    Of course, if the disc uses a completely custom setup that requires use of the software recorded upon it to read (ie Track 0 is standard, and loads up some kind of small, TSR style custom disc controller…) then you’re out of luck either way…

    Also, having been looking for something else along those lines myself earlier on, I’ve come to know (via Wikipedia and then a much more technical website with actual manual scans on it as PDFs) that, e.g. the old Western Digital floppy/hard disc combination controller cards had a function called “Read Long” (and indeed, “Write Long”) which were able to do this for you… it was a default part of the standard to allow fairly simple software to be able to read the raw sectors including CRC from a disc, and even write them back without any controller interference (for softies to produce or duplicate their own copy-protected discs, one presumes). So if you can get hold of one of those and the programmer’s reference, you could come up with your own software to do the job and then hire your services out. There must be still enough people out there with boxes of unpreserved floppies they want to save the programs and data from that it could be a nice little sideline for years to come…

    Also, I’ve got an old 1.2MB 5.25″ internal drive from a 286 knocking around somewhere, and last time I plugged it in and tested it, it seemed to work. How much do you want to pay? 🙂

  12. Michal Necasek says:

    I’m not too familiar with the Atari ST. I do know that the Amiga had a very different disk controller (or perhaps in some sense lacked one!) and software had a much greater latitude when dealing with disks. PCs are different, the FDC is fairly high-level, and it’s possible to create disks that cannot be replicated using a PC floppy controller, even though PC floppy controllers can work with such disks.

    Read Long is a hard disk controller command, not a floppy controller one. The FDC command Read ID can be used to discover the sector structure of a floppy, and Read Track can be used to read the sector data, including sectors with bad CRC. The software that uses these commands to archive most disks was written about 25 years ago and it’s called TeleDisk 🙂

    To get better capabilities, custom hardware is needed. From what I’ve read, it sounds like KryoFlux can read/write just about anything because it skips the FDC entirely and controls the drive on its own.

    I still do have working 5.25″ drives so I don’t need yours… yet 🙂

  13. Robcfg says:

    Hi Michal!

    Too often I’ve encountered the problem of converting between different floppy image formats, so I’ll be glad to help test your conversion tool as I feel it’s something we need to help with preservation.

    I have an old K6-2 400mhz machine to use Teledisk and other older programs, and an Athlon 1Ghz machine with linux to run Kryoflux.

    Best regards,
    Rob

  14. David Graham says:

    Some of you are probably already familiar with it, but I have an old donationware TSR program from the 1990’s called 800II: DOS Format Enhancer ver. 1.66. IIRC it helped me with a Tandy 2000 format problem back then. Found a copy just now at: http://esca.atomki.hu/paradise/sac/utildisk.html . They have a copy of 800 II v1.80 if it helps anyone. It can format across disk types and has many sector and track options available at the command line for special formats. From the documentation I have:

    “Format &
    Floppy    Suitable Drives                   DOS FORMAT Example
    ------------------------------------------------------------------
    360K DD   5.25″ 360K/1.2M, 3.5″ 720K/1.44M  FORMAT d: /T:40 /N:9
    400K DD   5.25″ 360K/1.2M, 3.5″ 720K/1.44M  FORMAT d: /T:40 /N:10
    720K DD   5.25″ 1.2M, 3.5″ 720K/1.44M       FORMAT d: /T:80 /N:9
    800K DD   5.25″ 1.2M, 3.5″ 720K/1.44M       FORMAT d: /T:80 /N:10
    1200K HD  5.25″ 1.2M, 3.5″ 1.44M            FORMAT d: /T:80 /N:15
    1360K HD  5.25″ 1.2M, 3.5″ 1.44M            FORMAT d: /T:80 /N:17
    1600K HD  3.5″ 1.44M                        FORMAT d: /T:80 /N:20

    It is also possible to use some other formats. With the FORMAT command, you can use from 1 to 85 for /T: (tracks), depending on the type of drive (however, most formats using a value over 80 for tracks may not be completely reliable). All formats except for 1600K can be made bootable by the FORMAT command /S switch.

    800 II allows you to use DISKCOPY between drives of different types, provided that they both support the format of the floppy to be copied. This way you can now use DISKCOPY between 5.25″ and 3.5″ drives.

    800 II uses as little as 944 bytes of resident memory, and can be loaded high with an appropriate memory manager.”

    It works well for me, but I only run on real hardware from the 1990’s – don’t know about VM. Don’t know why this shouldn’t be of help to someone, but sadly the copy on the net doesn’t seem to have much English documentation. The author is Italian.

  15. Darkstar says:

    Hi,

    I recently found some old chinese(?) CD images on the net which contain hundreds of .img files created by hdcopy. As the files themselves are only numbered and thus I have to check them all, it’d be really helpful if you could provide your tool to mass-convert hdcopy images back to regular .img files.

    Alternatively, if you happen to have a link describing the format/compression, that would be sufficient as well, as I can code my own converter then

    thanks,
    -Darkstar

  16. Michal Necasek says:

    See e-mail (two to be precise).

  17. Jonathan Berry says:

    I was going to add that raw dumps have 256 valid choices for each byte, but the real world has 257: 0-255 plus unreadable. However, the discussion here reveals that the challenges are far more complex. Nice to read the grown-up subjects on this site.

    Could the verbose XML be part of the solution? One attraction of using it is extensibility, not just 256 to 257, but so many other factors.

    I came to this hoping to find a virtual environment to run Xerox Ventura Publisher 2, since there are hiccups in XP for example. XVP2 ran well in DOS or OS/2 or Win9x. Though haven’t yet seen a solution better than to fire up a legacy machine with the software already installed.

    An imperfect way of approaching old software (including floppies) is to preserve it as an installed package, a virtual machine, an appliance. One can easily find virtual machines for linuxes; the absence of the same for say Win98SE or OS/2 must have to do with copyright and EULA considerations, but I wonder if enforcing 25+-year-old EULAs is in the public good.

  18. Michal Necasek says:

    It’s not that simple (or maybe it’s simpler?). Each byte has a value between 0 and 255, but the bytes live together in a sector and that sector may or may not have a valid CRC. It also matters whether one considers only floppies as they’re visible through a PC floppy controller or magnetic media more generally.

    All the infrastructure is there for prepackaged “appliances” (OVA format, easy export/import) but yes, the insane copyright law is a problem. Unless (say) Microsoft officially states that Windows 3.1 or Windows 95 can be freely redistributed, no company will do it because Microsoft could shut them down any minute. They wouldn’t bother, but a stupid law is still a law.

    I don’t know exactly what your needs are but I imagine that if you have some substantial documents in Ventura Publisher, you’re pretty much stuck with VP because getting VP running (however hard) is several orders of magnitude less labor intensive than converting the documents to some other format.

    What exactly is the problem with Ventura Publisher original disks? Something unusual about them?

  19. Andre says:

    I see the kryoflux mentioned, but not the catweasel. Does anyone have experience with those controllers? What are the advantages of one over the other (except internal vs external hardware)?
