After more or less accidentally coming across a BBS listing of various high-capacity floppy formatting programs, I began wondering: How much data can really be stored on a diskette in a PC floppy drive? And what’s the relationship between formatted and unformatted capacity? When I started doing the math, I realized that the problem is both simpler and more complex than I had thought. And that one megabyte is not like another.
Note: This discussion is limited to 3½” high-density floppies, by far the most common format, unless otherwise noted.
2.0 MB Unformatted Capacity
Since floppies store essentially analog signals, how is their theoretical capacity calculated? There are no addressable memory cells like those in RAM chips, so how does one arrive at 2 MB? The math is actually remarkably straightforward and has little to do with the medium and everything to do with the floppy controller (FDC) and drive.
There are several constants which determine the unformatted capacity: 80 tracks (really cylinders), 2 sides, 500 kbps, and 300 rpm.
A standard FDC reads and writes 3½” HD media at 500 kbps using MFM (Modified Frequency Modulation). In other words, every second the controller could read or write 500 kilobits of actual data.
A standard 3½” floppy drive rotates at 300 rpm, that is 5 revolutions per second (300 / 60). Given a 500 kbps data rate, the FDC can record exactly 100 kbit (500 / 5), or 12,500 bytes, on a single track. At 80 cylinders and 2 sides, that’s 80 * 2 * 12,500 or exactly 2,000,000 bytes (2 MB) of data.
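The arithmetic is easy to verify with a few lines of Python (nothing floppy-specific here, just the constants from the preceding paragraphs):

```python
# Unformatted capacity of a 3.5" HD floppy, derived purely from the
# FDC data rate and the drive rotation speed.
DATA_RATE_BPS = 500_000   # MFM data rate of a standard FDC
RPM = 300                 # rotation speed of a standard 3.5" drive
CYLINDERS = 80
SIDES = 2

revs_per_second = RPM / 60                         # 5 revolutions/second
bits_per_track = DATA_RATE_BPS / revs_per_second   # 100,000 bits
bytes_per_track = bits_per_track / 8               # 12,500 bytes
total_bytes = CYLINDERS * SIDES * bytes_per_track

print(bytes_per_track)   # 12500.0
print(total_bytes)       # 2000000.0
```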
1.44 MB Formatted Capacity, Or Is It?
Once a diskette is formatted, some of the bits and bytes on every track are used to store sector IDs or CRCs, and some space is left unused to give the FDC a bit of breathing space in the form of gaps (especially necessary when writing data). Exactly how much capacity is “lost” to the necessary overhead determines the formatted capacity.
The standard PC format of 3½” HD media uses 18 sectors (512-byte sectors that is) per track. That is 9,216 bytes or 9 KB of user-accessible data. At 80 tracks and two sides, that’s 80 * 2 * 9,216 or 1,474,560 bytes. If one megabyte is defined as one million bytes, that would be 1.475 MB. If one megabyte is defined as 1024 * 1024 bytes, that would be 1.406 MB.
So where does 1.44 MB come from? Well, it uses a unique definition of megabyte not normally used anywhere else in the industry: The capacity is 1,440 kilobytes or 1.44 MB, if one accepts this strange definition of megabyte being 1,000 KB.
Now this has an interesting corollary: It is commonly said that a 3½” floppy with 2.0 MB unformatted capacity has 1.44 MB formatted capacity. As explained above, those figures use different definitions of megabyte!
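The three competing definitions of “megabyte” can be laid side by side in a short sketch:

```python
# Formatted capacity of the standard 18-sector layout, expressed under
# the three definitions of "megabyte" discussed above.
SECTORS_PER_TRACK = 18
SECTOR_SIZE = 512
TRACKS = 80
SIDES = 2

total = TRACKS * SIDES * SECTORS_PER_TRACK * SECTOR_SIZE  # 1,474,560 bytes

print(total / 1_000_000)       # 1.47456  -> "1.475 MB" (decimal MB)
print(total / (1024 * 1024))   # 1.40625  -> "1.406 MB" (binary MB)
print(total / (1000 * 1024))   # 1.44     -> the floppy "1.44 MB" (1,000 KB)
```

Only the last, oddball definition (1 MB = 1,000 × 1,024 bytes) yields the familiar 1.44 figure.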
More, More, More!
Over the years, a number of people realized that the standard PC floppy format is quite conservative. No wonder—with 2.0 MB unformatted capacity going down to only 1.475 MB formatted (using the same definition of megabyte), a little over 25% of the capacity is “wasted” on all the overhead.
There are two basic methods of increasing storage capacity: Storing more tracks on a disk and reducing the format overhead. These are not mutually exclusive.
The most extreme form of squeezing more tracks on a disk is formatting 48 tpi (tracks per inch) 5¼” media as 96 tpi in high-density drives. This doubles the number of tracks on a disk and thus doubles the capacity. In my experience this method is unreliable, probably in part because it exceeds the rated capacity of the media. It’s also limited to 5¼” drives and, worst of all, the result is still not nearly as good as an actual high-density disk (1.2 MB standard formatted capacity).
A less extreme method is adding just a few tracks, usually increasing the capacity from 80 to 82 or 84 tracks. This is not a problem for the medium because the disk is coated with magnetic substrate uniformly; if a disk can reliably hold 80 tracks, it can also hold 84. Software can typically deal with such disks without modification as well.
But there is a catch: Not every drive can move the read/write head to the 82nd or 84th track; this is a mechanical limitation. That makes this method problematic—it increases the capacity by only up to 5% at the risk of making the floppy unusable in some systems. The upshot is that formatting floppies to 82 or 84 tracks makes sense for local backups, but cannot be used for distribution media.
The last and most interesting method involves reducing the format overhead. The goal is not to squeeze more bits onto the medium than there’s officially space for, but rather recover some of the format overhead and turn it into user-accessible data. Hence media reliability is not affected at all. The biggest obstacle is the FDC, which is simultaneously too smart and not smart enough.
The most straightforward approach is simply increasing the number of (standard 512-byte) sectors per track by reducing the size of the gaps between sectors. This has the major advantage that special drivers are usually not required; software which can handle 15 or 18 sectors per track can usually handle 20 or 21 just as well.
A widely used representative of this method is Microsoft’s DMF, which uses 21 sectors per track for a total capacity of 21 * 0.5 * 2 * 80 or 1,680 KB on a standard 3½” HD diskette.
This is still relatively far from the theoretical ideal, the unformatted capacity (DMF utilizes about 86% of the unformatted capacity). The problem is that there is fixed per-sector overhead (IDs, checksums) which cannot be eliminated. The way to reduce the overhead is to use bigger sectors, which have a better natural ratio of usable data to overhead. And that’s where the real fun starts.
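The utilization figures quoted above are straightforward to check:

```python
# Utilization of the 2,000,000-byte unformatted capacity by the standard
# 1.44 MB format versus Microsoft's DMF (21 sectors per track).
UNFORMATTED = 2_000_000

def capacity(sectors_per_track, sector_size=512, tracks=80, sides=2):
    """Total user-accessible bytes for a given per-track layout."""
    return tracks * sides * sectors_per_track * sector_size

standard = capacity(18)   # 1,474,560 bytes
dmf = capacity(21)        # 1,720,320 bytes (= 1,680 KB)

print(round(standard / UNFORMATTED, 3))   # 0.737
print(round(dmf / UNFORMATTED, 3))        # 0.86
```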
There are two major problems with using larger sectors. The first is a software problem: The BIOS and operating systems usually cannot handle non-standard sector sizes, which requires special drivers and thus hassle for users. The second is a hardware problem: The FDC only supports sectors whose size is a power of two, which creates a whole host of new complications.
To get the per-sector overhead to absolute minimum, there would have to be a single sector per track. But as explained above, the unformatted capacity of a single track of a 3½” HD floppy is 12,500 bytes, which very inconveniently falls right between the two closest possible sector sizes, 8 KB and 16 KB. 8 KB is less than even the conservative standard 1.44 MB format (which provides 9 KB per track), and 16 KB cannot possibly fit onto a single track.
Most people have never heard of the FORMAT1968/READ1968 utilities, written in 1992 by Oliver Fromme, better known as the author of the popular HD-Copy utility. The FORMAT1968 utility claimed to squeeze 1,968 KB of data onto a standard 3½” HD floppy.
The FORMAT1968 utility used three 4 KB sectors per track (12 KB per track) and 82 tracks (really cylinders), which gave 12 * 2 * 82 or 1,968 KB.
The reason why this utility remained unknown is that it didn’t work on many systems. The problem is that not all drives run at exactly 300 rpm; even manufacturer specifications typically allow 1-2% slower or faster speed. And if a drive rotates faster, the capacity goes down—because the FDC has less time to read or write the data on each track.
A drive that spins 1.5% faster (304.5 rpm) could only store about 98,522 bits or 12,315 bytes per track. That’s not enough to store 3 sectors holding 12,288 bytes of data due to the required per-sector overhead (which is at least about 30 bytes per sector plus required gaps). A similar problem would occur if the FDC processed data at a rate slightly slower than 500 kbps. Of course if the drive rotated slightly slower, there would be more room on the disk… but that cannot be assumed to be the case.
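The effect of drive speed tolerance on per-track capacity can be worked out in a few lines (the ~30 bytes of per-sector overhead is the rough figure from the text, not an exact number):

```python
# Per-track capacity as a function of actual drive speed, at the fixed
# 500 kbps FDC data rate.
DATA_RATE_BPS = 500_000

def bytes_per_track(rpm):
    """Raw bytes that fit on one track at the given rotation speed."""
    return int(DATA_RATE_BPS / (rpm / 60) / 8)

print(bytes_per_track(300.0))   # 12500, the nominal case
print(bytes_per_track(304.5))   # 12315, a drive spinning 1.5% fast

# Three 4 KB sectors need 12,288 data bytes plus roughly 3 * 30 bytes
# of IDs and CRCs (gaps not even counted), which no longer fits:
needed = 3 * 4096 + 3 * 30
print(needed, needed > bytes_per_track(304.5))   # 12378 True
```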
Using bigger (8 KB) sectors is out of the question (one is not enough, two can’t fit), and using smaller sectors (2 KB or 1 KB) only makes the sector overhead worse.
Mix and Match
The only way to reduce the sector overhead and still be able to actually read and write the disks on all systems is to use a mix of sector sizes. One 8 KB sector may be used because that provides the lowest relative overhead. The question is then how to utilize most of the remaining slightly less than 4 KB of available space.
The approach chosen by XDF uses one sector each in 8 KB, 2 KB, 1 KB, and 512 byte sizes. That adds up to 11.5 KB per track or 1,840 KB (11.5 * 2 * 80) per disk. That is about 94.2% of the unformatted capacity, rather better than the 73.7% utilization of the standard 1.44 MB format. For reasons noted above, XDF only used the standard 80 tracks per side. With 82 tracks, it would have gotten up to 1,886 KB of user-accessible storage.
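XDF’s per-track arithmetic checks out as follows:

```python
# XDF's mixed sector sizes: one sector each of 8 KB, 2 KB, 1 KB, and 512 bytes.
UNFORMATTED = 2_000_000
sector_sizes = [8192, 2048, 1024, 512]

per_track = sum(sector_sizes)        # 11,776 bytes = 11.5 KB
xdf = per_track * 2 * 80             # standard 80 tracks per side

print(xdf // 1024)                   # 1840 (KB)
print(round(xdf / UNFORMATTED, 3))   # 0.942
print(per_track * 2 * 82 // 1024)    # 1886 (KB, hypothetical 82-track variant)
```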
1,886 KB is in fact the capacity provided by the 2M utility by Ciriaco García de Celis (using 82 tracks/cylinders), even though that utility uses a slightly different physical format. The major difficulty with this approach is convincing standard FDCs to format such disks. The solution was previously described in the XDF article (formatting with 128-byte sectors but supplying sector IDs indicating larger sectors).
Goodbye, Sectors! Well, Almost…
The 2MGUI (GUI in this case stands for Guinness, not Graphical User Interface) utility (by the same author as 2M) went further and reduced the sector overhead to the barest minimum: one sector per track. Obviously, since the FDC does not support arbitrary sector lengths, some trickery must have been involved.
Reading arbitrary length sectors is in fact not particularly difficult. As long as the sector length stored in the sector ID on the medium is longer than the actual length, the DMA controller can be programmed for the desired length and all requested bytes will be transferred. The read command will presumably fail (because it won’t find a valid CRC), but that can be solved by manually calculating a checksum and storing it in the sector’s data field.
Writing such sectors is considerably more difficult. 2MGUI formats each track with a single sector which is nominally 128 bytes long, but whose sector ID indicates a length of 16 KB (for HD floppies) or 32 KB (for ED media). That’s the easy part, and it explains how a single over-size sector can take up an entire track.
Writing the sector data is tricky because even if the FDC receives less than a full sector’s worth of data, it will keep writing zeros until the end of sector and will calculate and write a CRC. Since writing a full 16 KB sector would overwrite the sector’s header and the first few thousand bytes of data at the beginning of the track, 2MGUI resets the FDC before that can happen.
2MGUI can operate in non-DMA mode where the data is fed to the FDC directly byte by byte. This is relatively straightforward because the FDC is reset after writing the desired number of bytes. In DMA mode, the 8254 PIT (Programmable Interval Timer) is used to precisely measure how long it takes to write the desired amount of data. Once the time elapses, the FDC is reset.
Regardless of whether non-DMA or DMA mode is used, interrupts must be disabled for the entire duration of track write (at least 200 milliseconds), in the former case to avoid underruns and in the latter case to avoid overwriting the beginning of the sector/track.
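The 200 millisecond figure is simply the duration of one revolution at 300 rpm; a quick sketch also shows that a near-track-sized payload fits within that window at 500 kbps (the 12 KB payload below is just an illustrative value, not 2MGUI’s exact figure):

```python
# Why the write window is about 200 ms: one full revolution at 300 rpm.
rpm = 300
revolution_ms = 60_000 / rpm
print(revolution_ms)   # 200.0

# In DMA mode the PIT times the transfer; e.g. streaming a 12 KB payload
# at the 500 kbps data rate takes just under one revolution:
payload_bytes = 12 * 1024
write_ms = payload_bytes * 8 / 500_000 * 1000
print(write_ms)        # 196.608
```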
This is why 2MGUI remained a proof-of-concept utility and why its documentation mentions that it is not suitable for use in multi-tasking environments. However, all the negatives aside, it is highly likely that 2MGUI truly reaches the limit of floppy capacity achievable on PC hardware.
As to why 2MGUI works at all, understanding the FDC operation provides the answer. A read command starts delivering data to the host as soon as it finds the requested sector ID, which 2MGUI does provide. The fact that the end of the sector is missing does not in any way affect the data stored before the cut-off. The same is true for writing sectors—the fact that the write command is forcibly terminated before it completes does not in any way impact the data written before the command was aborted.
The maximum capacity achievable with 2MGUI cannot be generally stated because it depends on the hardware used. It can be over 2,000,000 bytes when using 82 tracks and slower-rotating drives.
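As a back-of-the-envelope illustration (the 297 rpm figure is a hypothetical drive running 1% slow, within typical manufacturer tolerance):

```python
# Best-case single-sector-per-track capacity depends on the actual
# drive speed; a slow drive stores more bytes per track.
DATA_RATE_BPS = 500_000

def track_bytes(rpm):
    """Raw bytes per track at the given rotation speed."""
    return int(DATA_RATE_BPS / (rpm / 60) / 8)

# 82 cylinders, 2 sides, a drive spinning 1% slow:
total = 82 * 2 * track_bytes(297)
print(total)                    # 2070664
print(total > 2_000_000)        # True: past the nominal unformatted capacity
```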
It is an interesting fact that any 2MGUI disk (at least one written with a sufficient gap) should be readable in any system, but it may not be writable. The reason for this is that the FDC data separator locks on the data rate actually delivered by the drive, and can thus handle somewhat more (or less) than the nominal 500 kbps. On the other hand, when writing data, the nominal data rate is used and systems with faster-spinning drives simply won’t be able to write as many bytes per track.
All standard data formats, and even extended data formats like 2M or XDF, are designed with enough safety margin to be readable and writable on all systems. Their purpose was always to increase storage capacity, not implement a new form of copy protection.