A few days ago I scanned a document and unwisely used MS Paint on Windows 7 to touch up a small selection of the scanned pages. Once the dozen or so files were re-touched, I combined them into a PDF together with the untouched originals. Today I tried OCRing the resulting PDF in Acrobat Pro, only to be greeted with an error message complaining that “this page” cannot be processed because it’s larger than 45 x 45 inches.
By now I’m used to the fact that Acrobat Pro oscillates between really wonderful and amazingly shoddy, so I wasn’t surprised that there was not the slightest hint which of the 302 pages might be causing trouble.
After pinning down the troublemaker manually, it of course turned out to be one of the edited pages… and the problem was that MS Paint had somehow butchered the DPI values stored in the file. Oops…
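The manual hunt could in principle be scripted. Here is a minimal Python sketch, assuming the pages are classic (non-BigTIFF) TIFF files whose resolution tags sit in the first image directory. The function names `read_tiff_dpi` and `find_odd_pages`, the expected value of 600 and the tolerance of 1 are illustrative choices, not part of any tool mentioned here. It reads the XResolution and YResolution tags straight from the file and reports pages whose DPI strays from the expected value.

```python
import struct


def read_tiff_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from the first IFD of a TIFF; None where a tag is missing."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    resolution = {}  # tag -> pixels per resolution unit
    unit = 2         # ResolutionUnit defaults to 2 (inch) when the tag is absent
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, field_type, _count = struct.unpack_from(order + "HHI", data, entry)
        if tag in (282, 283) and field_type == 5:
            # XResolution / YResolution: a RATIONAL stored elsewhere in the file.
            (value_offset,) = struct.unpack_from(order + "I", data, entry + 8)
            num, den = struct.unpack_from(order + "II", data, value_offset)
            resolution[tag] = num / den if den else None
        elif tag == 296 and field_type == 3:
            # ResolutionUnit: a SHORT stored inline (1 = none, 2 = inch, 3 = cm).
            (unit,) = struct.unpack_from(order + "H", data, entry + 8)

    scale = 2.54 if unit == 3 else 1.0  # convert pixels per cm to pixels per inch

    def to_dpi(tag):
        value = resolution.get(tag)
        return None if value is None else value * scale

    return to_dpi(282), to_dpi(283)


def find_odd_pages(paths, expected_dpi=600.0, tolerance=1.0):
    """Yield (path, (x_dpi, y_dpi)) for files whose DPI strays from expected_dpi."""
    for path in paths:
        with open(path, "rb") as fh:
            dpi = read_tiff_dpi(fh.read())
        if any(v is None or abs(v - expected_dpi) > tolerance for v in dpi):
            yield path, dpi
```

Something like `for path, dpi in find_odd_pages(sorted(glob.glob("scan_*.tiff"))): print(path, dpi)` would then list the suspects; the `scan_*.tiff` pattern is only a guess at the naming scheme.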
Most of the files I edited in MS Paint ended up with a DPI of 599.999 instead of the original 600 (WTF?!), but that was close enough to the original not to cause problems. However, one file somehow ended up with a DPI of 96.012 (don’t look at me, I have no idea either!). Because Acrobat derives the physical page size from the pixel dimensions and the DPI, this made the scanned page look huge in terms of physical dimensions. So how could I get out of this mess?
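To see why the low DPI value trips the size limit, here is a small Python sketch of the arithmetic. The pixel dimensions (5100 x 6600, i.e. a US Letter page scanned at 600 DPI) are an assumption for illustration, since the actual scan size isn't stated; the 45-inch limit is the one quoted in the error message, and the function names are made up for this example.

```python
ACROBAT_LIMIT_IN = 45.0  # limit quoted in the Acrobat error message


def physical_size_in(width_px, height_px, dpi):
    """Physical size in inches: pixels divided by dots per inch."""
    return width_px / dpi, height_px / dpi


def exceeds_limit(width_px, height_px, dpi, limit_in=ACROBAT_LIMIT_IN):
    """True if either side of the page is longer than the limit."""
    width_in, height_in = physical_size_in(width_px, height_px, dpi)
    return width_in > limit_in or height_in > limit_in


# Hypothetical page: US Letter scanned at 600 DPI (assumed, not from the post).
WIDTH_PX, HEIGHT_PX = 5100, 6600

for dpi in (600, 599.999, 96.012):
    width_in, height_in = physical_size_in(WIDTH_PX, HEIGHT_PX, dpi)
    verdict = "too large" if exceeds_limit(WIDTH_PX, HEIGHT_PX, dpi) else "fine"
    print(f"{dpi:>8} DPI: {width_in:.2f} x {height_in:.2f} in ({verdict})")
```

Under that assumption, 600 and 599.999 DPI both come out at essentially 8.5 x 11 inches, while 96.012 DPI inflates the same pixels to roughly 53 x 69 inches, which is past the limit on both sides.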
There’s no obvious way to fix the DPI in MS Paint itself or in Preview. Hex-editing the file would probably work, but that seemed like a last resort. Luckily, OS X comes with a tool that can do this easily: sips (scriptable image processing system).
With sips, fixing the DPI values was trivial:
sips --setProperty dpiHeight 600 --setProperty dpiWidth 600 \
    scan_0182.tiff --out scan_0182a.tiff
I replaced the messed-up page in the PDF and voilà, the dimensions were right and Acrobat no longer refused to perform the OCR. Phew!