Reading the "Metro" on my Hanlin V3
It was recently brought to my attention that the free newspaper "Metro" has an online edition. You can either download the entire newspaper in PDF form, or read it online as a series of images, one per page, where clicking on an article follows the usemap of that image to the relevant article in HTML form.
Of course my Hanlin doesn't have a web browser (it has a somewhat limited HTML parser, but that does not understand even hyperlinks, let alone usemaps), so the PDF version is what I need. Unfortunately, the only option which "Metro" provides is a set of one-page PDF files. They are somewhat readable on my Hanlin (someone with worse eyes than my own would probably have trouble reading the small letters, but for me it's not a problem). However, rather than just using the 'page turn' buttons, I have to close the current file, open the next, and reconfigure the zoom all over again for every page. That means this multi-PDF thing isn't exactly great, and I'd prefer just getting one PDF file instead.
So I got to work, and tried out a few things.
There doesn't appear to be a tool that merges multiple PDF files into one. There is 'psmerge', but that only does PostScript files, not PDF ones. But that's not a problem, because PDF files can easily be converted to PostScript and back, right?
Well, no.
The first pdf-to-postscript converter that I tried was 'pdf2ps', a wrapper script around ghostscript. Unfortunately, using that results in some ugliness:
As you can see in the second image, using pdf2ps results in some quality loss. Ghostscript apparently doesn't understand the letter 'A' very well, and (more importantly) loses the anti-aliasing that is part of the PDF file. Converting this file back to PDF and storing it on the Hanlin results in unreadable text.
But it gets worse. Once I had converted all those PDF files to postscript using pdf2ps and merged them with psmerge, the output was gibberish. Or Klingon, take your pick:
In case you were wondering, yes, that is the exact same fragment (counting lines sucks) at the exact same zoom level. Clearly this was a dead end.
I found that there was a second pdf-to-ps converter, called 'pdftops', which uses the xpdf code base to do its thing. This converter produced clearly better PostScript output; I could not see any difference between the Xpdf rendering of the original file and the gv rendering of the pdftops output. Also, the psmerge output does not garble things as badly:
Sadly, it still loses anti-aliasing, and with it the readability of the document on my e-book reader. Moreover, the pdftops output seems to confuse psmerge, to the extent that it hangs on some pages.
Another approach I tried was to import the pdftops output into scribus, and create a multi-page PDF file with that program. Unfortunately, scribus does not seem to like the pdftops output.
Xpdf also has a 'pdftoppm' converter. With that, I was able to create a multi-page .pdf file that was not garbled, and did still contain anti-aliasing. Unfortunately, since the PPM format is a raster image format, the PDF file that is created in this way does not scale very well, resulting in artifacts and, again, an unreadable PDF file on the hanlin device.
So it appears that for now I'll be stuck with storing multiple files on my device. Sigh. I wish that weren't necessary...
Update: So I needed pdftk.
useful tool for manipulating PDF documents
If PDF is electronic paper, then pdftk is an electronic stapler-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a simple tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
- Merge PDF documents
- Split PDF pages into a new document
- Decrypt input as necessary (password required)
- Encrypt output as desired
- Fill PDF Forms with FDF Data and/or Flatten Forms
- Apply a Background Watermark
- Report on PDF metrics, including metadata and bookmarks
- Update PDF Metadata
- Attach Files to PDF Pages or the PDF Document
- Unpack PDF Attachments
- Burst a PDF document into single pages
- Uncompress and re-compress page streams
- Repair corrupted PDF (where possible)
Author: Sid Steward ssteward@accesspdf.com
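For the "Merge PDF documents" case, a minimal sketch of how the one-page files could be stitched together with pdftk might look like the following. The merge_pages helper, the DRY_RUN switch and the file names are made up for illustration; only the "cat" and "output" keywords come from pdftk itself.

```shell
#!/bin/sh
# Sketch: merge several one-page PDFs into a single file with pdftk.
# merge_pages, DRY_RUN and all file names are hypothetical.

merge_pages() {
    out=$1      # first argument: name of the merged PDF
    shift       # the remaining arguments are the input pages, in order
    # "cat" concatenates the inputs in the order given,
    # "output" names the file to write.
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} pdftk "$@" cat output "$out"
}

# Example (not run here):
#   merge_pages metro.pdf page-01.pdf page-02.pdf page-03.pdf
```

If the pages are passed with a glob such as page-*.pdf, the shell sorts them as text, so zero-padded page numbers (01, 02, ...) are needed to keep page 10 from landing before page 2.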
You may need to play a bit with the parameters, but this command did create a 2-page PDF for me: gs -sDEVICE=pdfwrite -sOutputFile=file.pdf pdf1.pdf pdf2.pdf
I downloaded the pages from Metro as pdf1.pdf and pdf2.pdf and ran the above command. It printed warnings about the PDF not being correct, but it still produced a valid 2-page PDF.
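As a possible refinement of that Ghostscript command, here is a sketch that adds the standard -dBATCH and -dNOPAUSE switches, so that gs exits after the last file instead of dropping to its interactive prompt, plus -q to quieten the startup banner. The merge_with_gs helper, the DRY_RUN switch and the file names are assumptions for illustration, and I have not checked whether the result keeps the anti-aliasing on the Hanlin.

```shell
#!/bin/sh
# Sketch: merge PDFs with Ghostscript's pdfwrite device, non-interactively.
# merge_with_gs, DRY_RUN and all file names are hypothetical.

merge_with_gs() {
    out=$1      # first argument: name of the merged PDF
    shift       # the remaining arguments are the input PDFs, in order
    # -q        suppress the startup banner
    # -dBATCH   exit after processing the last input file
    # -dNOPAUSE do not wait for a keypress between pages
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
        -sOutputFile="$out" "$@"
}

# Example (not run here):
#   merge_with_gs file.pdf pdf1.pdf pdf2.pdf
```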