Reading the "Metro" on my Hanlin V3

It was recently brought to my attention that the free newspaper "Metro" has an online version. You can either download the entire newspaper in PDF form, or read it online as a series of images, one per page. Each image has a usemap, so clicking on an article takes you to that article in HTML form.

Of course my Hanlin doesn't have a web browser (it has a somewhat limited HTML parser, but that does not understand even hyperlinks, let alone usemaps), so the PDF version is what I need. Unfortunately, the only option which "Metro" provides is a set of one-page PDF files. They are somewhat readable on my Hanlin (someone with worse eyes than my own would probably have trouble reading the small letters, but for me it's not a problem). However, rather than just using the 'page turn' buttons, I have to close the current file, open the next, and reconfigure the zoom all over again for every page. That makes this multi-PDF thing less than great, and I'd prefer just getting one PDF file instead.

So I got to work, and tried out a few things.

There doesn't appear to be anything to merge multiple PDF files into one. There is 'psmerge', but that only does PostScript files, not PDF ones. But that's not a problem, because PDF files can easily be converted to PostScript and back, right?

Well, no.

The first PDF-to-PostScript converter that I tried was 'pdf2ps', a wrapper script around Ghostscript. Unfortunately, using that results in some ugliness:

[Image: original PDF source, as rendered by Xpdf]
[Image: output of pdf2ps, as rendered by gv]

As you can see in the second image, using pdf2ps results in some quality loss. Ghostscript apparently doesn't understand the letter 'A' very well, and (more importantly) loses the anti-aliasing that is part of the PDF file. Converting this file back to PDF and storing it on the Hanlin results in unreadable text.

But it gets worse. Once I had converted all those PDF files to postscript using pdf2ps and merged them with psmerge, the output was gibberish. Or Klingon, take your pick:

[Image: psmerge output, as rendered by gv]

In case you were wondering, yes, that is the exact same fragment (counting lines sucks) at the exact same zoom level. Clearly this was a dead end.

I found that there was a second PDF-to-PostScript converter, called 'pdftops', which uses the Xpdf code base to do its thing. This converter produced clearly better PostScript output; I could not see any difference between the Xpdf rendering of the original file and the gv rendering of the pdftops output. Also, psmerge does not garble this output as badly:

[Image: better psmerge output, as rendered by gv]

Sadly, it still loses anti-aliasing, and with it the readability of the document on my e-book reader. Moreover, the pdftops output seems to confuse psmerge, to the extent that it hangs on some pages.
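For reference, the PostScript round-trip described above could be scripted roughly as follows. This is only a sketch of the approach that did not work for me, not a recommendation. It assumes the usual command-line forms 'pdf2ps in.pdf out.ps', 'pdftops in.pdf out.ps' and 'psmerge -oout.ps in1.ps in2.ps', and it assumes 'ps2pdf' for the step back to PDF. The function name and the file names are made up for illustration.

```python
import os
import subprocess


def roundtrip_commands(pdf_files, merged_pdf, converter="pdf2ps"):
    """Build the commands for a PDF -> PostScript -> merged PDF round-trip.

    The list of pdf_files is assumed to be in page order already.
    """
    if converter not in ("pdf2ps", "pdftops"):
        raise ValueError("converter must be 'pdf2ps' or 'pdftops'")

    # One PostScript file per one-page PDF, e.g. page-1.pdf -> page-1.ps
    ps_files = [os.path.splitext(name)[0] + ".ps" for name in pdf_files]
    commands = [[converter, pdf, ps] for pdf, ps in zip(pdf_files, ps_files)]

    # psmerge takes its output file glued to the -o option
    merged_ps = os.path.splitext(merged_pdf)[0] + ".ps"
    commands.append(["psmerge", "-o" + merged_ps] + ps_files)

    # Assumption: ps2pdf is used to get back to PDF
    commands.append(["ps2pdf", merged_ps, merged_pdf])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_all(roundtrip_commands(["page-1.pdf", "page-2.pdf"], "metro.pdf", converter="pdftops")) would run the pdftops variant. Keeping the command building separate from the running makes it easy to print the commands first and see what would happen.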

Another approach I tried was to import the pdftops output into Scribus, and create a multi-page PDF file with that program. Unfortunately, Scribus does not seem to like the pdftops output.

Xpdf also has a 'pdftoppm' converter. With that, I was able to create a multi-page .pdf file that was not garbled, and did still contain anti-aliasing. Unfortunately, since the PPM format is a raster image format, the PDF file that is created in this way does not scale very well, resulting in artifacts and, again, an unreadable PDF file on the hanlin device.
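To make the scaling problem a bit more concrete: a raster image has a fixed number of pixels, decided at the moment of conversion. The sketch below computes that pixel size for a given resolution and builds the matching 'pdftoppm -r DPI in.pdf root' command. The page size in points and the file names are assumptions for illustration; I have not measured the actual "Metro" pages.

```python
def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at the given resolution.

    PDF page sizes are expressed in points, with 72 points per inch.
    """
    return (round(width_pt / 72 * dpi), round(height_pt / 72 * dpi))


def pdftoppm_command(pdf_file, ppm_root, dpi=150):
    """Build a pdftoppm call that rasterizes pdf_file at dpi dots per inch."""
    return ["pdftoppm", "-r", str(dpi), pdf_file, ppm_root]


if __name__ == "__main__":
    # Hypothetical page of 612 x 792 points (US Letter), at two resolutions
    for dpi in (75, 150):
        print(dpi, raster_size(612, 792, dpi))
    print(" ".join(pdftoppm_command("page-1.pdf", "page-1")))
```

Whatever resolution is picked, any zoom level on the device that does not match it forces the reader to resample the image, which is where the artifacts come from.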

So it appears that for now I'll be stuck with storing multiple files on my device. Sigh. I wish that weren't necessary...

Update: So I needed pdftk.
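
With pdftk, merging comes down to 'pdftk page1.pdf page2.pdf cat output merged.pdf', which works on the PDF files directly and so avoids the PostScript detour. A small wrapper might look like the sketch below. The script name, the file names and the numeric sort are my own assumptions; the sort is there because a shell glob would otherwise put page 10 before page 2.

```python
import re
import subprocess
import sys


def page_key(filename):
    """Sort key that orders 'page-2.pdf' before 'page-10.pdf'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", filename)]


def build_pdftk_command(page_files, output_file):
    """Build a pdftk call that concatenates page_files into output_file."""
    ordered = sorted(page_files, key=page_key)
    return ["pdftk"] + ordered + ["cat", "output", output_file]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python merge_metro.py metro.pdf page-*.pdf
    output_file, page_files = sys.argv[1], sys.argv[2:]
    subprocess.run(build_pdftk_command(page_files, output_file), check=True)
```

If the one-page files already sort correctly by name (for example because the page numbers are zero-padded), the sort is harmless and the plain pdftk command line does the same job.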