Merging Even and Odd Pages from a Scanned Two-Sided Document
At home, for a while we’ve had a Brother DCP-L2540DW multifunction laser printer/scanner/copier, which has worked great for us. It connected effortlessly to the WiFi and to both our Windows laptop and my Debian server. The Windows dashboard is well laid out and well designed. It even has a document feeder for copying and scanning; the big hitch there, though, is that it only supports single-sided scanning, and I haven’t found an option in the software to stitch together the even and odd pages. Well, with some recent experience under my belt in reading PDFs with minecart (hmmm, still need to write that follow-up post…), and with a couple of ~20-page double-sided documents in hand that I needed to scan, I decided it was finally time to figure out how to do this.
The quickest way to scan both sides of a document using a one-sided sheet feeder gives you two PDFs, one with the odd-numbered pages in ascending order and one with the even-numbered pages in descending order. Thus, any solution has to interleave the pages while also taking the even-numbered pages back-to-front. While I had found minecart to be pretty user-friendly for reading PDFs, it does not provide functionality for writing them. A quick Google search led to a Stack Overflow question about reversing the pages of a PDF with pyPdf. Score!
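To make the page-ordering problem above concrete, here is a minimal sketch that uses plain integers as stand-ins for pages. The helper name interleave and the sample page lists are hypothetical, chosen only for illustration; no PDF library is involved.

```python
import itertools as itt


def interleave(odd_scan, even_scan):
    """Merge two scan orders into reading order.

    odd_scan holds the odd pages front-to-back; even_scan holds the
    even pages back-to-front, as they come out of the second pass.
    (Hypothetical helper, for illustration only.)
    """
    # Pair page 1 with page 2, page 3 with page 4, and so on;
    # zip_longest pads with None if there is one fewer even page.
    pairs = itt.zip_longest(odd_scan, reversed(even_scan))
    # Flatten the pairs into one sequence and drop the padding.
    return [p for p in itt.chain.from_iterable(pairs) if p is not None]


# A 5-page document: odd sides scanned first, then the stack is
# flipped over and the even sides come through in descending order.
odd_scan = [1, 3, 5]
even_scan = [4, 2]
print(interleave(odd_scan, even_scan))  # prints [1, 2, 3, 4, 5]
```

The explicit "is not None" check is there because the stand-in pages are integers; the padding value is the only thing that needs to be filtered out.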
Long story short: pyPdf is Python 2 only, and thus wasn’t an option for me. However, it has a spinoff project, PyPDF2, that is compatible (at least) with Python 3.6. With the two scan PDFs named doc_odd.pdf and doc_even.pdf, running the following scanmerge.py script (also posted as a Gist) as “> python scanmerge.py doc” produces the desired doc.pdf with the pages nicely interleaved and in the correct order:
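    import itertools as itt
    import sys

    # PdfFileReader, PdfFileWriter and addPage are the pre-3.0 PyPDF2 names
    import PyPDF2 as PDF


    def main():
        # Base name: reads <fbase>_odd.pdf and <fbase>_even.pdf
        fbase = sys.argv[1]

        pdf_out = PDF.PdfFileWriter()

        # Keep both input files open until the output has been written
        with open(fbase + "_odd.pdf", 'rb') as f_odd:
            with open(fbase + "_even.pdf", 'rb') as f_even:
                pdf_odd = PDF.PdfFileReader(f_odd)
                pdf_even = PDF.PdfFileReader(f_even)

                # Pair each odd page with its even partner (the even scan
                # is back-to-front), then flatten the pairs into one run
                for p in itt.chain.from_iterable(
                    itt.zip_longest(
                        pdf_odd.pages,
                        reversed(pdf_even.pages),
                    )
                ):
                    # zip_longest pads a missing final even page with None
                    if p:
                        pdf_out.addPage(p)

                with open(fbase + ".pdf", 'wb') as f_out:
                    pdf_out.write(f_out)

        return 0


    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("Wrong number of arguments!")
            sys.exit(1)

        sys.exit(main())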
Clearly, there’s room for further improvement/development here – it would be a lot more robust to use a proper argument parser like argparse, and a more general script would allow for the reversal of the even pages to be optional. For my main use-case, though, it worked great! Note that it’s necessary to hold open the I/O streams for the input PDFs (here, f_odd and f_even) until after the write of the output PDF is completed. Waiting to write the output PDF until after closing the input I/O streams yields a PDF with a bunch of blank pages.
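As a rough idea of what the argparse improvement mentioned above might look like, here is a sketch of just the command-line handling, with the reversal of the even pages made optional. The flag name --no-reverse and the helpers build_parser and order_even are my own hypothetical choices, not part of the posted script, and the PDF handling is left out entirely.

```python
import argparse


def build_parser():
    """Command-line interface for a hypothetical, more general scanmerge."""
    parser = argparse.ArgumentParser(
        description="Interleave odd- and even-page scan PDFs."
    )
    parser.add_argument(
        "fbase",
        help="base name; reads <fbase>_odd.pdf and <fbase>_even.pdf",
    )
    # store_false means reverse_even defaults to True and the flag
    # switches it off. The flag name is an assumption for illustration.
    parser.add_argument(
        "--no-reverse",
        dest="reverse_even",
        action="store_false",
        help="even pages are already in ascending order",
    )
    return parser


def order_even(even_pages, reverse_even=True):
    """Return the even pages in ascending order, reversing if requested."""
    pages = list(even_pages)
    return pages[::-1] if reverse_even else pages


# Simulate running: python scanmerge.py doc --no-reverse
args = build_parser().parse_args(["doc", "--no-reverse"])
print(args.fbase, args.reverse_even)  # prints: doc False
```

The result of order_even would then take the place of reversed(pdf_even.pages) in the merge loop, and argparse takes over the "wrong number of arguments" check for free.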
Incidentally, the more I use itertools, the more I like it. I’ve used zip_longest more times than I can count, and I’ve used repeat at least once apiece. Here, zip_longest (plus the subsequent if p: ...) interleaves the pages while accounting for the possibility that there is one fewer even page than odd, and chain.from_iterable cleanly assembles the interleaved pages from the sequence of page-pairs generated by