Merging Even and Odd Pages from a Scanned Two-Sided Document

[ itertools pdf python ]

At home, we’ve had a Brother DCP-L2540DW multifunction laser printer/scanner/copier for a while, and it has worked great for us. It connected effortlessly to the WiFi and to both our Windows laptop and my Debian server. The Windows dashboard is well laid out and well designed. It even has a document feeder for copying and scanning. The big hitch there, though, is that it only supports single-sided scanning, and I haven’t found an option in the software to stitch together the even and odd pages. Well, with some recent experience under my belt in reading PDFs with minecart (hmmm, still need to write that follow-up post…), and with a couple of ~20-page double-sided documents in hand that I needed to scan, I decided it was finally time to figure out how to do this.

The quickest way to scan both sides of a document using a one-sided sheet feeder gives you two PDFs, one with the odd-numbered pages in ascending order and one with the even-numbered pages in descending order. Thus, any solution has to interleave the pages while also taking the even-numbered pages back-to-front. While I had found minecart to be pretty user-friendly for reading PDFs, it does not provide functionality for writing them. A quick Google search led to this Stack Overflow question, about reversing the pages of a PDF with pyPdf. Score!

Long story short: pyPdf is Python 2 only, and thus wasn’t an option for me. However, it has a spinoff project, PyPDF2, that is compatible (at least) with Python 3.6. With the two scan PDFs named doc_odd.pdf and doc_even.pdf, running the following scanmerge.py script (also posted as a Gist) as “> python scanmerge.py doc” produces the desired doc.pdf with the pages nicely interleaved, and in the correct order:

import itertools as itt
import sys

import PyPDF2 as PDF


def main():
    fbase = sys.argv[1]

    pdf_out = PDF.PdfFileWriter()

    with open(fbase + "_odd.pdf", 'rb') as f_odd:
        with open(fbase + "_even.pdf", 'rb') as f_even:
            pdf_odd = PDF.PdfFileReader(f_odd)
            pdf_even = PDF.PdfFileReader(f_even)

            for p in itt.chain.from_iterable(
                itt.zip_longest(
                    pdf_odd.pages,
                    reversed(pdf_even.pages),
                )
            ):
                if p:
                    pdf_out.addPage(p)

            with open(fbase + ".pdf", 'wb') as f_out:
                pdf_out.write(f_out)

    return 0


if __name__ == "__main__":

    if len(sys.argv) != 2:
        print("Wrong number of arguments!")
        sys.exit(1)

    sys.exit(main())

Clearly, there’s room for further improvement/development here – it would be a lot more robust to use a proper argument parser like argparse, and a more general script would allow for the reversal of the even pages to be optional. For my main use-case, though, it worked great! Note that it’s necessary to hold open the I/O streams for the input PDFs (here, f_odd and f_even) until after the write of the output PDF is completed. Waiting to write the output PDF until after closing the input I/O streams yields a PDF with a bunch of blank pages.
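As a rough idea of what the argparse improvement mentioned above might look like, here is a minimal sketch. The `--no-reverse` flag, the `reverse_even` attribute, and the `build_parser` and `order_even` helpers are all hypothetical names made up for illustration; none of them are part of the script above.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line parser for scanmerge.py."""
    parser = argparse.ArgumentParser(
        description="Interleave odd and even page scans into one PDF."
    )
    # Base name: the script would read <fbase>_odd.pdf and <fbase>_even.pdf
    parser.add_argument("fbase", help="base name of the scanned PDFs")
    # Hypothetical flag: leave the even pages in the order they were scanned
    parser.add_argument(
        "--no-reverse",
        dest="reverse_even",
        action="store_false",
        help="even pages are already in ascending order",
    )
    return parser


def order_even(pages, reverse_even=True):
    """Return the even pages as a list, reversed only when requested."""
    return list(reversed(pages)) if reverse_even else list(pages)


# Parse an explicit argument list instead of sys.argv, purely for demonstration
args = build_parser().parse_args(["doc", "--no-reverse"])
print(args.fbase, args.reverse_even)
```

With something like this in place, main() could take the parsed arguments instead of reading sys.argv directly, and pass args.reverse_even along to decide whether to wrap the even pages in reversed().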


Incidentally, the more I use itertools, the more I like it. I’ve used chain and zip_longest more times than I can count, and I’ve used count, cycle, and repeat at least once apiece. Here, zip_longest (plus the subsequent if p: ...) interleaves the pages while accounting for the possibility that there is one fewer even page than odd, and chain.from_iterable cleanly assembles the interleaved pages from the sequence of page-pairs generated by zip_longest.
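To see the zip_longest/chain.from_iterable combination in isolation, here is a small sketch that uses plain lists of page numbers as stand-ins for the PDF page objects. The `interleave` helper is a made-up name for illustration, and it checks `is not None` explicitly rather than relying on truthiness as the script does.

```python
import itertools as itt


def interleave(odd_pages, even_pages_descending):
    """Interleave odd pages with even pages that were scanned back-to-front."""
    # Pair each odd page with the matching even page; zip_longest pads the
    # shorter input with None when there is one fewer even page
    pairs = itt.zip_longest(odd_pages, reversed(even_pages_descending))
    # Flatten the pairs into one sequence and drop the None padding
    return [p for p in itt.chain.from_iterable(pairs) if p is not None]


odd = [1, 3, 5, 7]   # first scan: odd pages, ascending
even = [6, 4, 2]     # second scan: even pages, descending
print(interleave(odd, even))  # [1, 2, 3, 4, 5, 6, 7]
```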

Written on January 15, 2019