A First Attempt at Executable Packaging with pex
For an ongoing project here at work, I’ve written some Python tooling for automatically importing some chromatograms generated by the instrument software and extracting peak data from them. This workflow is necessary because our instrument is old, and the software that drives it was never actually intended to generate chromatograms: the only way to get the time scan data out is to … print to PDF.
So, the data (which is the intensity of the emission signal for chromium over time measured from an ICP-OES) looks like this:
And what we want, ultimately, is for it to look like this:
Also, we want the peak and trace information exported to Excel-readable formats in various ways, for downstream analysis.
Skipping directly to the end of the development story (the full telling of which must wait for another post):
it works! Through a combination of
and a nifty PDF-inspecting library called minecart,
I can pull what I need out of the PDFs, detect the peaks in the data trace, and calculate
(what are hopefully) all of the properties of the peaks that we’ll need.
Then, the plotting niftiness of matplotlib
and the XLSX export capabilities of openpyxl
work nicely to generate the outputs we need.
With the working code in hand, though, the question presents itself: How to enable other people,
the ones who will actually be collecting the data, to use this tool? Well, happily,
the fine folks over at Talk Python
and Python Bytes had mentioned the tool pex a few times
(see also here),
which can create (almost) single-object Python executables.
Sounded like a great first thing to try.
pex has a lot of options, and a lot of different ways you can go about hooking into the
code you compile into the
.pex file. What I’m describing here is
what worked for me on this first attempt to use it.
The first step was to pull the code out of Jupyter and set it up as a proper Python package,
which I called
icicp. (The data being analyzed is from an ICP-OES
with an ion chromatography
pre-separation; thus, IC-ICP-OES, and thus icicp.)
Having built packages a few times before, it was a pretty straightforward process.
pex allows you to link to a default entry point when you build the
.pex file (here, icicp.pex), so that
it can just be invoked as
python icicp.pex to automatically execute the desired chain
of code. (Note that you DON’T have to specify this entry point in your
setup.py for pex to be able to use it.)
I’d converted the Jupyter notebook cells responsible for actually driving the
workhorse code into a
runner.py helper module; a
main() entry point function in that
module worked nicely as the target for this default entry point.
So, at the point of being ready to start configuring
pex, my package tree looked like this:
icicp/
|-- icicp/
|   |-- __init__.py
|   |   : __version__
|   |-- chroma.py
|   |-- core.py
|   |-- peak.py
|   |-- runner.py
|       : main()
|-- setup.py
|-- requirements-pex.txt
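To make the entry-point idea concrete, here is a minimal sketch of what a module like runner.py could look like. It is not the actual icicp code: find_pdfs() is a hypothetical helper, and the sketch assumes the entry point is invoked with no arguments.

```python
"""Hypothetical sketch of a runner module; not the actual icicp code."""
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files sitting directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def main():
    """Entry point; takes no arguments so it can be targeted as icicp.runner:main."""
    pdfs = find_pdfs(Path.cwd())
    print(f"Found {len(pdfs)} PDF file(s) to process")
    # The real workhorse calls (import, peak detection, export) would go here.
    return 0


if __name__ == "__main__":
    main()
```

Keeping main() free of required arguments means the same function works both as the pex entry point and when the module is run directly during development.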
Since this code isn’t currently meant for public distribution
(too inflexible and application-specific), the setup.py is minimal:
from setuptools import setup

from icicp import __version__

setup(
    name="icicp",
    version=__version__,
    description="IC-ICP-OES Chromatogram Analyzer",
    packages=["icicp"],
    python_requires=">=3.6",
)
pex doesn’t care if you’re running it inside a virtual environment, or within
the root directory of the source tree of a package on disk, or wherever—no matter what, it doesn’t
inspect its environment to see if there might be code you want it to include. It stuffs
exactly the packages you tell it to into the built
.pex file, and ONLY those packages
(plus any dependencies).
I haven’t yet figured out how to tell
pex to include an on-disk source tree, so
I had to make a wheel of the
icicp package (
python setup.py bdist_wheel).
For the other requirements, I specified pinned versions for my top-level dependencies
in requirements-pex.txt:
attrs==18.1
numpy==1.15
matplotlib==2.1.2
minecart==0.3.0
openpyxl==2.5.9
peakutils==1.3.0
scipy==1.1
tqdm==4.28.1
After running pip install pex in my working virtual environment, the following command
(Windows 10 environment) successfully built a working icicp.pex:
C:\...\icicp> pex -v -r requirements-pex.txt dist\icicp-0.1.3-py3-none-any.whl -e icicp.runner:main -o icicp.pex
icicp==0.1.3 -> icicp 0.1.3
attrs==18.1 -> attrs 18.1.0
numpy==1.15 -> numpy 1.15.0
matplotlib==2.1.2 -> matplotlib 2.1.2
minecart==0.3.0 -> minecart 0.3.0
openpyxl==2.5.9 -> openpyxl 2.5.9
peakutils==1.3.0 -> PeakUtils 1.3.0
scipy==1.1 -> scipy 1.1.0
tqdm==4.28.1 -> tqdm 4.28.1
cycler>=0.10 -> cycler 0.10.0
pytz -> pytz 2018.7
python-dateutil>=2.1 -> python-dateutil 2.7.5
six>=1.10 -> six 1.11.0
pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 -> pyparsing 2.3.0
pdfminer3k -> pdfminer3k 1.3.1
et-xmlfile -> et-xmlfile 1.0.1
jdcal -> jdcal 1.4
ply>=3.4 -> ply 3.11
pytest>=2.0 -> pytest 4.0.1
atomicwrites>=1.0 -> atomicwrites 1.2.1
py>=1.5.0 -> py 1.7.0
pluggy>=0.7 -> pluggy 0.8.0
more-itertools>=4.0.0 -> more-itertools 4.3.0
colorama; sys_platform == "win32" -> colorama 0.4.1
setuptools -> setuptools 40.6.2
pex: Building pex: 79813.3ms
pex: Resolving distributions: 79804.3ms
Saving PEX file to icicp.pex
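Since the wheel filename changes with every version bump, the two build steps could be wrapped in a small helper script. The following is only a sketch: the function names are hypothetical, and it uses nothing beyond the setup.py and pex arguments already shown above.

```python
"""Hypothetical build helper; a sketch, not part of the icicp package."""
import subprocess
import sys
from pathlib import Path


def wheel_build_command():
    """Command that builds the icicp wheel into dist/ (same as the manual step)."""
    return [sys.executable, "setup.py", "bdist_wheel"]


def pex_build_command(version, dist_dir="dist"):
    """Assemble the pex invocation for a given icicp version."""
    wheel = Path(dist_dir) / f"icicp-{version}-py3-none-any.whl"
    return [
        "pex", "-v",
        "-r", "requirements-pex.txt",   # pinned top-level dependencies
        str(wheel),                     # the locally built icicp wheel
        "-e", "icicp.runner:main",      # default entry point
        "-o", "icicp.pex",              # output file
    ]


def build(version):
    """Run both steps, stopping at the first failure."""
    subprocess.run(wheel_build_command(), check=True)
    subprocess.run(pex_build_command(version), check=True)
```

Calling build("0.1.3") from the project root would then reproduce the manual sequence, assuming pex is installed in the active environment.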
pex caches packages, apparently including local wheels. Thus, using the
approach of building
icicp into a wheel, I had to bump the version number inside
__init__.py any time I made changes to the
icicp code, or else
pex would use the
cached version and my changes wouldn’t get incorporated into icicp.pex.
The pex cache is stored by default in
~\.pex\; presumably deleting the relevant
wheels here would have worked to pull in the revised code.
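I didn't test the cache-clearing route, but a sketch of it might look like the following. The internal layout of the cache directory is an assumption here (the code just searches recursively for anything named like an icicp wheel), and both function names are hypothetical.

```python
"""Untested-in-practice sketch for clearing stale icicp wheels from the pex cache."""
import shutil
from pathlib import Path


def find_cached_wheels(package, cache_root=None):
    """Return cached paths that look like wheels of `package`.

    Assumption: cached wheels keep their normal '<name>-<version>-...whl' names
    somewhere under the cache root; the exact layout is not known.
    """
    root = Path(cache_root) if cache_root is not None else Path.home() / ".pex"
    if not root.is_dir():
        return []
    return sorted(root.rglob(f"{package}-*.whl"))


def remove_paths(paths):
    """Delete each path, whether it is a wheel file or an unpacked wheel directory."""
    for path in paths:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
```

Running find_cached_wheels("icicp") first and inspecting the result before passing it to remove_paths() keeps the deletion step explicit.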
pex does expose a
--disable-cache option, but this might result in
all dependencies being re-downloaded unnecessarily, since I don’t know whether
pex checks pip’s cache before reaching out to PyPI.
While .pex files can be made executable
fairly straightforwardly on Linux,
it’s not so simply done on Windows. Rather than trying to associate the
.pex extension with
Python, I went the route of creating a simple launch script:
@echo off
echo Unpacking and running IC-ICP data workup tool...
call python icicp.pex
pause
As long as a suitable system Python version is available, when placed
in the same directory as
icicp.pex, double-clicking this script kicks off the
internal code using
icicp.runner:main() as the point of entry.
The way the data analysis code is set up, all the user needs to do is drop
the PDFs they want analyzed into the same folder and run the script:
Unpacking and running IC-ICP data workup tool...
Collecting list of PDF files...
100%|############| 23/23 [00:00<?, ?it/s]
Importing chromatogram data from PDFs...
100%|############| 16/16 [00:00<00:00, 32.25it/s]
Generating image and Excel outputs for each PDF...
100%|############| 16/16 [00:04<00:00, 3.36it/s]
Generating summary Excel file for all PDFs...
...Done! Output stored in 'output_20181204_131957'
Press any key to continue . . .
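The output folder name in that last line is a date-and-time stamp. A minimal sketch of how such a folder could be created is below; the function names are hypothetical, and reading the name as year-month-day followed by hour-minute-second is an inference from the example rather than a copy of the actual icicp code.

```python
"""Sketch of timestamped output folders; names and format are assumptions."""
from datetime import datetime
from pathlib import Path


def output_dir_name(now):
    """Format a datetime as 'output_YYYYMMDD_HHMMSS'."""
    return now.strftime("output_%Y%m%d_%H%M%S")


def make_output_dir(parent, now=None):
    """Create and return a fresh timestamped output folder inside `parent`."""
    if now is None:
        now = datetime.now()
    out = Path(parent) / output_dir_name(now)
    out.mkdir()  # raises FileExistsError rather than overwriting an earlier run
    return out
```

Stamping each run this way means repeated runs in the same folder never overwrite each other's results.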