Processing HPLC data using JupyterLab

When analyzing HPLC or GC data, one often ends up using the commercial software that came with the instrument to find peaks and plot signals. In my case, I was working with an older instrument and wanted to find a solution in Python. I came up with the following rough code, shown in an IPython notebook; this code is included in the GitHub repo for Tsuji et al., Nature, 2024. Basically, Scikit-learn is used to find peaks in the data, and everything is visualized using plotnine. I haven’t tried to calculate peak areas yet.

The code below is rough and would need to be cleaned up before it is ready for general use, but I wonder whether such code might come in handy for future chromatography work. See the two plots at the end of the notebook to get a sense of what the code can do.

I apologize for the dearth of comments (!) and the plotnine warnings, but basically, the analysis below does the following:

  1. Defines functions for working with hyperspectral HPLC signal data - in my case, these data are absorbances at a range of wavelengths across a range of retention times
  2. Uses a “master function” to extract HPLC traces and identify peaks at a specific absorbance wavelength of interest, for three HPLC samples. The absorbance spectrum (across the full wavelength range) is plotted for each identified peak
  3. Merges the results from the three samples to generate two figures:
    • HPLC profiles (over time) for three samples at the same wavelength
    • Absorbance spectra of the highest peaks in the HPLC profile figure, for each sample
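To give a feel for steps 1 and 2 without opening the notebook, here is a minimal sketch on synthetic data. It is not the code from the notebook: the function names, the synthetic two-compound run, and the prominence threshold are all made up for illustration, and the peak finding uses `scipy.signal.find_peaks` as a stand-in rather than whatever call the notebook actually makes. Plotting is left out.

```python
import numpy as np
from scipy.signal import find_peaks


def gaussian(x, center, width):
    """Unit-height Gaussian, used here for both elution profiles and spectra."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def make_synthetic_run(times, wavelengths):
    """Hypothetical stand-in for instrument data.

    Returns an absorbance matrix with shape (n_times, n_wavelengths)
    containing two made-up compounds.
    """
    # Each compound = elution profile over time x absorbance spectrum over wavelength
    compound_a = np.outer(gaussian(times, 5.0, 0.2), gaussian(wavelengths, 280.0, 15.0))
    compound_b = 0.6 * np.outer(gaussian(times, 12.0, 0.3), gaussian(wavelengths, 450.0, 20.0))
    return compound_a + compound_b


def extract_trace(absorbance, wavelengths, target_nm):
    """Pull out the chromatogram (absorbance vs. time) at the wavelength nearest target_nm."""
    wavelength_idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    return absorbance[:, wavelength_idx]


def find_trace_peaks(trace, min_prominence=0.05):
    """Return row indices of peaks in a trace; the prominence cutoff is an arbitrary choice."""
    peak_idx, _ = find_peaks(trace, prominence=min_prominence)
    return peak_idx


def spectrum_at(absorbance, time_idx):
    """Full absorbance spectrum (all wavelengths) at one retention-time index."""
    return absorbance[time_idx, :]


times = np.linspace(0.0, 20.0, 2001)        # retention time, minutes
wavelengths = np.arange(200.0, 601.0, 1.0)  # nm
absorbance = make_synthetic_run(times, wavelengths)

trace_280 = extract_trace(absorbance, wavelengths, 280.0)
peak_idx = find_trace_peaks(trace_280)

for idx in peak_idx:
    spectrum = spectrum_at(absorbance, idx)
    lambda_max = wavelengths[np.argmax(spectrum)]
    print(f"peak at {times[idx]:.2f} min, absorbance maximum near {lambda_max:.0f} nm")
```

With real data, the absorbance matrix would come from the instrument export instead of `make_synthetic_run`, and the prominence threshold would need tuning against the baseline noise of the detector.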

Note: some of the code lines might get cut off in the inserted notebook below, so if you’d like to freely explore the full notebook file, see the notebook on GitHub. You can also check out its brief associated README.


Interesting that this is possible in pure Python! I might clean up this code more someday.