MOCCA (Multivariate Online Contextual Chromatographic Analysis) is an open-source Python project to analyze HPLC–DAD raw data.
Automation and digitalization solutions in the field of small-molecule synthesis face new challenges for chemical reaction analysis, especially in high-performance liquid chromatography (HPLC). Chromatographic data remain locked in vendors’ hardware and software components, which limits their potential in automated workflows and contradicts the FAIR data principles (findability, accessibility, interoperability, reuse) that enable chemometrics and data science applications. In this work, we present an open-source Python project called MOCCA (Multivariate Online Contextual Chromatographic Analysis) for the analysis of open-format HPLC–DAD (photodiode array detector) raw data. MOCCA provides a comprehensive set of data analysis features, including a peak deconvolution routine that allows automated deconvolution of known signals even when they overlap with signals of unexpected impurities or side products. By publishing MOCCA as a Python package, we envision an open-source community project for chromatographic data analysis with the potential to further advance its scope and capabilities.
Open-source project: https://github.com/HaasCP/mocca
Documentation: https://mocca.readthedocs.io/en/latest/
Corresponding scientific publication (preprint, not peer-reviewed): https://doi.org/10.26434/chemrxiv-2022-0pv2d
We recommend creating an isolated conda environment to avoid any problems with your installed Python packages:
conda create -n mocca python=3.9
conda activate mocca
Install mocca and its dependencies:
pip install mocca
If you want to use mocca's reporting functionality:
pip3 install -U datapane==0.14
If you want to use the Allotrope (ADF) file format:
pip install h5py
pip install git+https://github.com/HDFGroup/h5ld@master
If you want to use mocca in JupyterLab notebooks:
pip install jupyterlab
ipython kernel install --user --name=mocca
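To check which of the packages from the steps above are actually present in the active environment, a short script like the following may help. It is only a sketch using Python's standard library; the helper name `installed_versions` is made up for this example and is not part of MOCCA. The package names are taken from the install commands above; h5ld is left out because its distribution name is not stated here.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
PACKAGES = ["mocca", "datapane", "h5py", "jupyterlab"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for this README, not part of the mocca package.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated mocca environment; a package reported as "not installed" only matters if you need the corresponding optional feature.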
MOCCA is currently best used via JupyterLab notebooks. The notebooks folder of the GitHub repository contains a tutorial notebook with corresponding HPLC–DAD test data for the first steps.
Additionally, the full test data set from the scientific publication is included (cyanation of aryl halides via well plate screening). The corresponding notebook contains the complete data analysis, from the raw data to the visualizations presented in the manuscript (Fig. 7e) and SI (Fig. S17).
Preprint:
Haas, C. P., Lübbesmeyer, M., Jin, E. H., McDonald, M. A., Koscher, B. A., Guimond, N., Di Rocco, L., Kayser, H., Leweke, S., Niedenführ, S., Nicholls, R., Greeves, E., Barber, D. M., Hillenbrand, J., Volpin, G., and Jensen, K. F. Open-Source Chromatographic Data Analysis for Reaction Optimization and Screening. ChemRxiv 2022. https://doi.org/10.26434/chemrxiv-2022-0pv2d.
This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.