The figure below gives a schematic overview of the whole pipeline. Red blocks with rounded corners represent the final input and output. Blue rectangular blocks represent tools or websites used in the pipeline, and yellow blocks with a wavy bottom edge represent intermediate inputs and outputs. Picture files generated by some of the scripts are not shown in this schematic.
MS2LDA was run through the GNPS website on the three .mzML files containing the measured MS2 spectra to generate Mass2Motifs. Running MS2LDA on GNPS requires a classical molecular network, so this network was generated on GNPS first. After the MS2LDA analysis had finished, the resulting .dict file (containing, e.g., the Mass2Motifs and their fragments and losses) was uploaded to MS2LDA.org using the upload tab in the create experiment option. Two .csv files were then downloaded from MS2LDA.org: one containing the fragments and losses of each Mass2Motif, and one containing all fragmentation spectra with their Mass2Motif matching details. The pipeline takes three inputs: the consensus spectra in .mgf format from the classical molecular network, and the two .csv files from MS2LDA, which together provide the Mass2Motifs, their fragments and losses, and the identifiers of the experimental spectra that contain particular Mass2Motif fragments or losses.
Tutorial on classical molecular networking on GNPS: https://www.youtube.com/watch?v=PqTuex0nsGk&t=3s
Tutorial on MS2LDA on GNPS: https://www.youtube.com/watch?v=0wKUmjPy40s
Documentation MS2LDA on GNPS: https://ccms-ucsd.github.io/GNPSDocumentation/ms2lda/
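The consensus spectra enter the pipeline as an .mgf file, so it can help to see what reading such a file involves. The sketch below is a minimal, standard-library-only reader and is not taken from the pipeline scripts; the function name parse_mgf and the example spectrum are hypothetical. It assumes the usual MGF layout of BEGIN IONS / END IONS blocks with KEY=VALUE metadata lines followed by "m/z intensity" peak lines.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_mgf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per BEGIN IONS / END IONS block (hypothetical helper)."""
    metadata = None  # None means we are currently outside a spectrum block
    peaks: List[Tuple[float, float]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            metadata, peaks = {}, []
        elif line == "END IONS":
            if metadata is not None:
                yield {"metadata": metadata, "peaks": peaks}
            metadata = None
        elif metadata is not None:
            if "=" in line and not line[0].isdigit():
                # Metadata line such as PEPMASS=301.14
                key, value = line.split("=", 1)
                metadata[key.strip()] = value.strip()
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


# Made-up example spectrum, for illustration only
example = """BEGIN IONS
PEPMASS=301.14
SCANS=12
85.03 120.0
153.05 980.5
END IONS
""".splitlines()

spectra = list(parse_mgf(example))
```

For real data, an established reader such as the one shipped with matchms or pyteomics is likely a better choice than a hand-written parser.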
See https://github.com/iomega/ms2query for installation and run instructions.
A separate conda environment was made to run this script. This environment included the following packages:
However, not all of these packages are necessary to run the script.
A separate conda environment was made to run this script. This environment included the following packages:
However, not all of these packages are necessary to run the script.
conda install -c conda-forge rdkit
e.g. python3 select_Mass2Motif_frag_and_loss.py /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/select_Mass2Motifs/input/MS2Query_output.csv /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/select_Mass2Motifs/input/MS2LDA_spectra_and_motif.csv /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/select_Mass2Motifs/input/MS2LDA_motif_and_fragments.csv /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/select_Mass2Motifs/input/consensus_spectra_from_GNPS_classical_molecular_network.mgf /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/select_Mass2Motifs/output
A separate conda environment was made to run this script. This environment included the following packages:
However, not all of these packages are necessary to run the script.
https://pypi.org/project/massql/
e.g. python3 massql.py /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/input/motif_massql_querries.txt /mnt/LTR_userdata/hooft001/mass_spectral_embeddings/datasets/GNPS_15_12_21/ALL_GNPS_15_12_2021_positive_annotated.pickle /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/output/out_spectrum /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/output/out_files /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/output/json_enzo/GNPS.mgf /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/output/json_enzo/GNPS.json
MassQL documentation: https://mwang87.github.io/MassQueryLanguage_Documentation/
MassQL sandbox (try-out queries): https://msql.ucsd.edu/
GNPS public spectral libraries: https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp
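The contents of motif_massql_querries.txt are not shown here, so the following is only a sketch of how a MassQL query could be assembled from the fragments and losses of a Mass2Motif. The function name build_massql_query, the default tolerance of 0.01 and the example masses are assumptions for illustration; the MS2PROD, MS2NL and TOLERANCEMZ keywords come from the MassQL documentation linked above.

```python
from typing import Sequence


def build_massql_query(
    fragments: Sequence[float],
    losses: Sequence[float],
    tolerance_mz: float = 0.01,  # assumed tolerance, not taken from the pipeline
) -> str:
    """Build a MassQL query that requires all given fragments and neutral losses."""
    conditions = [
        f"MS2PROD={mz:.4f}:TOLERANCEMZ={tolerance_mz}" for mz in fragments
    ]
    conditions += [
        f"MS2NL={mz:.4f}:TOLERANCEMZ={tolerance_mz}" for mz in losses
    ]
    if not conditions:
        raise ValueError("at least one fragment or loss is required")
    return "QUERY scaninfo(MS2DATA) WHERE " + " AND ".join(conditions)


# Illustrative values: one product ion and one neutral loss
query = build_massql_query(fragments=[153.0546], losses=[18.0106])
print(query)
```

A query built this way can be tried out in the MassQL sandbox before it is run against a full spectral library.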
A separate conda environment was made to run this script. This environment included the following packages:
However, not all of these packages are necessary to run the script.
see https://github.com/NLeSC/MAGMa/tree/master/job
conda install -c conda-forge rdkit
e.g. python3 MAGMa_final.py /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MassQL/output/out_spectrum /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MAGMa/output/MAGMa_results_database_for_every_spectrum_from_massql /home/seele006/thesis/motif_massql_querries.txt /lustre/BIF/nobackup/seele006/MSc_thesis_annotation_Mass2Motif_fragments_data/MAGMa/output/pic_mass2Motif_frag
Pipeline developed by Rogers et al. https://github.com/iomega/motif_annotation/blob/master/annotate_motifs.py
The annotations for each Mass2Motif fragment and neutral loss were combined in a tab-separated (.tsv) output file. The file also records how often each SMILES annotation was obtained for a given Mass2Motif fragment or loss. If the molecular weight of a SMILES annotated to a neutral loss did not match the mass of the Mass2Motif neutral loss to one decimal place, the molecular weight of the SMILES structure was written to the .tsv file as well.
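The bookkeeping described above can be sketched as follows. This is not the code of the pipeline scripts: the function names, column names and example values are hypothetical, the molecular weights are passed in as a plain dictionary (in practice they could be computed with RDKit), and comparing the two masses by rounding to one decimal is an assumption about how the one-decimal check is done.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (motif, feature type, feature mass, annotated SMILES)
Annotation = Tuple[str, str, float, str]


def summarise_annotations(
    annotations: Iterable[Annotation], smiles_weight: Dict[str, float]
) -> List[list]:
    """Count each annotation and flag neutral losses whose weight differs."""
    counts = Counter(annotations)
    rows = []
    for (motif, feature_type, mass, smiles), count in sorted(counts.items()):
        weight_column = ""
        if feature_type == "loss":
            weight = smiles_weight[smiles]
            # Assumption: "match" means equal after rounding to one decimal
            if round(weight, 1) != round(mass, 1):
                weight_column = f"{weight:.4f}"
        rows.append([motif, feature_type, mass, smiles, count, weight_column])
    return rows


def write_tsv(rows: List[list], handle) -> None:
    """Write the summary rows as tab-separated text with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["motif", "type", "mass", "smiles", "count", "smiles_mol_weight"]
    )
    writer.writerows(rows)


# Made-up annotations and weights, for illustration only
annotations = [
    ("motif_1", "loss", 18.01, "O"),
    ("motif_1", "loss", 18.01, "O"),
    ("motif_1", "loss", 18.01, "N"),
    ("motif_1", "fragment", 85.03, "C1CCOC1"),
]
weights = {"O": 18.0106, "N": 17.0265}

rows = summarise_annotations(annotations, weights)
buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue())
```

In this example only the mismatching annotation (SMILES "N" for a loss of 18.01) gets its molecular weight written to the last column; the other rows leave that column empty.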