PROXIMAL2 PIPELINE

Read more about the method in our [paper].

1. INPUT GENERATION

GenerateInput.py: script that generates the metabolite and reaction files used to build the operators. The queries must be created separately and independently. The script analyzes the reactions and removes the cofactors from the substrates and products. If a reaction is a transformation between cofactors only, or if all of its substrates or all of its products are cofactors, the reaction is removed. To annotate the metabolites with chemical structures, the script parses several databases: PubChem, HMDB, KEGG, MetaNetX and RetroRules.
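The cofactor filtering described above can be sketched as follows. This is a minimal illustration, not the code in GenerateInput.py: the function name `strip_cofactors`, the representation of a reaction as two lists of metabolite names, and the example cofactor set are all assumptions made for this example.

```python
def strip_cofactors(substrates, products, cofactors):
    """Return (substrates, products) without cofactors, or None to drop the reaction.

    Hypothetical helper: the reaction is dropped when removing the cofactors
    leaves no substrate or no product, which also covers reactions that only
    convert cofactors into other cofactors.
    """
    kept_substrates = [m for m in substrates if m not in cofactors]
    kept_products = [m for m in products if m not in cofactors]
    if not kept_substrates or not kept_products:
        return None
    return kept_substrates, kept_products


# Example cofactor set, for illustration only.
COFACTORS = {"NAD+", "NADH", "H+", "ATP", "ADP"}

# Cofactors are removed, the main substrate and product are kept.
print(strip_cofactors(["ethanol", "NAD+"], ["acetaldehyde", "NADH", "H+"], COFACTORS))
# A reaction between cofactors only is dropped (prints None).
print(strip_cofactors(["ATP"], ["ADP"], COFACTORS))
```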

The script takes two files as INPUT, which should be saved in the input folder:

reactions: .csv file of the reactions of interest, tab-separated. It must have the following columns: "id", "formula", "EC".
metabolites: .csv file of the metabolites that can be involved in the reactions, tab-separated. It must have the following columns, even if empty: "name", "hmdb", "kegg", "metanetx". If the file is empty, or a metabolite in the reactions is not present in the file, only the name from the reaction and any information available in RetroRules are used; if that does not identify the metabolite, it is excluded.
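Before running GenerateInput.py, it can help to check that both files carry the required columns. The sketch below uses only the Python standard library; the function name `missing_columns` is hypothetical and not part of PROXIMAL2.

```python
import csv

# Required columns, as listed above.
REACTION_COLUMNS = ("id", "formula", "EC")
METABOLITE_COLUMNS = ("name", "hmdb", "kegg", "metanetx")


def missing_columns(path, required):
    """Return the required column names absent from the header of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # An empty file has no header row, so every required column is reported.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [column for column in required if column not in header]
```

For a well-formed reactions file, `missing_columns(path_to_reactions, REACTION_COLUMNS)` returns an empty list, where `path_to_reactions` stands for wherever the file was saved in the input folder.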

The OUTPUT files are saved in the input folder as well, named:

reachableMolecules.csv: .csv containing the structures of the molecules included in the templates.
templateReactions.csv: .csv having the definition of the reaction templates.

2. ENZYME PROMISCUITY ANALYSIS

runPROX2.py: main script to run; it imports all the needed functions.

The proximal_functions folder contains 4 code files:

Operators2: to create the operators.
Products2: to check whether the operators can be applied to the query.
GenerateMolFiles2: to create the predicted products.
Common2: collects a few functions needed in the other steps.

PROXIMAL2 FILES AND GENERAL EXPLANATION

INPUT

The input files are imported in the APPLICATION FILES section:

molecules_of_interest = queries of interest. Defined in a tab-separated .csv file with no header. The algorithm uses SMILES. If the queries are given as InChI, the script already includes a line that generates the SMILES (comment that line out if it is not needed).
metabolites = molecules included in the reaction templates, with their structure defined as SMILES. Generated by the GenerateInput.py script.
reaction_list = list of reactions of interest. Generated by GenerateInput.py script.

The same section also defines the paths for the outputs (operators and products) and for the final list of compound pairs:

OP_CACHE_DIRECTORY: path to the operator output folder.
OUTPUT_DIRECTORY: path to the product output folder.
path_finalReactions: path where the final list of compound pairs is stored.

OUTPUT

A folder is created for each query of interest and labeled with the generated ID "MetX", where X is the number of the query in counting order. Within the query folder, one folder is created for each applied pair, containing the related products.
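The layout described above can be inspected with a short script. This is a sketch under the stated layout only (one "MetX" folder per query, one subfolder per applied pair); the function name `list_pair_folders` is hypothetical and not part of PROXIMAL2.

```python
from pathlib import Path


def list_pair_folders(output_directory):
    """Map each query folder (MetX) to the sorted names of its pair subfolders."""
    root = Path(output_directory)
    layout = {}
    for query_dir in sorted(root.glob("Met*")):
        # Skip stray files whose name happens to start with "Met".
        if query_dir.is_dir():
            layout[query_dir.name] = sorted(
                child.name for child in query_dir.iterdir() if child.is_dir()
            )
    return layout
```

Calling `list_pair_folders(OUTPUT_DIRECTORY)` after a run gives a quick overview of which queries produced predictions and for how many pairs.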

Each product in the folder of its pair is stored in JSON format, with the following fields:

GeneratedProduct:
- smiles: may be missing. In that case the algorithm was able to apply the modifications, but RDKit could not make chemical sense of the predicted structure.
- mol: mol text generated throughout the algorithm.
TemplateReaction:
- ec: enzyme related to the template reaction pair.
- ID: reaction IDs used for the prediction.
- Substrate: Substrate of the used pair.
- Product: Product of the used pair.
QueryInformation:
- name: name of the molecule query.
- ID: ID of the molecule query.
- smiles: original Smiles of the query.
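The fields listed above can be read with the standard `json` module. The record below is made up for illustration and the helper name `product_smiles` is hypothetical. Because the list does not say whether a failed conversion leaves the "smiles" key out or stores an empty value, the sketch treats both cases the same way; that handling is an assumption.

```python
import json

# Illustrative record following the field list above; every value is made up.
EXAMPLE = """
{
  "GeneratedProduct": {"mol": "<mol text>", "smiles": "CC=O"},
  "TemplateReaction": {"ec": "1.1.1.1", "ID": ["R1"], "Substrate": "S1", "Product": "P1"},
  "QueryInformation": {"name": "ethanol", "ID": "Met1", "smiles": "CCO"}
}
"""


def product_smiles(record):
    """Return the predicted product SMILES, or None when none was generated.

    Assumption: a missing "smiles" key and an empty value both mean that
    RDKit could not interpret the predicted structure.
    """
    return record.get("GeneratedProduct", {}).get("smiles") or None


record = json.loads(EXAMPLE)
print(record["QueryInformation"]["ID"], product_smiles(record))
```

Products for which `product_smiles` returns None still carry the "mol" text, so they can be inspected manually.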

STEPS

ExtractPairs: extracts the pairs, generating the proper associations and removing redundancy (the definitive list of reaction pairs is saved in the input folder).
GenerateOperators: generates the operators.
GenerateProducts: checks whether the operators can be applied to the query.
GenerateMolFiles: generates the final product.

FILES INCLUDED

The input folder includes the following:

cofactors: .csv file with the name and InChI of some cofactors, used to remove them from the reactions.

INSTALLATION, REQUIREMENTS AND TEST APPLICATION

Set up the environment:

conda create -n p2 -c conda-forge -c bioconda rdkit pubchempy bioservices

conda activate p2

pip install kcfconvoy

conda install -c anaconda pandas scikit-learn

conda install -c conda-forge networkx=2.5

Optional: on an Intel CPU, install the performance enhancements:

conda install -c conda-forge scikit-learn-intelex
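After the installation, a quick check that the packages can be found in the active environment may save a failed run. The sketch below is not part of PROXIMAL2; the helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names above, except scikit-learn, which is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages installed above (scikit-learn imports as sklearn).
REQUIRED_MODULES = [
    "rdkit", "pubchempy", "bioservices", "kcfconvoy",
    "pandas", "sklearn", "networkx",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


# An empty list means every required package was found.
print(missing_modules())
```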

After downloading the PROXIMAL2 folder, download the RetroRules database (https://retrorules.org/dl/retrorules_dump) and extract it in the input folder; the algorithm needs it to run.

TEST

To run the algorithm with the test files, run runPROX2.py as it is. The products will be saved in the test/TESToutput/products folder.

Note that some reactions will take a couple of minutes to build the operators.

APPLICATION OF OTHER DATASETS

To investigate the promiscuity of other reactions, define the inputs as explained in the INPUT section above, comment out lines 16 to 22 of runPROX2.py, and change line 29 as follows:

if True:

HassounLab/PROXIMAL2
