Metadata cleanup

Please use the template for your metadata so that the column names are consistent and the minimum information needed for the query is present. If no structure information is provided, the structure is queried from PubChem by name search.
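As a rough illustration of the name-search fallback, the sketch below builds a PubChem PUG REST lookup URL for rows that carry no structure. The column names (compound_name, smiles, inchi) are assumptions made for this example and may not match the template; the repository's own pubchem_client.py may query PubChem differently.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical structure columns; adjust to the names used in the template.
STRUCTURE_COLUMNS = ("smiles", "inchi")


def needs_name_lookup(row: dict) -> bool:
    """Return True when none of the structure columns holds a value."""
    return not any((row.get(col) or "").strip() for col in STRUCTURE_COLUMNS)


def pubchem_name_url(name: str) -> str:
    """Build a PUG REST URL that looks up a compound by its name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name.strip(), safe='')}"
        "/property/InChI,InChIKey,MolecularFormula/JSON"
    )


rows = [
    {"compound_name": "caffeine", "smiles": ""},
    {"compound_name": "aspirin", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
]

# Only rows without structure information fall back to the name search.
urls = [pubchem_name_url(r["compound_name"]) for r in rows if needs_name_lookup(r)]
print(urls)
```

Only the first row produces a URL here, because the second row already has a SMILES string. The URL can be fetched with any HTTP client.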

Some of the other databases need a local file and/or special access. If you do not have these, set the corresponding database to False in jobs.py:

Natural Product information:

Dictionary of Natural Products (access needed): Link

LOTUS: run prepare_wikidata_lotus_data_prefect.py to update the data (otherwise use the provided file)

Drug information:

Broad Institute - Drug Repurposing Hub: Download

DrugBank (access needed): Download and run drugbank_extraction.py on that file

DrugCentral (SQL dump file): Download
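One way to decide which databases to switch off is to check whether the required local files are present. The sketch below is only an illustration of that idea: the source names and file paths are hypothetical, and the actual option names in jobs.py may differ.

```python
from pathlib import Path


def available_sources(local_files: dict) -> dict:
    """Map each database name to True only if its local file exists."""
    flags = {}
    for source, path in local_files.items():
        flags[source] = path is not None and Path(path).is_file()
    return flags


# Hypothetical names and paths, for illustration only.
local_files = {
    "drugbank": "data/drugbank.tsv",
    "broad_institute": "data/broad_repurposing_hub.tsv",
    "dictionary_of_np": None,  # no access, so no local file
}

flags = available_sources(local_files)
for source, enabled in flags.items():
    print(f"{source}: {enabled}")
```

Each False value marks a database that would be set to False in jobs.py.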

Requirements for running:

pip install -r requirements.txt

run with prefect 2

prefect server start

Either remove prefect.yaml and create a new one during deployment (see below), or change the directory within the file to your local path under pull:

pull:
- prefect.deployments.steps.set_working_directory:
    directory: C:\path\to\your\project

Deployment:

Start serving the flow locally by running metadata_cleanup_prefect.py; the service then runs and waits for jobs to be submitted (jobs.py). Alternatively, create the deployment from the terminal:

prefect deploy metadata_cleanup_prefect:cleanup_file --name local-deploy --pool local-work

Create new prefect.yaml if needed:

If prefect.yaml cannot be found, answer n to the next two prompts in the terminal until the prompt to save the configuration appears, then answer y: Would you like to save configuration for this deployment for faster deployments in the future? [y/n]: y

Create and run a worker pool

Create a work pool with the name defined in the deployment (e.g., see main in metadata_cleanup_prefect.py):

prefect work-pool create --type process local-work
prefect work-pool update --concurrency-limit 5 local-work

Start a worker in the pool to process jobs:

prefect work-pool update --concurrency-limit 5 local-work
prefect worker start --pool local-work

Run jobs

Define jobs in jobs.py and run them on the Prefect deployment. The option to automatically split your file into chunks is disabled for now.
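Besides jobs.py, a single run can also be triggered through the Prefect 2 CLI command prefect deployment run. The sketch below only assembles that command. The deployment identifier "cleanup-file/local-deploy" assumes Prefect's default flow naming for cleanup_file, and the parameter name metadata_file and its path are hypothetical; check the flow signature in metadata_cleanup_prefect.py for the real parameters.

```python
import json
import shlex


def deployment_run_command(deployment: str, parameters: dict) -> list:
    """Build the argument list for `prefect deployment run`."""
    return [
        "prefect", "deployment", "run", deployment,
        "--params", json.dumps(parameters),
    ]


cmd = deployment_run_command(
    "cleanup-file/local-deploy",            # assumed "<flow name>/<deployment name>"
    {"metadata_file": "data/library.tsv"},  # hypothetical parameter name and path
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To submit the run from Python, the list can be passed to subprocess.run(cmd, check=True) while the Prefect server and a worker are running.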

Sequence creation

The sequence creation is set up for the Orbitrap ID-X (Xcalibur). To run, the metadata sheet needs to have a unique sample ID, a plate name (batch identifier) as plate_id, and the vial or well location as well_location. For more information see sequence_creation.py.
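Before creating a sequence, it can help to check the metadata sheet against these requirements. The following is a minimal sketch, not the check used in sequence_creation.py; plate_id and well_location come from the text above, while sample_id is an assumed name for the unique sample ID column.

```python
import csv
import io

# "sample_id" is an assumed column name; plate_id and well_location are required.
REQUIRED = ("sample_id", "plate_id", "well_location")


def check_sequence_sheet(rows: list) -> list:
    """Return a list of problems found in the metadata rows."""
    if not rows:
        return ["sheet is empty"]
    missing = [col for col in REQUIRED if col not in rows[0]]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        for col in REQUIRED:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        sample_id = (row["sample_id"] or "").strip()
        if sample_id and sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
    return problems


sheet = (
    "sample_id\tplate_id\twell_location\n"
    "s1\tplate01\tA1\n"
    "s2\tplate01\tA2\n"
    "s2\tplate01\tA3\n"
)
rows = list(csv.DictReader(io.StringIO(sheet), delimiter="\t"))
print(check_sequence_sheet(rows))
```

The example sheet reuses the sample ID s2, so the check reports a duplicate on line 4.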

Well visualization (piechart if a compound mixture was used in each well)

The example shows a 384-well plate in which each well contains 10 different compounds. Each well is drawn as a pie chart whose slices are colored by the polarity in which the compounds were detected (ratio of detection).
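The data behind such a pie grid can be sketched as follows: each well label is mapped to a grid position, and the detection results of its compounds are turned into slice fractions. This is an illustrative sketch and not the code in wellplate_piegrid.py; the category names ("positive", "negative", "both", "not detected") are assumptions.

```python
import string
from collections import Counter


def well_to_index(well: str, n_rows: int = 16, n_cols: int = 24) -> tuple:
    """Convert a label such as 'B12' to a zero-based (row, column) pair.

    The defaults describe a 384-well plate (rows A-P, columns 1-24).
    """
    row = string.ascii_uppercase.index(well[0].upper())
    col = int(well[1:]) - 1
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError(f"well {well!r} is outside a {n_rows}x{n_cols} plate")
    return row, col


def detection_fractions(categories: list) -> dict:
    """Return the fraction of compounds per detection category in one well."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# One well with 10 compounds and assumed category labels.
well_compounds = (
    ["positive"] * 4 + ["negative"] * 3 + ["both"] * 2 + ["not detected"]
)
fractions = detection_fractions(well_compounds)
print(well_to_index("P24"), fractions)
```

The resulting fractions can be passed as slice sizes to a plotting function such as matplotlib's Axes.pie, drawn at the grid position returned by well_to_index.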

For more information go to the documentation.

corinnabrungs/msn_tree_library
