CoarsenConf

Implementation of CoarsenConf by D. Reidenbach* and A. Krishnapriyan.

CoarsenConf is a coarse-grained variational autoencoder for molecular conformer generation.

If you have questions, don't hesitate to open an issue or send us an email at [email protected]

Setting up Conda environment

Create a new Conda environment from mcg_environment.yaml. You might need to adjust the cudatoolkit version to match your CUDA version, or set cpuonly.

conda env create -f mcg_environment.yaml
conda activate mcg

Generate conformers from SMILES

To generate conformers with the trained model, create a smiles.csv or .pkl file in which each line has the form smile_str, num_conformers, smile_str, where smile_str is the SMILES representation of the molecule. For example: CN1C=NC2=C1C(=O)N(C(=O)N2C)C, 10, CN1C=NC2=C1C(=O)N(C(=O)N2C)C. Technically, the first smile_str is used as the identifier of the molecule and the second is used to build it, but we suggest keeping them the same. Then generate the conformers by running:

python generate.py
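As a minimal sketch of the input format described above, the following helper writes one molecule per line in the order smile_str, num_conformers, smile_str. The name write_smiles_csv is hypothetical and not part of the repository. The sketch assumes plain comma-separated rows with no header, so check generate.py for the exact delimiter and header it expects.

```python
import csv


def write_smiles_csv(path, molecules):
    """Write one row per molecule: smile_str, num_conformers, smile_str.

    `molecules` is an iterable of (smile_str, num_conformers) pairs.
    Illustrative helper only; it is not part of CoarsenConf.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for smile_str, num_conformers in molecules:
            # The first column identifies the molecule and the third is used
            # to build it; the text above suggests keeping them identical.
            writer.writerow([smile_str, num_conformers, smile_str])
```

Calling write_smiles_csv("smiles.csv", [("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", 10)]) reproduces the example row, without the spaces after the commas.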

Training model

Following the instructions from Torsional Diffusion, download and extract all the relevant data from the compressed .tar.gz folders on this shared Drive, and put them in the subdirectory data. The archives are organized by dataset and contain the GEOM datasets used in the project (license CC0 1.0), the splits from GeoMol, and the pickle files with preprocessed molecules (see below to recreate them). Then you can start training:

python train_drugs.py

Details on all hyperparameters, and on how to switch to a different dataset, can be found in configs. The first time training is run, a featurisation procedure runs and caches its result, so it is not repeated on later runs.
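The compute-once-then-cache behaviour can be pictured with the generic pattern below. This is a sketch of the idea, not the repository's implementation: load_or_featurize, cache_path, and featurize are hypothetical names, and the actual cache location and file format used by the training scripts may differ.

```python
import os
import pickle


def load_or_featurize(cache_path, featurize):
    """Return cached features if present; otherwise compute and cache them.

    `featurize` is any zero-argument callable returning a picklable object.
    Hypothetical helper; not part of CoarsenConf.
    """
    if os.path.exists(cache_path):
        # Later runs: reuse the stored result instead of recomputing it.
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    # First run: do the expensive work once, then store it for next time.
    features = featurize()
    with open(cache_path, "wb") as handle:
        pickle.dump(features, handle)
    return features
```

In this sketch, deleting the cache file forces the features to be recomputed on the next call.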

Running evaluation

To evaluate a model on the test set of one of the datasets, first download the data (see the section above). Only two files are needed: test_smiles.csv, the list of SMILES strings and the number of conformers for each, and test_mols.pkl, a dictionary of ground-truth conformers. Then locate the work directory of your trained model and generate conformers with the model via:

python scripts/generate_confs.py
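Before generating, it can help to confirm that the two test files agree with each other. The sketch below lists SMILES from the CSV that have no ground-truth entry in the pickle. It rests on assumptions that are not confirmed here: that the first CSV column holds the SMILES string, that the pickle is a dictionary keyed by that same string, and that a header row may or may not be present. The function name missing_references is hypothetical.

```python
import csv
import pickle


def missing_references(smiles_csv, mols_pkl, has_header=True):
    """Return SMILES listed in the CSV that have no entry in the pickle.

    Assumes column 0 of the CSV is the SMILES string and that the pickle
    holds a dict keyed by SMILES. Hypothetical helper, not part of CoarsenConf.
    """
    # Unpickling the real file needs whatever libraries created its objects.
    with open(mols_pkl, "rb") as handle:
        true_mols = pickle.load(handle)

    missing = []
    with open(smiles_csv, newline="") as handle:
        rows = csv.reader(handle)
        if has_header:
            next(rows, None)  # skip the header row, if any
        for row in rows:
            if not row:
                continue  # ignore blank lines
            smiles = row[0].strip()
            if smiles not in true_mols:
                missing.append(smiles)
    return missing
```

An empty result means every molecule in the CSV has ground-truth conformers to compare against.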

Finally, evaluate the error of the conformers using the following command:

python evaluate_confs.py
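For orientation, conformer generation benchmarks commonly report coverage (COV) and average minimum RMSD (AMR), each in a recall and a precision variant. The sketch below computes both from a precomputed RMSD matrix. Whether evaluate_confs.py reports exactly these quantities, and with which threshold, is an assumption here; the function name coverage_and_amr and the input layout are hypothetical.

```python
def coverage_and_amr(rmsd, threshold):
    """Compute (coverage, AMR) for recall and precision.

    rmsd[i][j] is the RMSD between reference conformer i and generated
    conformer j. Illustrative sketch; not the repository's evaluation code.
    """
    # Recall: the closest generated conformer for each reference conformer.
    best_per_ref = [min(row) for row in rmsd]
    # Precision: the closest reference conformer for each generated conformer.
    best_per_gen = [min(col) for col in zip(*rmsd)]

    def summarize(best):
        # Coverage: fraction of conformers matched within the threshold.
        coverage = sum(d < threshold for d in best) / len(best)
        # AMR: mean of the minimum RMSDs.
        amr = sum(best) / len(best)
        return coverage, amr

    return {
        "recall": summarize(best_per_ref),
        "precision": summarize(best_per_gen),
    }
```

Higher coverage and lower AMR indicate generated conformers that lie closer to the ground truth.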

Citation

If you use this code, please cite:

@article{reidenbach2023coarsenconf,
  title={CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation},
  author={Danny Reidenbach and Aditi S. Krishnapriyan},
  journal={arXiv preprint arXiv:2306.14852},
  year={2023},
  }

License

MIT
