Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code clean-up, increased modularity and more tests. #63

Merged
merged 37 commits into from
Dec 3, 2023
Merged

Conversation

leojklarner
Copy link
Owner

Reference Issues/PRs

Fixes the following issues: #42 #61 #4 #62 #52

What does this implement/fix? Explain your changes

This PR fixes a range of issues and makes the code base more modular, adaptable and easier to use. In particular, this PR:

  • fixes the package requirements and factorises them into molecule, reaction and protein dependencies that can be installed separately. GAUCHE is now pip installable.
  • fixes the SIGP class and restores the ability to use any Grakel kernel.
  • refactors the featurisation methods into fingerprint, string and graph-based ones, mirroring the structure of the paper.
  • makes the data loaders easier to use and extends them to custom .csv datasets.
  • fixes errors in existing unit tests and adds many more, bringing the total number to >200.

What testing did you do to verify the changes in this PR?

I added extensive unit tests for the changes I made, as well as existing parts of the code base, bringing the total number of unit tests to >200. The only breaking change that this PR introduces is a slight simplification of the data loader call when using the benchmark datasets which will need to be adjusted in the notebooks.

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./gauche/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/kernels/test_graph_kernels.py)
  • Checked for style issues by running black . and isort .

to work with newer setuptools versions.
with test agains rdkit Tanomoto sims.
graphs and molecules to make dependencies
a bit more modular.
to make it more modular and extendable.
Added more unit tests.
Added type hints to data transform function.
to avoid having PyTorch Geometric as a dependency.
graph kernels that allow node/
edge label checking.
grakel graph kernel.
inputs to allow usage of graphs with rdkit attributes.
to see if kernels work in eval mode.
Accidentally uploaded test notebook.
@@ -55,11 +56,7 @@ def main(
n_trials: Number of random train/test splits for the datasets. Default is 20
test_set_size: Size of the test set for evaluation. Default is 0.2
dataset_name: Benchmark dataset to use. One of ['Photoswitch', 'ESOL', 'FreeSolv', 'Lipophilicity']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this makes things easier!

Copy link
Collaborator

@Ryan-Rhys Ryan-Rhys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing stuff! Looks great!

@leojklarner leojklarner merged commit 31bd9f2 into main Dec 3, 2023
0 of 11 checks passed
@leojklarner leojklarner deleted the env_refactor branch December 3, 2023 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants