Code clean-up, increased modularity and more tests. #63

leojklarner · 2023-12-02T13:41:20Z

Reference Issues/PRs

Fixes the following issues: #42 #61 #4 #62 #52

What does this implement/fix? Explain your changes

This PR fixes a range of issues and makes the code base more modular, adaptable and easier to use. In particular, this PR:

fixes the package requirements and factorises them into molecule, reaction and protein dependencies that can be installed separately. GAUCHE is now pip installable.
fixes the SIGP class and restores the ability to use any Grakel kernel.
refactors the featurisation methods into fingerprint, string and graph-based ones, mirroring the structure of the paper.
makes the data loaders easier to use and extends them to custom .csv datasets.
fixes errors in existing unit tests and adds many more, bringing the total number to >200.
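
To make the custom .csv point concrete, here is a minimal sketch of what loading a user-supplied dataset and applying a custom featuriser might look like. It is not GAUCHE's implementation: the class `CsvPropertyLoader`, its methods, the column names and the toy values are all hypothetical, and only the standard library is used.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CsvPropertyLoader:
    """Hypothetical stand-in for a molecular property loader (not GAUCHE's class)."""

    features: List[object] = field(default_factory=list)
    labels: List[float] = field(default_factory=list)

    def read_csv(self, handle, smiles_column: str, label_column: str) -> None:
        """Read one SMILES column and one numeric label column from a CSV file object."""
        reader = csv.DictReader(handle)
        missing = {smiles_column, label_column} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in CSV: {sorted(missing)}")
        self.features, self.labels = [], []
        for row in reader:
            raw_label = (row[label_column] or "").strip()
            # Skip rows without a label instead of failing on float("").
            if not raw_label:
                continue
            self.features.append(row[smiles_column])
            self.labels.append(float(raw_label))

    def featurize(self, featuriser: Callable[[str], object]) -> None:
        """Apply a user-supplied featuriser to every stored SMILES string."""
        self.features = [featuriser(smiles) for smiles in self.features]


# Toy in-memory CSV with placeholder labels; the last row has no label.
demo_csv = io.StringIO("smiles,label\nCCO,1.0\nc1ccccc1,2.0\nCC(=O)O,\n")

loader = CsvPropertyLoader()
loader.read_csv(demo_csv, smiles_column="smiles", label_column="label")
loader.featurize(len)  # toy featuriser: length of the SMILES string
```

Passing the featuriser in as a callable keeps the loader independent of any particular chemistry library, which is the same motivation as splitting the dependencies into separately installable groups.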

What testing did you do to verify the changes in this PR?

I added extensive unit tests for the changes I made, as well as for existing parts of the code base, bringing the total number of unit tests to >200. The only breaking change that this PR introduces is a slight simplification of the data loader call when using the benchmark datasets, which will need to be adjusted in the notebooks.
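
As an illustration of the kind of test involved, one commit in this PR replaces the reference for the Tanimoto kernel test with RDKit Tanimoto similarities. The sketch below shows the same pattern without RDKit: a dot-product Tanimoto formula is checked against an independent set-based reference. Both function names are hypothetical, and returning 1.0 for two all-zero vectors is an assumption, not something the PR specifies.

```python
from typing import Sequence


def tanimoto(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto similarity <x,y> / (|x|^2 + |y|^2 - <x,y>) for bit vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Assumed convention: two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def tanimoto_reference(x: Sequence[int], y: Sequence[int]) -> float:
    """Independent set-based reference: |A & B| / |A | B| over the on-bits."""
    a = {i for i, bit in enumerate(x) if bit}
    b = {i for i, bit in enumerate(y) if bit}
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def test_tanimoto_matches_reference() -> None:
    """Compare both implementations on every pair of toy fingerprints."""
    fingerprints = [
        [1, 0, 1, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0],
    ]
    for x in fingerprints:
        for y in fingerprints:
            assert abs(tanimoto(x, y) - tanimoto_reference(x, y)) < 1e-12


test_tanimoto_matches_reference()
```

Checking against a reference that is computed in a different way (sets rather than dot products) catches formula errors that a self-consistency test would miss.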

Pull Request Checklist

Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./gauche/tests/* directories (if applicable)
Modified documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
Ran python -m py.test tests/ and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/kernels/test_graph_kernels.py)
Checked for style issues by running black . and isort .

Ryan-Rhys · 2023-12-03T01:28:19Z

benchmarks/run_benchmark.py

@@ -55,11 +56,7 @@ def main(
        n_trials: Number of random train/test splits for the datasets. Default is 20
        test_set_size: Size of the test set for evaluation. Default is 0.2
        dataset_name: Benchmark dataset to use. One of ['Photoswitch', 'ESOL', 'FreeSolv', 'Lipophilicity']


Nice, this makes things easier!

Ryan-Rhys

Amazing stuff! Looks great!

leojklarner added 30 commits November 1, 2023 17:18

Removed urls in requirement files

7e6eeac

to work with newer setuptools versions.

Replaces test against gpflow Tanimoto kernel

78d019a

with test against rdkit Tanimoto sims.

Black formatting of fingerprint kernel tests.

6c3aaa2

Fixed intersection kernel test.

2668aed

Split featurisation functions into strings,

2186a83

graphs and molecules to make dependencies a bit more modular.

Misc fixes to representation code.

86b09cd

Renamed photoswitch csv file.

9504a61

Renamed DataLoaderMP to MolProp loader.

f7bf13e

Added typing hints for representations.

8a1d9a8

Made mol prop dataloader more modular and

cd2b86b

adaptable, added tests.

Added more molprop dataloader tests.

9e47685

Added support for custom featuriser.

38f3f69

Renamed reaction datasets.

3cfc5d2

Applied same refactor to reaction data loader

e5488b6

to make it more modular and extendable. Added more unit tests.

Removed unused scaling method from dataloader.

7b76589

Added type hints to data transform function.

Black reformatting.

d3eb1d3

Removed pre-trained GNN code (for now)

a6da1ab

to avoid having PyTorch Geometric as a dependency.

Split up requirements for rxn and graph

8f7aacf

featurisations/kernels.

Fixed test paths.

c01f830

Adjusted GP Regression notebook.

7c3cfd7

Updated additional notebooks.

5920e7f

Made SIGP compatible with newest PyTorch version.

37d17ce

Added bespoke wrappers for

d133c93

graph kernels that allow node/edge label checking.

Changed import levels.

852f214

Added tailored tests for each

0aa1ed0

grakel graph kernel.

Fixed requirements to make package pip installable

6811892

Adjusted how deepcopy is handled for non-tensorial

da7a208

inputs to allow usage of graphs with rdkit attributes.

Modified general GP tests to work with new

b13329f

graph kernel formulation.

Made graph kernel tests more complex

213358d

to see if kernels work in eval mode.

Isort and black formatting.

8d687e0

leojklarner added 6 commits November 8, 2023 13:10

Isort + black formatting.

ebf3a0b

Update version to 1.0.0

1968ee4

Update setup.py

eb16b82

Added __init__ files to tests to package them.

191cb80

Moved dataset to have access to benchmarks

2bf761a

via pip install.

Fixed benchmarking script.

9772048

leojklarner requested a review from Ryan-Rhys December 2, 2023 13:41

Delete Untitled.ipynb

db8fca7

Accidentally uploaded test notebook.

Ryan-Rhys reviewed Dec 3, 2023

Ryan-Rhys approved these changes Dec 3, 2023

leojklarner merged commit 31bd9f2 into main Dec 3, 2023
0 of 11 checks passed

leojklarner deleted the env_refactor branch December 3, 2023 17:43
