Linting and updates #53

Merged
merged 116 commits into from
Dec 28, 2023
53c8fa3
transfer less data to gpu
Sep 1, 2023
f0b43bc
add rigid utils
Sep 16, 2023
fb8b62a
Merge remote-tracking branch 'origin' into ipmp
Sep 17, 2023
8cf1eb9
Merge branch 'main' into ipmp
Sep 19, 2023
437d2f2
Merge remote-tracking branch 'origin' into ipmp
Sep 22, 2023
c97391d
add graph model hparams
Sep 25, 2023
a5d0eaa
add hparams
Sep 28, 2023
fd43446
add support for loading hparam override
Oct 23, 2023
c2c8b01
add optional hparam override to finetune run config
Oct 23, 2023
c2dcbee
set in_memory=True as default for cath and fold classification datasets
Oct 23, 2023
d041a3e
add torch geometric compile
Oct 23, 2023
d83c62e
fix scenario where only seq_pos_enc is a node feature
Oct 23, 2023
eedc9e5
refactor logging to use a single logging call
Oct 23, 2023
005db26
remove duplicated entry
Oct 23, 2023
f184773
minor linting
Oct 23, 2023
90835ea
formatting
Oct 25, 2023
282c6c3
add pre-trained inverse folding config
Oct 25, 2023
eec1308
add EGNN pretraining config
Oct 25, 2023
02abcd8
add find graph encoder hparams
Oct 25, 2023
1847379
add baseline inverse folding config
Oct 25, 2023
881efee
add linters to toml
Oct 25, 2023
034268a
add pre-commit config
Oct 25, 2023
cdfcdb8
Add project support for py311
Oct 25, 2023
ff44a34
Add cpu index url for torch for CI
Oct 25, 2023
bbcfab7
add cpu torch source
Oct 25, 2023
e42cf36
rollback to max python 3.10 due to lack of torchdrug support
Oct 26, 2023
ac9f5b7
bump graphein to 1.7.4 for PyG 2.4+ support (and backwards compatibility)
Oct 26, 2023
736bf63
add warning log
Oct 26, 2023
d3536d7
add list to track processed files in case of overwrite
Oct 26, 2023
4ee8b90
add model attribution script
Oct 27, 2023
d24bcf6
update graphein version to 1.7.5+ and add captum dependency
Oct 27, 2023
03271da
add some more docstrings, clean up
Oct 28, 2023
684b3e5
add attribution to cli
Oct 28, 2023
089aff7
update gitignore
Oct 28, 2023
3c6f3e1
Merge branch 'main' into ipmp
a-r-j Nov 3, 2023
b7ffda0
add DDP support #44
Nov 3, 2023
30679e3
update readme
Nov 3, 2023
151aa10
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 3, 2023
6589266
update readme
Nov 3, 2023
594a596
ignore igfold dataset in overwrite test
Nov 3, 2023
2007c4b
add IGFold prediction datasets
Nov 3, 2023
63ccd14
Add igfold datamodule
Nov 3, 2023
75dc816
fix binary graph classification config
Nov 7, 2023
9771abd
add cdconv model
Nov 12, 2023
0acaaa5
fix test fixture
Nov 12, 2023
e07cca2
update docs
Nov 12, 2023
56a9c5e
linting
Nov 12, 2023
5549a34
linting
Nov 12, 2023
d9c4ca9
add full model config for finetuning
Nov 12, 2023
fc7521c
linting
Nov 12, 2023
5315f7e
fix device request logging
Nov 13, 2023
6ba482b
add multihot label encoder
Nov 13, 2023
33caa05
speed up positional encoding computation
Nov 16, 2023
6f27516
fix device logging
Nov 16, 2023
ca66359
add num_classes to GO datasets
Nov 16, 2023
103b613
lint cdconv config
Nov 16, 2023
1e6090d
add multilabel classification task configs
Nov 16, 2023
9270e44
refactor f1_max for multilabel classif.
Nov 16, 2023
4be174a
add auprc to classification metrics, linting
Nov 16, 2023
b32afda
linting
Nov 16, 2023
a1b3a74
set in_memory=True as default for GO datasets
Nov 16, 2023
06364b2
clean up EC dataset
Nov 17, 2023
91cbc2f
clean up GO dataset
Nov 17, 2023
40baf0c
improve instantiation test
Nov 17, 2023
d85c6f0
set ec to in memory
Nov 17, 2023
82e4024
fix GO labelling
Nov 17, 2023
2e9d73c
fix metrics memory leak
Nov 18, 2023
64658e1
add ec_Reaction sweep
Nov 19, 2023
075c00b
Ignore local Conda env in project directory
amorehead Nov 20, 2023
dc3ec55
add missing mace ca_angles hparams
Nov 20, 2023
317b4d1
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 20, 2023
e3354a3
add addev config
Nov 21, 2023
415e6d3
A dataset loading script for antibody_developability.py
amorehead Nov 21, 2023
3dc7c99
Merge branch 'ipmp' of https://github.com/a-r-j/ProteinWorkshop into …
amorehead Nov 21, 2023
4879e50
add esm BB config
Nov 22, 2023
07830ef
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 22, 2023
06bb80f
Add ESM config for all feature schemes
amorehead Nov 22, 2023
c9c3445
add ppi prediction task updates
Nov 22, 2023
f789df2
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 22, 2023
f8ea13b
add ppi sweep config
Nov 22, 2023
15fc31b
Update test script for masif_dataset.py
amorehead Nov 22, 2023
b30d40f
Update path for masif_site in test script
amorehead Nov 22, 2023
01caa96
mask additional attributes in PPI site prediction
Nov 22, 2023
9f50f04
resolve sequence tokenization
Nov 22, 2023
74004f8
refactor chain identification
Nov 23, 2023
9d9994c
fix error in error fix
Nov 23, 2023
ee8b3cf
Fix fix of a fix
amorehead Nov 23, 2023
d827c84
exclude erroneous examples
Nov 23, 2023
e4b4d62
fix edge cases
Nov 23, 2023
eb44c49
fix edge cases
Nov 23, 2023
76aa0b5
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 23, 2023
758a292
add model io utils
Nov 27, 2023
595b5ac
standardise default features for train and finetune configs #61
Nov 30, 2023
381040c
refactor to new recommended jaxtyping/beartype syntax
Dec 26, 2023
9fbc041
typechecker refactor for esm
Dec 26, 2023
69d0474
typechecker refactor for dataset base
Dec 26, 2023
248478b
lint
Dec 26, 2023
e7123f6
remove merge artifact from poetry.lock
Dec 26, 2023
5173259
fix beartype import
Dec 26, 2023
a0a775a
fix broken lock file
Dec 26, 2023
6be97aa
fix broken poetry.lock and update jaxtyping dependency
Dec 26, 2023
3d39e27
fix broken poetry.lock and update jaxtyping dependency
Dec 26, 2023
d7d22b9
use mamba in test workflow
Dec 26, 2023
be5dd40
fix pyg wheel link for torch > 2.1.0
Dec 26, 2023
735640b
update tests
Dec 26, 2023
8d32aa2
lint
Dec 26, 2023
b15bdf2
fix test
Dec 26, 2023
a0782c4
set dummy labels on example_batch
Dec 26, 2023
cc4364e
fix zenodo url
Dec 26, 2023
cefcb8b
fix zenodo url
Dec 26, 2023
c322df6
fix beartype import name
Dec 26, 2023
3e735c8
add changelog
Dec 26, 2023
19dd12d
add attribution to toc
Dec 26, 2023
22fca65
Update install instructions to PyTorch 2.1.2+, and sync docs with REA…
amorehead Dec 28, 2023
3014dc3
fix malformed HTML in quickstart components
Dec 28, 2023
a46c4ed
minor fixes to docs
Dec 28, 2023
10 changes: 7 additions & 3 deletions .github/workflows/code-tests.yaml
@@ -19,16 +19,20 @@ jobs:
strategy:
matrix:
platform: [ubuntu-latest, macos-latest, windows-latest]
python-version: [3.9, "3.10", 3.11]
python-version: [3.9, "3.10"]
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2.3.1
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
miniforge-variant: Mambaforge
channels: "conda-forge, pytorch, pyg"
python-version: ${{ matrix.python-version }}
use-mamba: true
- id: cache-dependencies
name: Cache dependencies
uses: actions/[email protected]
6 changes: 3 additions & 3 deletions .gitignore
@@ -143,12 +143,12 @@ proteinworkshop/data/*
!proteinworkshop/data/.gitkeep

logs/
ProteinWorkshop/
wandb/
.DS_Store
.env

# Explanations
explanations/
# Visualisations
visualisations/

# Explanations
explanations/
25 changes: 25 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,25 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/ambv/black
rev: 23.9.1
hooks:
- id: black
- repo: https://github.com/jsh9/pydoclint
# pydoclint version.
rev: 0.3.3
hooks:
- id: pydoclint
args:
- "--config=pyproject.toml"
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.1.1
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,33 @@
### 0.2.6 (Unreleased)



#### Datasets

* Adds two antibody-specific datasets using the IGFold corpora for paired OAS and Jaffe 2022 [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Set `in_memory=True` as default for most (small) datasets for improved performance [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Fixes `num_classes` for GO datamodules [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Fixes GO labelling [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)


#### Features
* Improves positional encoding performance by adding a `seq_pos` attribute on `Data/Protein` objects in the base dataset getter. [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
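
  As context for the entry above, a precomputed integer `seq_pos` attribute can be turned into fixed sinusoidal features. The sketch below is illustrative only (plain Python, with a function name of our choosing — it is not the repository's featuriser):

  ```python
  import math

  def sinusoidal_encoding(positions, dim):
      """Fixed sinusoidal features for integer sequence positions (dim must be even)."""
      out = []
      for pos in positions:
          feats = []
          for i in range(dim // 2):
              # Geometrically spaced frequencies, as in the standard Transformer encoding.
              freq = 1.0 / (10000 ** (2 * i / dim))
              feats.append(math.sin(pos * freq))
              feats.append(math.cos(pos * freq))
          out.append(feats)
      return out

  # Computing seq_pos-derived features once per protein avoids repeating
  # the work inside every featurisation call.
  enc = sinusoidal_encoding([0, 1, 2], dim=4)
  ```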

#### Models
* Adds CDConv implementation [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Adds tuned hparams for models [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)

#### Framework
* Refactors beartype/jaxtyping to use latest recommended syntax [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Adds explainability module for performing attribution on a trained model [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Change default finetuning features in config: `ca_base` -> `ca_seq` [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Add optional hparam entry point to finetuning config [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Fixes GPU memory accumulation for some metrics [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Updates zenodo URL for processed datasets to reflect upstream API change [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Adds multi-hot label encoding transform [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Fixes auto PyG install for `torch>2.1.0` [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
* Adds `proteinworkshop.model_io` containing utils for loading trained models [#53](https://github.com/a-r-j/ProteinWorkshop/pull/53/)
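
  As a plain-Python illustration of the multi-hot label encoding idea mentioned above (the function below is ours, not the repository's transform):

  ```python
  def multi_hot(labels, num_classes):
      """Encode a variable-length list of class indices as a dense 0/1 vector."""
      vec = [0.0] * num_classes
      for idx in labels:
          vec[idx] = 1.0
      return vec

  multi_hot([1, 3], num_classes=5)  # -> [0.0, 1.0, 0.0, 1.0, 0.0]
  ```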

### 0.2.5 (25/09/2023)

* Implement ESM embedding encoder ([#33](https://github.com/a-r-j/ProteinWorkshop/pull/33), [#41](https://github.com/a-r-j/ProteinWorkshop/pull/41))
37 changes: 27 additions & 10 deletions README.md
@@ -37,6 +37,7 @@ Configuration files to run the experiments described in the manuscript are provided
- [Running a sweep/experiment](#running-a-sweepexperiment)
- [Embedding a dataset](#embedding-a-dataset)
- [Visualising a dataset's embeddings](#visualising-pre-trained-model-embeddings-for-a-given-dataset)
- [Performing attribution of a pre-trained model](#performing-attribution-of-a-pre-trained-model)
- [Verifying a config](#verifying-a-config)
- [Using `proteinworkshop` modules functionally](#using-proteinworkshop-modules-functionally)
- [Models](#models)
@@ -67,14 +68,11 @@ Below, we outline how one may set up a virtual environment for `proteinworkshop`

### From PyPI

`proteinworkshop` is available for install [from PyPI](https://pypi.org/project/proteinworkshop/). This enables training of specific configurations via the CLI **or** using individual components from the benchmark, such as datasets, featurisers, or transforms, as drop-ins to other projects. Make sure to install [PyTorch](https://pytorch.org/) (specifically version `2.0.0`) using its official `pip` installation instructions, with CUDA support as desired.
`proteinworkshop` is available for install [from PyPI](https://pypi.org/project/proteinworkshop/). This enables training of specific configurations via the CLI **or** using individual components from the benchmark, such as datasets, featurisers, or transforms, as drop-ins to other projects. Make sure to install [PyTorch](https://pytorch.org/) (specifically version `2.1.2` or newer) using its official `pip` installation instructions, with CUDA support as desired.

```bash
# install `proteinworkshop` from PyPI
pip install proteinworkshop --no-cache-dir

# e.g., install PyTorch with CUDA 11.8 support on Linux
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
pip install proteinworkshop

# install PyTorch Geometric using the (now-installed) CLI
workshop install pyg
@@ -86,7 +84,7 @@ export DATA_PATH="where/you/want/data/" # e.g., `export DATA_PATH="proteinworksh
However, for full exploration we recommend cloning the repository and building from source.

### Building from source
With a local virtual environment activated (e.g., one created with `conda create -n proteinworkshop python=3.9`):
With a local virtual environment activated (e.g., one created with `conda create -n proteinworkshop python=3.10`):
1. Clone and install the project

```bash
@@ -95,11 +93,11 @@ With a local virtual environment activated (e.g., one created with `conda create
pip install -e .
```

2. Install [PyTorch](https://pytorch.org/) (specifically version `2.0.0`) using its official `pip` installation instructions, with CUDA support as desired (N.B. make sure to add `--no-cache-dir` to the end of the `pip` installation command)
2. Install [PyTorch](https://pytorch.org/) (specifically version `2.1.2` or newer) using its official `pip` installation instructions, with CUDA support as desired

```bash
# e.g., to install PyTorch with CUDA 11.8 support on Linux:
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2+cu118 --index-url https://download.pytorch.org/whl/cu118
```

3. Then use the newly-installed `proteinworkshop` CLI to install [PyTorch Geometric](https://pyg.org/)
@@ -252,6 +250,21 @@ python proteinworkshop/visualise.py ckpt_path=PATH/TO/CHECKPOINT plot_filepath=V
```
See the `visualise` section of `proteinworkshop/config/visualise.yaml` for additional parameters.

### Performing attribution of a pre-trained model

We provide a utility in `proteinworkshop/explain.py` for performing attribution of a pre-trained model using integrated gradients.

This will write PDB files for all the structures in a dataset for a supervised task with residue-level attributions in the `b_factor` column. To visualise the attributions, we recommend using the [Protein Viewer VSCode extension](https://marketplace.visualstudio.com/items?itemName=ArianJamasb.protein-viewer) and changing the 3D representation to colour by `Uncertainty/Disorder`.

To run the attribution:

```bash
python proteinworkshop/explain.py ckpt_path=PATH/TO/CHECKPOINT output_dir=ATTRIBUTION/DIRECTORY
```

See the `explain` section of `proteinworkshop/config/explain.yaml` for additional parameters.
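
For example, a minimal way to read the per-atom attribution scores back out of the generated PDB files could look like the sketch below. It assumes standard fixed-width PDB columns, with the score stored in the B-factor field as described above; the function name is ours:

```python
def read_attributions(pdb_text):
    """Collect the B-factor column (PDB columns 61-66) from ATOM/HETATM records."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            scores.append(float(line[60:66]))
    return scores

# A single fixed-width ATOM record carrying an attribution score of 0.73.
record = "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.73           C"
```

These scores can then be aggregated per residue or plotted, independently of any structure viewer.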


### Verifying a config

```bash
@@ -309,6 +322,7 @@ Read [the docs](https://www.proteins.sh) for a full list of modules available in
| `GearNet`| [Zhang et al.](https://arxiv.org/abs/2203.06125) | ✓
| `DimeNet++` | [Gasteiger et al.](https://arxiv.org/abs/2011.14115) | ✗
| `SchNet` | [Schütt et al.](https://arxiv.org/abs/1706.08566) | ✗
| `CDConv` | [Fan et al.](https://openreview.net/forum?id=P5Z-Zl9XJ7) | ✓

### Equivariant Graph Encoders

@@ -361,8 +375,11 @@ Pre-training corpuses (with the exception of `pdb`, `cath`, and `astral`) are pr
| `esmatlas` | [ESMAtlas](https://esmatlas.com/) predictions (full) | [Kim et al.](https://academic.oup.com/bioinformatics/article/39/4/btad153/7085592) | | 1 Tb | [GPL-3.0](https://github.com/steineggerlab/foldcomp/blob/master/LICENSE.txt) / [CC-BY 4.0](https://esmatlas.com/about)
| `esmatlas_v2023_02`| [ESMAtlas](https://esmatlas.com/) predictions (v2023_02 release) | [Kim et al.](https://academic.oup.com/bioinformatics/article/39/4/btad153/7085592) | | 137 Gb| [GPL-3.0](https://github.com/steineggerlab/foldcomp/blob/master/LICENSE.txt) / [CC-BY 4.0](https://esmatlas.com/about)
| `highquality_clust30`| [ESMAtlas](https://esmatlas.com/) High Quality predictions | [Kim et al.](https://academic.oup.com/bioinformatics/article/39/4/btad153/7085592) | 37M Chains | 114 Gb | [GPL-3.0](https://github.com/steineggerlab/foldcomp/blob/master/LICENSE.txt) / [CC-BY 4.0](https://esmatlas.com/about)
| `igfold_paired_oas` | IGFold Predictions for [Paired OAS](https://journals.aai.org/jimmunol/article/201/8/2502/107069/Observed-Antibody-Space-A-Resource-for-Data-Mining) | [Ruffolo et al.](https://www.nature.com/articles/s41467-023-38063-x) | 104,994 paired Ab chains | | [CC-BY 4.0](https://www.nature.com/articles/s41467-023-38063-x#rightslink)
| `igfold_jaffe` | IGFold predictions for [Jaffe2022](https://www.nature.com/articles/s41586-022-05371-z) data | [Ruffolo et al.](https://www.nature.com/articles/s41467-023-38063-x) | 1,340,180 paired Ab chains | | [CC-BY 4.0](https://www.nature.com/articles/s41467-023-38063-x#rightslink)
| `pdb`| Experimental structures deposited in the [RCSB Protein Data Bank](https://www.rcsb.org/) | [wwPDB consortium](https://academic.oup.com/nar/article/47/D1/D520/5144142) | ~800k Chains |23 Gb | [CC0 1.0](https://www.rcsb.org/news/feature/611e8d97ef055f03d1f222c6) |


<details>
<summary>Additionally, we provide several species-specific compilations (mostly reference species)</summary>

@@ -528,8 +545,8 @@ We use `poetry` to manage the project's underlying dependencies and to push upda
To keep with the code style for the `proteinworkshop` repository, using the following lines, please format your commits before opening a pull request:
```bash
# assuming you are located in the `ProteinWorkshop` top-level directory
isort .
autoflake -r --in-place --remove-unused-variables --remove-all-unused-imports --ignore-init-module-imports .
isort .
autoflake -r --in-place --remove-unused-variables --remove-all-unused-imports --ignore-init-module-imports .
black --config=pyproject.toml .
```

4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -23,6 +23,7 @@
"sphinx.ext.autosummary",
"sphinx.ext.intersphinx",
"sphinx.ext.viewcode",
"sphinx.ext.doctest",
"sphinx_copybutton",
"sphinx_inline_tabs",
"sphinxcontrib.gtagjs",
@@ -32,7 +33,7 @@
"nbsphinx_link",
"sphinx.ext.napoleon",
"sphinx_codeautolink",
"sphinxcontrib.jquery"
"sphinxcontrib.jquery",
# "sphinx_autorun",
]

@@ -109,7 +110,6 @@
"vu": "\\mathbf{u}",
"vv": "\\mathbf{v}",
"vw": "\\mathbf{w}",
"vx": "\\mathbf{x}",
"vy": "\\mathbf{y}",
"vz": "\\mathbf{z}",
}
6 changes: 3 additions & 3 deletions docs/source/configs/dataset.rst
@@ -31,8 +31,8 @@ Unlabelled Datasets


.. mdinclude:: ../../../README.md
:start-line: 331
:end-line: 373
:start-line: 361
:end-line: 406


:py:class:`ASTRAL <proteinworkshop.datasets.astral.AstralDataModule>` (``astral``)
@@ -116,7 +116,7 @@ This is a dataset of approximately 3 million protein structures from the AlphaFo

Species-Specific Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TODO
Stay tuned!


Graph-level Datasets
4 changes: 2 additions & 2 deletions docs/source/configs/features.rst
@@ -7,8 +7,8 @@ Features
:width: 400

.. mdinclude:: ../../../README.md
:start-line: 426
:end-line: 475
:start-line: 459
:end-line: 508


Default Features
4 changes: 2 additions & 2 deletions docs/source/configs/framework_components/env.rst
@@ -2,8 +2,8 @@ Environment
------------

.. mdinclude:: ../../../../README.md
:start-line: 109
:end-line: 111
:start-line: 108
:end-line: 110

.. literalinclude:: ../../../../.env.example
:language: bash
48 changes: 40 additions & 8 deletions docs/source/configs/model.rst
@@ -34,14 +34,14 @@ Invariant Encoders
=============================

.. mdinclude:: ../../../README.md
:start-line: 295
:end-line: 302
:start-line: 319
:end-line: 326

:py:class:`SchNet <proteinworkshop.models.graph_encoders.schnet.SchNetModel>` (``schnet``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SchNet is one of the most popular and simplest instantiations of E(3) invariant message passing GNNs. SchNet constructs messages through element-wise multiplication of scalar features modulated by a radial filter conditioned on the pairwise distance :math:`\Vert \vec{\vx}_{ij} \Vert` between two neighbours.
Scalar features are update from iteration :math:`t` to :math:`t+1` via:
Scalar features are updated from iteration :math:`t` to :math:`t+1` via:

.. math::
\begin{align}
@@ -113,12 +113,25 @@ where :math:`\mathrm{FC(\cdot)}` denotes a linear transformation upon the messag
:caption: config/encoder/gear_net_edge.yaml



:py:class:`CDConv <proteinworkshop.models.graph_encoders.cdconv.CDConvModel>` (``cdconv``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CDConv is an SE(3) invariant architecture that uses independent learnable weights for sequential displacement, whilst directly encoding geometric displacements.

As a result of the downsampling procedures, this architecture is only suitable for graph-level prediction tasks.

.. literalinclude:: ../../../proteinworkshop/config/encoder/cdconv.yaml
:language: yaml
:caption: config/encoder/cdconv.yaml


Vector-Equivariant Encoders
=============================

.. mdinclude:: ../../../README.md
:start-line: 306
:end-line: 312
:start-line: 330
:end-line: 336

:py:class:`EGNN <proteinworkshop.models.graph_encoders.egnn.EGNNModel>` (``egnn``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -170,8 +183,8 @@ Tensor-Equivariant Encoders
=============================

.. mdinclude:: ../../../README.md
:start-line: 314
:end-line: 319
:start-line: 338
:end-line: 343


:py:class:`Tensor Field Networks <proteinworkshop.models.graph_encoders.tfn.TensorProductModel>` (``tfn``)
@@ -200,7 +213,7 @@ where the weights :math:`\vw` of the tensor product are computed via a learnt ra

MACE (Batatia et al., 2022) is a higher order E(3) or SE(3) equivariant GNN originally developed for molecular dynamics simulations.
MACE provides an efficient approach to computing high body order equivariant features in the Tensor Field Network framework via Atomic Cluster Expansion:
They first aggregate neighbourhood features analogous to the node update equation for TFN above (the :math:`A` functions in Batatia et al. (2022) (eq.9)) and then take :math:`k-1` repeated self-tensor products of these neighbourhood features.
They first aggregate neighbourhood features analogous to the node update equation for TFN above (the :math:`A` functions in Batatia et al. (2022) (eq.9)) and then take :math:`k-1` repeated self-tensor products of these neighbourhood features.
In our formalism, this corresponds to:

.. math::
@@ -214,6 +227,25 @@ In our formalism, this corresponds to:
:caption: config/encoder/mace.yaml


Sequence-Based Encoders
=============================

.. mdinclude:: ../../../README.md
:start-line: 345
:end-line: 349


:py:class:`Evolutionary Scale Modeling <proteinworkshop.models.graph_encoders.esm_embeddings.EvolutionaryScaleModeling>` (``esm``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Evolutionary Scale Modeling is a series of Transformer-based protein sequence encoders (Vaswani et al., 2017) that has been successfully used in protein structure prediction (Lin et al., 2023), protein design (Verkuil et al., 2022), and beyond.
This model class has commonly been used as a baseline for protein-related representation learning tasks, and we included it in our benchmark for this reason.

.. literalinclude:: ../../../proteinworkshop/config/encoder/esm.yaml
:language: yaml
:caption: config/encoder/esm.yaml


Decoder Models
=============================

5 changes: 3 additions & 2 deletions docs/source/index.rst
@@ -42,6 +42,7 @@ Welcome to Protein Workshop's documentation!
configs/task
configs/features
configs/transforms
configs/metrics
framework
ml_components

@@ -57,9 +58,9 @@ Welcome to Protein Workshop's documentation!
modules/proteinworkshop.tasks
modules/proteinworkshop.features
modules/proteinworkshop.utils
modules/protein_workshop.constants
modules/proteinworkshop.constants
modules/proteinworkshop.types

modules/proteinworkshop.metrics

Indices and tables
==================
4 changes: 2 additions & 2 deletions docs/source/installation.rst
@@ -5,5 +5,5 @@ Installation
:doc:`/configs/framework_components/env`

.. mdinclude:: ../../README.md
:start-line: 64
:end-line: 109
:start-line: 66
:end-line: 108
4 changes: 4 additions & 0 deletions docs/source/modules/proteinworkshop.metrics.rst
@@ -0,0 +1,4 @@
proteinworkshop.metrics
-------------------------

Stay tuned!