Upgrade examples to BioNeMo 2 #3095

Draft
wants to merge 11 commits into
base: main
6 changes: 3 additions & 3 deletions examples/advanced/bionemo/README.md
@@ -7,14 +7,14 @@ This directory contains examples of running BioNeMo in a federated learning envi
## Notebooks

In this repo you will find two notebooks under the `task_fitting` and `downstream` folders respectively:
1. The [task_fitting](./task_fitting/task_fitting.ipynb) notebook example includes a notebook that shows how to obtain protein learned representations in the form of embeddings using the ESM-1nv pre-trained model.
The model is trained with NVIDIA's BioNeMo framework for Large Language Model training and inference.
1. The [task_fitting](./task_fitting/task_fitting.ipynb) notebook shows how to obtain learned protein representations in the form of embeddings using a pre-trained ESM-2 model.

2. The [downstream](./downstream/downstream_nvflare.ipynb) notebook example shows three different downstream tasks for fine-tuning a BioNeMo ESM-style model.

## Requirements

Download and run the [BioNeMo docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework).
> **Note:** The examples here were tested with `nvcr.io/nvidia/clara/bionemo-framework:1.8`
> **Note:** The examples here were tested with `nvcr.io/nvidia/clara/bionemo-framework:2.1`

We recommend following the [Quickstart Guide](https://docs.nvidia.com/bionemo-framework/latest/access-startup.html?highlight=docker)
on how to get the BioNeMo container.
181 changes: 81 additions & 100 deletions examples/advanced/bionemo/downstream/downstream_nvflare.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Federated Protein Downstream Fine-tuning\n",
"\n",
"<div class=\"alert alert-block alert-info\"> <b>NOTE</b> This notebook was tested on a single A1000 GPU and is compatible with BioNeMo Framework v1.8. To leverage additional or higher-performance GPUs, you can modify the configuration files and simulation script to accommodate multiple devices and increase thread utilization respectively.</div>\n",
"<div class=\"alert alert-block alert-info\"> <b>NOTE</b> This notebook was tested on a single A1000 GPU and is compatible with BioNeMo Framework v2.3. To leverage additional or higher-performance GPUs, you can modify the configuration files and simulation script to accommodate multiple devices and increase thread utilization respectively.</div>\n",
"\n",
"The example datasets used here are made available by [Therapeutics Data Commons](https://tdcommons.ai/) through PyTDC.\n",
"\n",
@@ -29,44 +29,46 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/igraph-0.11.8-py3.12-linux-x86_64.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/looseversion-1.3.0-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/lightning_utilities-0.11.9-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/texttable-1.7.0-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/opt_einsum-3.4.0-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/dill-0.3.9-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/nvfuser-0.2.13a0+0d33366-py3.12-linux-x86_64.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mDEPRECATION: Loading egg at /usr/local/lib/python3.12/dist-packages/lightning_thunder-0.2.0.dev0-py3.12.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330\u001b[0m\u001b[33m\n",
"\u001b[0mLooking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n",
"Requirement already satisfied: fuzzywuzzy in /usr/local/lib/python3.12/dist-packages (0.18.0)\n",
"Requirement already satisfied: PyTDC in /usr/local/lib/python3.12/dist-packages (1.1.12)\n",
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.\u001b[0m\u001b[33m\n",
"\u001b[0m"
]
}
],
"source": [
"%%capture --no-display --no-stderr cell_output\n",
"! pip install PyTDC\n",
"! pip install nvflare~=2.5.1\n",
"! pip install biopython\n",
"! pip install scikit-learn\n",
"! pip install matplotlib\n",
"! pip install protobuf==3.20\n",
"! pip install huggingface-hub==0.22.0\n",
"# %%capture --no-display --no-stderr cell_output\n",
"! pip install fuzzywuzzy PyTDC --no-dependencies # install tdc without dependencies to avoid version conflicts in the BioNeMo container\n",
"#! pip install nvflare~=2.5\n",
"#! pip install biopython\n",
"#! pip install scikit-learn\n",
"#! pip install matplotlib\n",
"#! pip install protobuf==3.20\n",
"#! pip install huggingface-hub==0.22.0\n",
"\n",
"import os\n",
"import warnings\n",
"\n",
"\n",
"warnings.filterwarnings('ignore')\n",
"warnings.simplefilter('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Home Directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bionemo_home = \"/workspace/bionemo\"\n",
"os.environ['BIONEMO_HOME'] = bionemo_home"
"warnings.filterwarnings(\"ignore\")\n",
"warnings.simplefilter(\"ignore\")"
]
},
{
@@ -75,78 +77,27 @@
"source": [
"### Download Model Checkpoints\n",
"\n",
"In order to download pretrained models from the NGC registry, **please ensure that you have installed and configured the NGC CLI**, check the [Quickstart Guide](https://docs.nvidia.com/bionemo-framework/latest) for more info. The following code will download the pretrained model `esm2nv_650M_converted.nemo` from the NGC registry."
"The following code will download the pre-trained model, `esm2/8m:2.0`, from the NGC registry:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/root/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar\n"
]
}
],
"source": [
"# Define the NGC CLI API KEY and ORG for the model download\n",
"# If these variables are not already set in the container, uncomment below\n",
"# to define and set with your API KEY and ORG\n",
"# api_key = <YOUR_API_KEY>\n",
"# ngc_cli_org = <YOUR_ORG>\n",
"# # Update the environment variable\n",
"# os.environ['NGC_CLI_API_KEY'] = api_key\n",
"# os.environ['NGC_CLI_ORG'] = ngc_cli_org\n",
"from bionemo.core.data.load import load\n",
"\n",
"# Set variables and paths for model and checkpoint\n",
"model_name = \"esm2nv_650m\" # \"esm1nv\" \n",
"actual_checkpoint_name = \"esm2nv_650M_converted.nemo\" # \"esm1nv.nemo\"\n",
"model_path = os.path.join(bionemo_home, 'models')\n",
"checkpoint_path = os.path.join(model_path, actual_checkpoint_name)\n",
"os.environ['MODEL_PATH'] = model_path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display --no-stderr cell_output\n",
"if not os.path.exists(checkpoint_path):\n",
" !cd /workspace/bionemo && \\\n",
" python download_artifacts.py --model_dir models --models {model_name}\n",
"else:\n",
" print(f\"Model {model_name} already exists at {model_path}.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again for esm1nv: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"esm1nv\"\n",
"actual_checkpoint_name = \"esm1nv.nemo\"\n",
"model_path = os.path.join(bionemo_home, 'models')\n",
"checkpoint_path = os.path.join(model_path, actual_checkpoint_name)\n",
"os.environ['MODEL_PATH'] = model_path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display --no-stderr cell_output\n",
"if not os.path.exists(checkpoint_path):\n",
" !cd /workspace/bionemo && \\\n",
" python download_artifacts.py --model_dir models --models {model_name}\n",
"else:\n",
" print(f\"Model {model_name} already exists at {model_path}.\")"
"checkpoint_path = load(\"esm2/8m:2.0\")\n",
"print(checkpoint_path)"
]
},
{
@@ -226,11 +177,41 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading...\n",
"100%|███████████████████████████████████████| 601k/601k [00:00<00:00, 1.96MiB/s]\n",
"Loading...\n",
"Done!\n",
"Sampling with alpha=1.0\n",
"Save 80 training proteins for site-1 (frac=0.041)\n",
"Save 365 training proteins for site-2 (frac=0.190)\n",
"Save 216 training proteins for site-3 (frac=0.112)\n",
"Save 578 training proteins for site-4 (frac=0.300)\n",
"Save 568 training proteins for site-5 (frac=0.295)\n",
"Save 119 training proteins for site-6 (frac=0.062)\n",
"Saved 1927 training and 482 testing proteins.\n",
" TRAIN Pos/Neg ratio: neg=366, pos=1561: 4.265\n",
" TRAIN Trivial accuracy: 0.810\n",
" TEST Pos/Neg ratio: neg=116, pos=366: 3.155\n",
" TEST Trivial accuracy: 0.759\n",
"[[ nan 0.04657534 0.02314815 0.0449827 0.04929577 0.03361345]\n",
" [ nan nan 0.18055556 0.17128028 0.19542254 0.18487395]\n",
" [ nan nan nan 0.11591696 0.10211268 0.08403361]\n",
" [ nan nan nan nan 0.28521127 0.32773109]\n",
" [ nan nan nan nan nan 0.28571429]\n",
" [ nan nan nan nan nan nan]]\n",
"Avg. overlap: 14.20%\n"
]
}
],
"source": [
" # you may need to fix these paths to your own scripts\n",
"# you may need to fix these paths to your own scripts\n",
"! cd /bionemo_nvflare_examples/downstream/sabdab && python prepare_sabdab_data.py"
]
},
@@ -340,7 +321,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.3"
}
},
"nbformat": 4,
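
The class-balance figures printed by `prepare_sabdab_data.py` in the captured cell output can be checked from the counts alone. The following is a minimal sketch, assuming "trivial accuracy" means the accuracy of always predicting the majority class; the helper name `class_balance` is hypothetical and not part of this PR or of the data preparation script.

```python
def class_balance(neg: int, pos: int) -> tuple[float, float]:
    """Return (pos/neg ratio, majority-class accuracy) for binary label counts.

    Assumption: 'trivial accuracy' is the score of a classifier that
    always predicts the more frequent label.
    """
    total = neg + pos
    ratio = pos / neg
    trivial_acc = max(neg, pos) / total
    return ratio, trivial_acc


# Counts copied from the cell output shown in the diff.
for split, neg, pos in [("TRAIN", 366, 1561), ("TEST", 116, 366)]:
    ratio, acc = class_balance(neg, pos)
    print(f"{split} Pos/Neg ratio: neg={neg}, pos={pos}: {ratio:.3f}")
    print(f"{split} Trivial accuracy: {acc:.3f}")
```

Under that assumption the sketch reproduces the logged values (4.265 and 0.810 for train, 3.155 and 0.759 for test), which gives a baseline that any fine-tuned classifier in the downstream task should exceed.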