The official code for "TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation". Here we publish the inference code of TaxDiff. The training code and the dataset of protein sequences with taxonomic labels will be released after our paper is accepted.
💡 I also have other AI for Science projects that may interest you ✨.
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Li Hao, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
- To the best of our knowledge, our TaxDiff is the first controllable protein generation model utilizing guidance from taxonomies.
- TaxDiff proposes a taxonomic-guided framework that fits all diffusion-based protein design models. We also propose the patchify attention mechanism for better protein design.
- Experiments demonstrate that our TaxDiff achieves state-of-the-art results in both taxonomic-guided controllable and unconditional protein sequence generation, excelling in structural modeling scores and sequence consistency.
More detailed results can be found in our paper.
For inference, please download the checkpoint from HuggingFace. Unzip it and put the ckpt file into the ckpt/ folder:
ckpt/0012802_eval.ckpt
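Before running inference, it can help to confirm that the checkpoint landed where expected. The snippet below is a minimal sketch, assuming the commands are run from the repository root and that the path above is the one the script loads; `checkpoint_present` is a hypothetical helper, not part of the TaxDiff code.

```python
from pathlib import Path

# Relative path taken from the instructions above.
CKPT_RELATIVE = Path("ckpt") / "0012802_eval.ckpt"


def checkpoint_present(repo_root: str = ".") -> bool:
    """Return True if the checkpoint file exists under repo_root (hypothetical helper)."""
    return (Path(repo_root) / CKPT_RELATIVE).is_file()


if not checkpoint_present():
    print(f"Checkpoint not found; expected it at {CKPT_RELATIVE}")
```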
Our dataset can be downloaded from HuggingFace:
uniref50_200_256_clean_taxnomic_family_tid__filter_layer6.fasta
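Once downloaded, the file can be read without extra dependencies. The following is a minimal sketch, assuming the file follows the standard FASTA layout (header lines starting with `>`, sequences possibly wrapped over several lines). `read_fasta` is a hypothetical helper, and because the header format of this dataset is not described here, headers are returned as raw strings.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy records for illustration only; they are not taken from the dataset.
example = [">seq1 demo", "MKT", "AYI", ">seq2 demo", "GAVL"]
records = list(read_fasta(example))
print(records)  # [('seq1 demo', 'MKTAYI'), ('seq2 demo', 'GAVL')]
```

To read the real file, pass an open file handle instead of the toy list, for example `with open(path) as handle: records = list(read_fasta(handle))`, where `path` points to the downloaded FASTA file.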
We will release protein sequences with taxonomic labels for the training procedure once our paper is accepted.
If you want to generate proteins from a specific taxonomic group, first find its corresponding tax-id in data_reader/Taxonnmic_classfication.xlsx, and then modify the protein class labels in sample_protein.py:
class_lables = torch.randint(low=1, high=int(23427), size=(1,num))
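The line above draws random labels. To condition every sample on one taxonomic group, the random draw could be replaced with a constant label. This is a minimal sketch under two assumptions: that the tax-id listed in the spreadsheet is the same integer index the model expects, and that valid values lie in the range implied by the `torch.randint` call (1 up to, but not including, 23427). `fixed_class_labels` is a hypothetical helper, and the tax-id 42 is an arbitrary placeholder.

```python
NUM_CLASSES = 23427  # exclusive upper bound taken from the torch.randint call above


def fixed_class_labels(tax_id: int, num: int, num_classes: int = NUM_CLASSES) -> list:
    """Return a 1 x num nested list filled with a single tax-id (hypothetical helper)."""
    if not 1 <= tax_id < num_classes:
        raise ValueError(f"tax_id must be in [1, {num_classes}), got {tax_id}")
    if num < 1:
        raise ValueError("num must be a positive integer")
    return [[tax_id] * num]


labels = fixed_class_labels(tax_id=42, num=4)  # 42 is a placeholder tax-id
print(labels)  # [[42, 42, 42, 42]]

# Assumed drop-in inside sample_protein.py (requires torch; not executed here):
# class_lables = torch.tensor(fixed_class_labels(tax_id, num))
```

Wrapping the nested list in `torch.tensor` yields an integer tensor of shape `(1, num)`, which matches the shape produced by the original `torch.randint` call.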
- Python == 3.10
- PyTorch == 2.2.0
- Torchvision == 0.17.0
- CUDA Version == 12.0
- Install required packages:
git clone https://github.com/Linzy19/TaxDiff.git
cd TaxDiff
pip install -r requirements.txt
The inference instructions are in sample_protein.py. For example:
python sample_protein.py --model DiT-pro-12-h6-L16 --cuda-num cuda:0 --num 500
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.
@article{zongying2024taxdiff,
title={TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation},
author={Zongying, Lin and Hao, Li and Liuzhenghao, Lv and Bin, Lin and Junwu, Zhang and Yu-Chian, Chen Calvin and Li, Yuan and Yonghong, Tian},
journal={arXiv preprint arXiv:2402.17156},
year={2024}
}