From cryo-EM density map to atomic structure

Wang-Lin-boop/CryFold

CryFold

Overview

CryFold is software that automatically builds full-atom 3D structural models of proteins from cryo-EM density maps and sequence information.

It works in two main stages: the first predicts the Cα atom coordinates from the density map, and the second builds the full-atom model by combining the sequence with the density map. The full-atom model is then passed through a post-processing program, modified from ModelAngelo, to produce the final protein model.

For more details on CryFold, please refer to the manuscript.

Hardware requirements

CryFold requires at least 3 GB of disk space for its own weight files and those of the ESM language model, and at least 13 GB of GPU memory.
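
The two figures above can be checked before installing. The following is a minimal sketch, not part of CryFold: the function names are hypothetical, the 13000 MiB threshold is an assumed reading of "13 GB", and the GPU part only works where nvidia-smi is available.

```shell
# Minimums quoted in this README. MIN_GPU_MIB is an assumption: "13 GB"
# is read here as roughly 13000 MiB, the unit nvidia-smi reports.
MIN_DISK_GB=3
MIN_GPU_MIB=13000

# meets_minimum VALUE MINIMUM: succeed when VALUE >= MINIMUM (integers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# free_disk_gb DIR: whole GB available on the filesystem that holds DIR.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# check_hardware [DIR]: print one line for the disk and one per visible GPU.
check_hardware() {
    if meets_minimum "$(free_disk_gb "${1:-.}")" "$MIN_DISK_GB"; then
        echo "disk: ok"
    else
        echo "disk: less than ${MIN_DISK_GB} GB free"
    fi
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "gpu: nvidia-smi not found, memory not checked"
        return 1
    fi
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r mib; do
        if meets_minimum "$mib" "$MIN_GPU_MIB"; then
            echo "gpu: ${mib} MiB, ok"
        else
            echo "gpu: ${mib} MiB, below the stated minimum"
        fi
    done
}
```

Calling check_hardware with the directory where CryFold will be cloned prints one status line for the disk and one for each GPU.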

Installation

Install CryFold

Step 1: Install Conda

Conda is required to manage the Python dependencies. It can be installed by following https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation.

Step 2: Clone this repository

Next, clone this GitHub repository with

git clone https://github.com/SBQ-1999/CryFold.git

Step 3: Check if the GPU is available

Ensure that the machine you are running on has a GPU. It must be able to run the command:

nvcc -V

If you are on a cluster, make sure that the node you are on has a GPU (this can be checked with the command 'nvcc -V'). If it does not, first log in to a compute node that has a GPU before running the following commands.
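
The prerequisites from Steps 1 to 3 can be checked in one pass. The sketch below is an illustration with hypothetical function names. It only tests that each command can be found, and it relies on the fact that nvcc is available only where the CUDA toolkit is installed.

```shell
# have_tool NAME: succeed when the shell can find a command called NAME.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# preflight: report each tool the installation steps call; fail if any is missing.
preflight() {
    missing=0
    for tool in conda git nvcc; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

On a cluster, running preflight on the compute node rather than the login node shows whether that node is the right place to run install.sh.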

Step 4: Install CryFold

Navigate to the CryFold installation directory and run the installation script:

cd CryFold
source install.sh

Once the installation script has finished, you will have a CryFold execution environment. Finally, run the command

build -h

to check if the installation was successful.

Usage

First, use the command

build -h

to check some basic parameters of CryFold.

Note that the first run downloads a 2 GB ESM language model weight file, so it takes noticeably longer. Subsequent runs do not incur this wait. Below are a few simple examples that illustrate how to use CryFold.

Use a cryo-EM density map and a FASTA sequence

First, we need the density map and the FASTA file:

wget -P ./example https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-33306/map/emd_33306.map.gz
wget https://www.rcsb.org/fasta/entry/7xmv -O ./example/rcsb_pdb_7XMV.fasta
cd ./example
gzip -d emd_33306.map.gz

Then, run CryFold:

conda activate CryFold
build -s rcsb_pdb_7XMV.fasta -v emd_33306.map -o out
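
Because a run can take a while, it may be worth confirming that both inputs are in place first. The helper names below are hypothetical and the checks are deliberately shallow: the map only has to be a non-empty file, and the FASTA file only has to contain at least one header line.

```shell
# count_fasta_records FILE: number of sequences, i.e. lines starting with '>'.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# check_inputs FASTA MAP: succeed only when both inputs look usable.
check_inputs() {
    if [ ! -s "$2" ]; then
        echo "map file missing or empty: $2" >&2
        return 1
    fi
    if [ ! -f "$1" ] || [ "$(count_fasta_records "$1")" -eq 0 ]; then
        echo "no FASTA records found in: $1" >&2
        return 1
    fi
}
```

For the example above, this could be used as: check_inputs rcsb_pdb_7XMV.fasta emd_33306.map && build -s rcsb_pdb_7XMV.fasta -v emd_33306.map -o out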

Use an additional mask map

Let's assume we already have the density map and the FASTA file. We also need to obtain the mask map:

cd ./example
wget https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-33306/masks/emd_33306_msk_1.map

Then, run CryFold:

build -s rcsb_pdb_7XMV.fasta -v emd_33306.map -m emd_33306_msk_1.map -o use_mask

Use only a cryo-EM density map without input sequences

If you have a protein sequence database for the species that covers all sequences in the density map, you can run the following command. (If you omit the -f parameter, CryFold only returns the HMM profiles predicted by Cry-Net.)

build -v map.mrc -o output -f sequence_database.fasta

The sequence database can also be used together with input sequences, which may be any subset of the sequences covered by the density map:

build -s protein.fasta -v map.mrc -o output -f sequence_database.fasta

Both of the above commands produce a new file (output.fasta) containing the sequences found in the density map based on the sequence database.

When no specific sequences are known, this method quickly yields initial matching sequences from a protein database. You can then use output.fasta as the input for CryFold and run it iteratively until no new sequences are found.
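
The iterative search described above could be scripted roughly as follows. This is a sketch under several assumptions: the function names and the CRYFOLD_BUILD variable are hypothetical, the location of the result FASTA file is passed in explicitly because it is not specified here, and "no new sequences" is approximated by the record count no longer increasing.

```shell
# count_fasta_records FILE: number of sequences, i.e. lines starting with '>'.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# iterate_search MAP DATABASE OUTPUT RESULT_FASTA [MAX_ROUNDS]
# Runs in a subshell, so its variables do not leak into the caller.
iterate_search() (
    map=$1
    db=$2
    out=$3
    result=$4               # assumption: where build writes the found sequences
    max=${5:-10}
    seqs=""                 # empty on the first round: no -s option yet
    prev=0
    round=1
    while [ "$round" -le "$max" ]; do
        if [ -n "$seqs" ]; then
            "${CRYFOLD_BUILD:-build}" -s "$seqs" -v "$map" -o "$out" -f "$db" || exit 1
        else
            "${CRYFOLD_BUILD:-build}" -v "$map" -o "$out" -f "$db" || exit 1
        fi
        now=$(count_fasta_records "$result")
        echo "round ${round}: ${now} sequences"
        # Stop as soon as a round finds nothing beyond the previous one.
        [ "$now" -le "$prev" ] && break
        prev=$now
        # Keep a copy so the next round does not read the file it rewrites.
        seqs="${out}.round${round}.fasta"
        cp "$result" "$seqs"
        round=$((round + 1))
    done
)
```

With the file names used above, a call might look like: iterate_search map.mrc sequence_database.fasta output output.fasta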

Specify GPU device and inference acceleration

To run on a specific GPU, set it with the -d parameter. For example, to use GPU 3:

build -s protein.fasta -v map.mrc -o output_dir -d cuda:3

If you want to infer 900 residues at once (the default is 300), you can set it using the parameter -n:

build -s protein.fasta -v map.mrc -o output_dir -n 900
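
On a shared machine, the value for -d can be chosen automatically. The sketch below picks the GPU with the most free memory as reported by nvidia-smi. The function names are hypothetical, and it assumes that the nvidia-smi index matches the cuda:N numbering, which is not guaranteed unless CUDA_DEVICE_ORDER is set to PCI_BUS_ID.

```shell
# pick_gpu: read "index, free_MiB" lines on stdin and print the index
# of the GPU with the most free memory (nothing if stdin is empty).
pick_gpu() {
    awk -F', *' '
        NR == 1 || ($2 + 0) > best { best = $2 + 0; idx = $1 }
        END { if (NR > 0) print idx }
    '
}

# build_on_freest_gpu ARGS...: run build with -d set to the GPU chosen above.
build_on_freest_gpu() {
    gpu=$(nvidia-smi --query-gpu=index,memory.free \
                     --format=csv,noheader,nounits | pick_gpu)
    [ -n "$gpu" ] || { echo "no GPU reported by nvidia-smi" >&2; return 1; }
    build "$@" -d "cuda:${gpu}"
}
```

For example: build_on_freest_gpu -s protein.fasta -v map.mrc -o output_dir -n 900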

Citation

Su et al., Accurate de novo modeling of atomic structures from cryo-EM maps using an enhanced transformer, submitted, 2024.
