Setting up
Georgios Koutsovoulos edited this page Nov 23, 2023 · 22 revisions
git clone https://github.com/GDKO/AvP.git
conda create --name avp python=3
conda activate avp
# Install Programs
conda install -c bioconda mafft blast=2.9.0 trimal fasttree iqtree
# Install Python libraries
pip install numpy networkx pyyaml ete3 six biopython docopt pybedtools
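Before the first run, it can help to confirm that the programs installed above are visible in the active environment. The sketch below is a small helper and not part of AvP: the function name `check_tools` is hypothetical, and the executable names passed to it (for example `blastp` for the blast package) are assumptions that may differ between package versions.

```shell
# Hypothetical helper (not part of AvP): report commands that are not on PATH.
# Prints "missing: NAME" for each absent command and returns 1 if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to what your packages install.
check_tools mafft blastp trimal iqtree || echo "install the programs listed above, then re-run"
```

Run it inside the activated `avp` environment so that the conda-installed programs are on the PATH.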
[!] If you want to use diamond, download the latest version from the releases page (https://github.com/bbuchfink/diamond/releases). Avoid installing it with conda, which sometimes installs a very old version.
[!] The first time you run the program, it will create a database of nodes and names from the NCBI taxonomy.
We recommend using BLAST with NR and Diamond with other databases.
A copy of NR should be present in the system. See here on how to download NR.
If you want to use diamond with NR, see (#11) (thanks to @bshrestha0).
- Install diamond from https://github.com/bbuchfink/diamond/releases
- Download taxdump from NCBI
mkdir taxdump
tar xvf taxdump.tar.gz -C taxdump
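The `diamond makedb` commands on this page read `taxdump/nodes.dmp` and `taxdump/names.dmp`, so it is worth confirming that both files exist after extraction. A minimal sketch follows; the function names `fetch_taxdump` and `check_taxdump` are hypothetical, and the download URL is the usual NCBI taxonomy location, given here as an assumption and not taken from this page.

```shell
# Hypothetical helpers (not part of AvP).

# Download and unpack the NCBI taxonomy dump into ./taxdump.
# The URL is an assumption: the usual NCBI FTP location for taxdump.tar.gz.
fetch_taxdump() {
    curl -L -O https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz &&
    mkdir -p taxdump &&
    tar xzf taxdump.tar.gz -C taxdump
}

# Succeed only if the two files used by `diamond makedb` are present and non-empty.
check_taxdump() {
    dir=${1:-taxdump}
    for f in nodes.dmp names.dmp; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            return 1
        fi
    done
    return 0
}
```

A typical call would be `fetch_taxdump && check_taxdump taxdump`; neither function runs until it is invoked.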
Download SwissProt
#Create taxid file with acc2taxid.py
acc2taxid.py -i uniprot_sprot.fasta.gz -m swissprot > sp.taxids
#Makedb with diamond
diamond makedb --in uniprot_sprot.fasta.gz --taxonmap sp.taxids --db uniprot_sprot.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
Download Uniref90
#Rename headers with sed
zcat uniref90.fasta.gz | sed 's/^>/>Uniref90|/' | gzip > uniref90.fasta.fixed.gz
#Create taxid file with acc2taxid.py
acc2taxid.py -i uniref90.fasta.fixed.gz -m uniref > un.taxids
#Makedb with diamond
diamond makedb --in uniref90.fasta.fixed.gz --taxonmap un.taxids --db uniref90.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
Custom database
#Fasta headers should be in the following format
>DB|Accession TaxID=Number (ex. >Uniref90|Q6GZX3 TaxID=654924)
#Create taxid file with acc2taxid.py
acc2taxid.py -i [custom DB] -m uniref > db.taxids
#Create Diamond db
diamond makedb --in [custom DB] --taxonmap db.taxids --db db.fasta.dmnd --taxonnodes taxdump/nodes.dmp --taxonnames taxdump/names.dmp
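Because the taxid file depends on the header layout described above, a quick scan for non-conforming headers before building a custom database can save a failed run. The sketch below is a helper written for illustration and not part of AvP: the function name `bad_headers` is hypothetical, and the regular expression encodes one reading of the format (`>DB|Accession`, followed by a `TaxID=Number` field somewhere later in the header).

```shell
# Hypothetical helper (not part of AvP): read FASTA on stdin and print every
# header line that does not look like ">DB|Accession ... TaxID=Number".
# The pattern is an assumption about the intended format.
bad_headers() {
    grep '^>' | grep -Ev '^>[^| ]+\|[^ ]+ (.* )?TaxID=[0-9]+( |$)' || true
}

# Small demonstration with one valid and one invalid header.
printf '>Uniref90|Q6GZX3 TaxID=654924\nMAFSAEDVLK\n>Q6GZX4 missing prefix\nMSIIGATRLQ\n' | bad_headers
# prints: >Q6GZX4 missing prefix
```

For a compressed file, pipe it in with `zcat`; empty output means every header matched the assumed pattern.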
This option inside the file config.yaml has only been tested with simulated datasets, so please report any issues. blastn should work for NCBI databases (e.g. nt); for custom databases, see here.