PhyloEuk: Phylogenomics for Eukaryotes

Warning: This pipeline is a work in progress and currently only "works" for ochrophytes, but there is no fundamental reason it could not work for other taxa with a few modifications to the scripts.

Goal

Heavily inspired by GTDB-Tk and BUSCO, this software aims to determine the taxonomy of your eukaryotic strain(s) of interest based on the presence of single-copy genes.

To Do

Add options to broaden the range of use cases.

Dependencies

See dependencies.bib for citations of these dependencies.

Installation

conda install -c bioconda -c conda-forge mamba
mamba create -n phyloeuk -c bioconda -c conda-forge trimal mamba mafft busco=5 iqtree perl-bioperl perl-file-slurp bioawk epa-ng
conda activate phyloeuk

Run

Caveats

The scripts are hardcoded to select single-copy genes present in at least 30 reference genomes. At the moment, an interrupted run cannot be resumed; as a workaround, open the scripts and copy/paste the remaining commands by hand. Run time grows with the number of reference genomes, and the last step, tree generation, is the most time-consuming part.
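The marker-selection rule described above can be illustrated with a short Python sketch. It is not the pipeline's own code: it assumes per-genome results in the layout of BUSCO v5's full_table.tsv (tab-separated rows, '#' comment lines, BUSCO ID in the first column and status in the second) and treats a gene as single-copy in a genome when its status is "Complete". The function names parse_full_table, single_copy_ids and select_markers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def parse_full_table(lines: Iterable[str]) -> Dict[str, str]:
    """Map each BUSCO ID to its status, from the lines of a full_table.tsv file.

    Assumed layout (BUSCO v5): '#' comment lines, then tab-separated rows whose
    first two columns are the BUSCO ID and its status.
    """
    statuses: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # Duplicated BUSCOs occupy several rows with the same ID and status,
        # so overwriting the dict entry is harmless.
        statuses[fields[0]] = fields[1]
    return statuses


def single_copy_ids(statuses: Dict[str, str]) -> Set[str]:
    """IDs reported as complete and single-copy ("Complete") in one genome."""
    return {busco_id for busco_id, status in statuses.items() if status == "Complete"}


def select_markers(genomes: Dict[str, Dict[str, str]], min_genomes: int = 30) -> List[str]:
    """Return the IDs that are single-copy in at least `min_genomes` genomes."""
    counts: Counter = Counter()
    for statuses in genomes.values():
        # Each genome contributes at most one count per ID.
        counts.update(single_copy_ids(statuses))
    return sorted(busco_id for busco_id, n in counts.items() if n >= min_genomes)


# Toy example with a threshold of 2 instead of the hardcoded 30.
toy = {
    "refA": {"g1": "Complete", "g2": "Duplicated", "g3": "Complete"},
    "refB": {"g1": "Complete", "g2": "Complete", "g3": "Missing"},
    "refC": {"g1": "Complete", "g3": "Fragmented"},
}
print(select_markers(toy, min_genomes=2))  # ['g1']
```

Only g1 is single-copy in at least two genomes here; g2 and g3 each reach the "Complete" status in a single genome, so they are dropped, just as genes below the 30-genome threshold would be.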

Pipeline

git clone https://github.com/michoug/PhyloEuk.git

Put all reference and MAG proteomes (.faa files) in folders named reference and MAGs, respectively.

Run the runReference.sh script, then run the runMAGs.sh script.
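Because a gene is kept only if it is single-copy in at least 30 reference genomes, a run with fewer than 30 reference proteomes cannot retain any marker. A small pre-flight check can catch this, as well as a missing or empty input folder, before the scripts are launched. The Python sketch below is a hypothetical helper (preflight is not part of PhyloEuk); it assumes the reference and MAGs folders sit directly inside the directory passed as root and that proteomes end in .faa.

```python
from pathlib import Path
from typing import List

# Mirrors the hardcoded threshold described under Caveats.
MIN_REFERENCES = 30


def preflight(root: Path, min_references: int = MIN_REFERENCES) -> List[str]:
    """Return a list of problems with the input layout (empty if none found)."""
    problems: List[str] = []
    for folder in ("reference", "MAGs"):
        path = root / folder
        if not path.is_dir():
            problems.append(f"missing folder: {folder}/")
            continue
        n_faa = len(list(path.glob("*.faa")))
        if n_faa == 0:
            problems.append(f"no .faa files in {folder}/")
        elif folder == "reference" and n_faa < min_references:
            # With fewer references than the threshold, no gene can qualify.
            problems.append(
                f"only {n_faa} reference proteomes; markers need >= {min_references}"
            )
    return problems
```

For example, calling preflight(Path.cwd()) from the directory that holds the two folders returns an empty list when both exist, contain .faa files, and reference holds at least 30 proteomes.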