PhyloEuk: Phylogenomics for Eukaryotes

Warning: This pipeline is a work in progress and for now only "works" for Ochrophytes, but there is no real reason why it should not work for other taxa with a few modifications to the scripts.

Goal

Heavily inspired by GTDB-Tk and BUSCO, this software aims to determine the taxonomy of your eukaryotic strain(s) of interest based on the presence of single-copy genes.

To Do

Add options to broaden the use case.

Dependencies

conda
mafft
trimal
BUSCO
bioawk
iqtree2
epa-ng
perl
BioPerl

See dependencies.bib for citation of these dependencies.

Installation

conda install -c bioconda -c conda-forge mamba
mamba create -n phyloeuk -c bioconda -c conda-forge trimal mamba mafft busco=5 iqtree perl-bioperl perl-file-slurp bioawk epa-ng
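
After creating the environment, it may help to activate it and confirm that the tools are actually on your PATH before starting a long run. The following is a minimal sketch, not part of PhyloEuk: the function name check_tools is hypothetical, and the executable names in the usage comment (for example iqtree2) are assumptions about what the conda packages install.

```shell
# Hypothetical helper: report which of the given executables are missing from PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use, after `conda activate phyloeuk` (executable names are assumptions):
# check_tools mafft trimal busco bioawk iqtree2 epa-ng perl
```

If a tool is reported as missing, checking the name of the executable installed by the corresponding package is a reasonable first step.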

Run

Caveats

These scripts are hardcoded to select single-copy genes that are present in at least 30 reference genomes. At the moment, you cannot restart an interrupted run, but opening the scripts and copying and pasting the individual commands will work. The more reference genomes you provide, the longer the run takes. The last step, the tree generation, is the most time-consuming part.
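
To illustrate what the 30-genome threshold means, here is a small sketch of that kind of filter. It is not the code used by the PhyloEuk scripts: the function name select_genes and the two-column input format (genome, gene ID, tab-separated, one line per single-copy hit) are assumptions made for the example.

```shell
# Hypothetical helper: print gene IDs found as single copy in at least MIN genomes.
# Usage: select_genes MIN TABLE
# TABLE holds tab-separated lines "genome<TAB>gene_id" (an assumed format,
# not the one the PhyloEuk scripts use internally).
select_genes() {
    min="$1"
    table="$2"
    # sort -u drops repeated genome/gene pairs so each genome counts once per gene.
    sort -u "$table" |
        awk -F'\t' -v min="$min" '
            { count[$2]++ }
            END { for (g in count) if (count[g] >= min) print g }
        ' | sort
}
```

With such a table, `select_genes 30 single_copy_hits.tsv` would mirror the hardcoded threshold (the file name is a placeholder). A lower value keeps more genes at the cost of sparser alignments.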

Pipeline

git clone https://github.com/michoug/PhyloEuk.git

Put all reference and MAG proteomes (.faa files) in folders called reference and MAGs, respectively.

Run the runReference.sh script, then run the runMAGs.sh script.
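
Because a run cannot be restarted, a quick check of the input folders before launching may save time. The sketch below assumes the reference and MAGs folders sit in the directory passed as the first argument (the current directory by default); the function name check_inputs is hypothetical, and invoking the scripts with bash and without arguments is also an assumption.

```shell
# Hypothetical pre-flight check: both input folders must hold at least one .faa file.
# Usage: check_inputs [BASE_DIR]
check_inputs() {
    base="${1:-.}"
    for dir in reference MAGs; do
        found=0
        for f in "$base/$dir"/*.faa; do
            # The glob stays unexpanded when nothing matches, so test for existence.
            if [ -e "$f" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "no .faa files in $base/$dir"
            return 1
        fi
    done
    echo "inputs look ready"
}

# Possible use from the cloned repository (script invocation is an assumption):
# check_inputs . && bash runReference.sh && bash runMAGs.sh
```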
