Home
For this article we mainly have three objectives
We listed all the Enterobacter bugandensis genomes present in NCBI using Entrez Direct (EDirect)
esearch -query '"Enterobacter bugandensis"[ORGN] AND (latest[filter] AND all[filter] NOT anomalous[filter])' -db assembly | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > E_bugandensis_accessions.txt
The genomes were then downloaded using the bit package (https://github.com/AstrobioMike/bit) and unzipped
# Download the genomes using bit package
bit-dl-ncbi-assemblies -w E_bugandensis_accessions.txt -j 100 -f fasta
# Unzipping using gunzip
gunzip *.gz
We then calculated the Average Nucleotide Identity (ANI) using FastANI (https://github.com/ParBLiSS/FastANI) to confirm the taxonomic identification and relatedness of the genomes
# Compiled all the genomes
realpath *.fa > query_file.txt
# Running fastANI
fastANI -q GCF_900324475.1.fa --rl /path/to/query_file.txt -o output.txt
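The FastANI output can be screened for genomes that fall below a species-level cutoff. The following is a minimal sketch, assuming the standard tab-separated FastANI output (query, reference, ANI, mapped fragments, total query fragments). The 95% threshold is a commonly used species boundary and is an assumption here, as are the function name `parse_fastani` and the example paths and values.

```python
import csv
import io


def parse_fastani(text, threshold=95.0):
    """Split FastANI hits into (passing, failing) lists of (reference, ani)."""
    passing, failing = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        reference, ani = row[1], float(row[2])
        (passing if ani >= threshold else failing).append((reference, ani))
    return passing, failing


# Invented example rows in the FastANI output layout (not real results)
example = (
    "GCF_900324475.1.fa\t/data/genome_A.fa\t98.7\t1500\t1560\n"
    "GCF_900324475.1.fa\t/data/genome_B.fa\t91.2\t1100\t1560\n"
)
same_species, outliers = parse_fastani(example)
print("Above threshold:", same_species)
print("Below threshold:", outliers)
```

For a real run, the contents of output.txt would be read from disk and passed to the function in place of the example string.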
We used GToTree (https://github.com/AstrobioMike/GToTree) to construct the phylogenetic tree of the E. bugandensis genomes
# Compiled all the genomes
ls *.fa > fasta_files.txt
# Running GToTree
GToTree -f fasta_files.txt -H Gammaproteobacteria -t -L Species,Strain -j 100 -o GToTree
We further used Snippy (https://github.com/tseemann/snippy) to identify mutations in the E. bugandensis genomes with respect to its Type Strain (EB-247)
# Example run: identifying mutations for IF2SW-F3 (GCF_013403425.1.fa) wrt EB-247 (GCF_900075565.1.fa)
snippy --outdir IF2SW-F3 --ref GCF_900075565.1.fa --ctgs GCF_013403425.1.fa
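Snippy writes a tab-separated variant table (snps.tab) into the output directory, which can be summarized by variant type. The following is a minimal sketch, assuming the table has a header row with a `TYPE` column (values such as snp, ins, del, complex). The helper name `count_variant_types` and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_variant_types(tab_text):
    """Count variants per TYPE value in a Snippy-style tab-separated table."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return dict(Counter(row["TYPE"] for row in reader))


# Invented rows using a subset of the snps.tab columns (not real results)
example = (
    "CHROM\tPOS\tTYPE\tREF\tALT\tEVIDENCE\n"
    "contig1\t100\tsnp\tA\tG\tG:30 A:0\n"
    "contig1\t250\tdel\tAT\tA\tA:25 AT:0\n"
    "contig2\t75\tsnp\tC\tT\tT:40 C:0\n"
)
counts = count_variant_types(example)
print(counts)
```

In practice the text would come from a file such as IF2SW-F3/snps.tab, one per genome compared against EB-247.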
We re-annotated the genomes using Prokka (https://github.com/tseemann/prokka) and constructed a pan-genome using Panaroo (https://github.com/gtonkinhill/panaroo)
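# Annotating genomes iteratively (the output folder and file prefix are the FASTA name without .fa,
# because Prokka cannot create an output folder that has the same name as the input file)
while read -r p; do prokka --outdir "${p%.fa}" --prefix "${p%.fa}" --cpus 50 "$p"; done < fasta_files.txt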
# Extract all the annotated gff files into one folder and process Panaroo
# Pan-genome construction
panaroo -i *.gff -o ./results/ --clean-mode strict -t 80 --remove-invalid-genes -f 0.7 --merge_paralogs --aligner clustal --core_threshold 0.95
# Tree construction from pan-genome core-alignment
iqtree -s core_gene_alignment_filtered.aln --prefix ebug -T 40 -fast -m GTR+G
We used the Python-based COGclassifier (https://pypi.org/project/cogclassifier/) to identify the Clusters of Orthologous Genes (COGs) in all the E. bugandensis genomes
# Identifying COGs
while read p; do COGclassifier -i "$p"/*.faa -o ./cogclassification/"$p"; done <files.txt
We used Scoary (https://github.com/AdmiralenOla/Scoary) to estimate the genome-wide associations among 4,696 gene sequences, ranging from 5 to 100% occurrence across all genomes
scoary -g gene_presence_absence.csv -t traits.csv --threads 60 -s 4
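The occurrence range mentioned above can be checked directly on the presence/absence matrix. The following is a minimal sketch, assuming a Roary-style gene_presence_absence.csv as written by Panaroo, with three metadata columns followed by one column per genome (consistent with `-s 4`), and an empty cell meaning the gene is absent. The function names and the example rows are hypothetical.

```python
import csv
import io


def gene_occurrence(csv_text, first_genome_col=3):
    """Return {gene: fraction of genomes carrying it} from a presence/absence CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_genomes = len(header) - first_genome_col
    occurrence = {}
    for row in reader:
        # A non-empty cell means the gene was found in that genome
        present = sum(1 for cell in row[first_genome_col:] if cell.strip())
        occurrence[row[0]] = present / n_genomes
    return occurrence


def filter_by_occurrence(occurrence, low=0.05, high=1.0):
    """Keep genes whose occurrence lies within [low, high]."""
    return {g: f for g, f in occurrence.items() if low <= f <= high}


# Invented example matrix with four genomes (not real results)
example = (
    "Gene,Non-unique Gene name,Annotation,g1,g2,g3,g4\n"
    "geneA,,hypothetical protein,a1,a2,a3,a4\n"
    "geneB,,kinase,b1,,b3,\n"
    "geneC,,transposase,,,,\n"
)
occ = gene_occurrence(example)
kept = filter_by_occurrence(occ)
print(occ)
print(sorted(kept))
```

The 0.05 and 1.0 bounds mirror the 5 to 100% range stated in the text; how the original gene set was actually selected is not shown in the source.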
Metagenome raw sequences are available here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA438545
# List all the SRR IDs present in the project
esearch -db sra -query PRJNA438545 | efetch -format runinfo | cut -d "," -f 1 | grep '^SRR' > SRRid.list
# Download them using fastq-dump
parallel --jobs 4 "fastq-dump --split-files --origfmt --gzip {}" :::: SRRid.list
We performed quality control and taxonomy prediction using the MetaSUB-CAMP Pipeline (selected parameters are described in the Methods section)
A detailed pipeline is available here: https://github.com/MetaSUB-CAMP
We analyzed the output of Bracken using the code metagenomicsanalysis.py attached in the main repository.
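To illustrate the kind of input that analysis works on, the following is a minimal sketch that reads a Bracken report and returns the estimated read fraction of one species. It assumes the standard Bracken output header (`name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`). It is not the code in metagenomicsanalysis.py; the function name `species_fraction`, the taxonomy IDs, and the counts are invented for illustration.

```python
import csv
import io


def species_fraction(bracken_text, species):
    """Return fraction_total_reads for a species, or 0.0 if it is not listed."""
    reader = csv.DictReader(io.StringIO(bracken_text), delimiter="\t")
    for row in reader:
        if row["name"].strip() == species:
            return float(row["fraction_total_reads"])
    return 0.0


# Invented example report (placeholder taxonomy IDs, not real results)
example = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Enterobacter bugandensis\t1\tS\t800\t200\t1000\t0.25\n"
    "Klebsiella pneumoniae\t2\tS\t2500\t500\t3000\t0.75\n"
)
fraction = species_fraction(example, "Enterobacter bugandensis")
print(fraction)
```

Applied to each sample's Bracken file, this gives one relative abundance value per sample for the species of interest.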
Models can be found here: https://narrative.kbase.us/narrative/151913
We predicted the minimal medium for the identified communities using MetQuest2 (https://github.com/RamanLab/metquest2)
from metquest import minimal_media_from_cobrapy
path = '/path/to/directory/with/models'
outputfilename = '/path/to/output.txt'
essential_mets = '/path/to/essential metabolites.txt'
minimal_media_from_cobrapy(path, outputfilename, essential_mets)
Note: The essential metabolites argument is now obsolete.
We simulated the models to calculate the pairwise Metabolic Support Index (MSI) using MetQuest2 on the minimal medium supplemented with co-factors
from metquest import calculate_pairwiseMSI
path = '/path/to/directory/with/models'
medium_file = '/path/to/medium.txt'
calculate_pairwiseMSI(path, medium_file)
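For orientation, the pairwise score can be illustrated with a small worked example. The following sketch assumes the definition of the Metabolic Support Index as we understand it from the MetQuest literature: for an organism A paired with B, MSI(A | A∪B) = 1 − (stuck reactions of A in the community) / (stuck reactions of A alone). It is not MetQuest2's implementation; the function name, the handling of the zero case, and the counts are assumptions.

```python
def metabolic_support_index(stuck_alone, stuck_in_community):
    """Fraction of an organism's stuck reactions relieved by its partner.

    stuck_alone: number of stuck reactions when the organism is simulated alone
    stuck_in_community: number still stuck when simulated with the partner
    """
    if stuck_alone == 0:
        # Nothing to relieve; treated as zero support here (assumption)
        return 0.0
    return 1.0 - stuck_in_community / stuck_alone


# Invented counts: 40 stuck reactions alone, 30 remain stuck with a partner
msi = metabolic_support_index(40, 30)
print(msi)  # 0.25
```

A value near 0 means the partner relieves almost none of the organism's blocked reactions, while a value near 1 means it relieves almost all of them.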
We used SteadyCom (https://github.com/maranasgroup/SteadyCom) to estimate the nature of the microbial interactions with E. bugandensis in the ISS communities
The SteadyCom code is attached in the main repository as steadycom_interactions.m