Pratyay Sengupta edited this page Dec 19, 2023 · 11 revisions

Welcome to the wiki page of this repository!


This article has three main objectives:

1. Analysis of Enterobacter bugandensis genomes

We listed all Enterobacter bugandensis genomes available in NCBI using Entrez Direct (EDirect), keeping only the latest, non-anomalous assemblies:

esearch -query '"Enterobacter bugandensis"[ORGN] AND (latest[filter] AND all[filter] NOT anomalous[filter])' -db assembly | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > E_bugandensis_accessions.txt
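Because `xtract -def "NA"` writes the placeholder `NA` whenever an accession is missing, it can help to clean the list before downloading. The following is a minimal sketch, not part of the original workflow; the function name `parse_accessions` is hypothetical, and it assumes every valid entry looks like a RefSeq (`GCF_`) or GenBank (`GCA_`) assembly accession.

```python
import re

# RefSeq (GCF_) or GenBank (GCA_) assembly accession, e.g. GCF_900075565.1
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def parse_accessions(lines):
    """Keep well-formed accessions, dropping blanks, 'NA' placeholders and duplicates."""
    stripped = (line.strip() for line in lines)
    # dict.fromkeys removes duplicates while preserving the original order
    return list(dict.fromkeys(acc for acc in stripped if ACCESSION_RE.match(acc)))
```

For example, `parse_accessions(open("E_bugandensis_accessions.txt"))` returns the cleaned list, which can be written back to a file for `bit-dl-ncbi-assemblies`.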

The genomes were then downloaded using the bit package (https://github.com/AstrobioMike/bit) and decompressed:

# Download the genomes using bit package
bit-dl-ncbi-assemblies -w E_bugandensis_accessions.txt -j 100 -f fasta
# Decompress the genomes using gunzip
gunzip *.gz

Next, we calculated Average Nucleotide Identity (ANI) with FastANI (https://github.com/ParBLiSS/FastANI) to confirm the taxonomic identity and relatedness of the genomes:

# List the absolute paths of all genomes
realpath *.fa > query_file.txt
# Run FastANI: one query genome against every genome in the list given to --rl
fastANI -q GCF_900324475.1.fa --rl /path/to/query_file.txt -o output.txt
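FastANI writes one tab-separated line per genome pair: query, reference, ANI estimate, mapped fragments and total query fragments, with no header. A minimal sketch for flagging genomes that fall below the commonly used ~95% ANI species boundary is shown here; the function name and the default threshold are illustrative assumptions, not values taken from this study.

```python
def flag_low_ani(lines, threshold=95.0):
    """Return (reference, ANI) pairs whose ANI to the query is below `threshold`."""
    flagged = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        reference, ani = fields[1], float(fields[2])
        if ani < threshold:
            flagged.append((reference, ani))
    return flagged
```

Note that FastANI omits pairs that are too divergent to map (roughly below 80% ANI), so any genome listed in `query_file.txt` but absent from `output.txt` deserves the same scrutiny.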

We used GToTree (https://github.com/AstrobioMike/GToTree) to construct the phylogenetic tree of the E. bugandensis genomes:

# List all genome files
ls *.fa > fasta_files.txt
# Running GToTree
GToTree -f fasta_files.txt -H Gammaproteobacteria -t -L Species,Strain -j 100 -o GToTree

We further used Snippy (https://github.com/tseemann/snippy) to identify mutations in each E. bugandensis genome relative to the type strain EB-247:

# Example run: identify mutations in IF2SW-F3 (GCF_013403425.1.fa) relative to EB-247 (GCF_900075565.1.fa)
snippy --outdir IF2SW-F3 --ref GCF_900075565.1.fa --ctgs GCF_013403425.1.fa
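To compare genomes, the per-run variant tables can be summarised. This sketch assumes Snippy's tab-separated `snps.tab` output, which has a header row including a `TYPE` column (snp, mnp, ins, del, complex); the function name is hypothetical.

```python
import csv
from collections import Counter


def count_variant_types(lines):
    """Count Snippy variant calls per TYPE (e.g. snp, mnp, ins, del, complex)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["TYPE"] for row in reader)
```

For the example run, `count_variant_types(open("IF2SW-F3/snps.tab"))` would give the variant counts for IF2SW-F3 relative to EB-247.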

We re-annotated the genomes using Prokka (https://github.com/tseemann/prokka) and constructed a pan-genome using Panaroo (https://github.com/gtonkinhill/panaroo):

# Annotate each genome, naming the output folder and files after the genome (unique names keep the GFF files distinct when pooled)
while read p; do prokka --outdir "${p%.fa}" --prefix "${p%.fa}" --cpus 50 "$p"; done < fasta_files.txt

# Collect all annotated GFF files into one folder, then run Panaroo
mkdir -p gff && cp */*.gff gff/
# Pan-genome construction (-a core also writes the core-gene alignment used for the tree)
panaroo -i gff/*.gff -o ./results/ --clean-mode strict -t 80 --remove-invalid-genes -f 0.7 --merge_paralogs -a core --aligner clustal --core_threshold 0.95

# Tree construction from pan-genome core-alignment
iqtree -s core_gene_alignment_filtered.aln --prefix ebug -T 40 -fast -m GTR+G 
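Panaroo's `gene_presence_absence.csv` can be used to check how gene clusters are distributed across genomes. This sketch assumes the usual layout (three metadata columns, then one column per genome, with an empty cell meaning the gene is absent); the 0.95 cut-off mirrors `--core_threshold 0.95`, while the 0.05 "rare" cut-off and both function names are illustrative.

```python
import csv


def gene_frequencies(lines, n_meta_cols=3):
    """Map each gene cluster to the fraction of genomes in which it is present."""
    reader = csv.reader(lines)
    header = next(reader)
    n_genomes = len(header) - n_meta_cols
    freqs = {}
    for row in reader:
        present = sum(1 for cell in row[n_meta_cols:] if cell.strip())
        freqs[row[0]] = present / n_genomes
    return freqs


def partition(freqs, core=0.95, rare=0.05):
    """Split clusters into core (>= core), accessory, and rare (< rare) groups."""
    groups = {"core": [], "accessory": [], "rare": []}
    for gene, f in freqs.items():
        if f >= core:
            groups["core"].append(gene)
        elif f >= rare:
            groups["accessory"].append(gene)
        else:
            groups["rare"].append(gene)
    return groups
```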

We used the Python-based COGclassifier (https://pypi.org/project/cogclassifier/) to identify Clusters of Orthologous Groups (COGs) in all the E. bugandensis genomes:

# Identify COGs from each genome's Prokka protein file (files.txt lists the Prokka output folders)
while read p; do COGclassifier -i "$p"/*.faa -o ./cogclassification/"$p"; done < files.txt
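The per-genome results can then be merged into one table of COG functional categories. This is a sketch under an assumption: that each output folder holds a tab-separated count table with `LETTER` (category) and `COUNT` columns, as in COGclassifier's `classifier_count.tsv`. Check the column names against your own output before relying on it; `merge_cog_counts` is a hypothetical helper.

```python
import csv
from collections import Counter


def merge_cog_counts(per_genome):
    """Sum COG category counts across genomes.

    per_genome maps a genome name to the lines of its count table.
    Returns (per-genome counts, totals over all genomes).
    """
    table = {}
    totals = Counter()
    for genome, lines in per_genome.items():
        counts = Counter()
        for row in csv.DictReader(lines, delimiter="\t"):
            counts[row["LETTER"]] += int(row["COUNT"])
        table[genome] = dict(counts)
        totals.update(counts)
    return table, totals
```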

We used Scoary (https://github.com/AdmiralenOla/Scoary) to test genome-wide associations between traits and 4,696 genes, with occurrence ranging from 5% to 100% across all genomes:

# -s 4: isolate columns start at column 4 of Panaroo's gene_presence_absence.csv
scoary -g gene_presence_absence.csv -t traits.csv --threads 60 -s 4
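Scoary reads traits as a comma-separated table: isolate names in the first column, one column per trait, and 0/1 values. The isolate names must match the genome column headers in `gene_presence_absence.csv`. A minimal sketch for writing a single binary trait follows; the trait name and the helper `make_traits_csv` are hypothetical.

```python
import csv
import io


def make_traits_csv(isolates, trait_name, positives):
    """Build a Scoary traits table: 1 if the isolate is in `positives`, else 0."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", trait_name])
    for isolate in isolates:
        writer.writerow([isolate, int(isolate in positives)])
    return buf.getvalue()
```

The returned string can be written directly to `traits.csv`.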

2. Presence of Enterobacter bugandensis in International Space Station (ISS) metagenome

The raw metagenome sequences are available here: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA438545

# List all run (SRR) IDs in the project, dropping the runinfo header and blank lines
esearch -db sra -query PRJNA438545 | efetch -format runinfo | cut -d "," -f 1 | grep "^SRR" > SRRid.list

# Download them using fastq-dump (:::: makes parallel read the IDs from the file)
parallel --jobs 4 "fastq-dump --split-files --origfmt --gzip {}" :::: SRRid.list
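The run list can also be built by parsing the RunInfo CSV by column name instead of position. This sketch assumes the first column is named `Run`; it drops blank lines and any repeated header rows that batched `efetch` output may contain. The helper name is hypothetical.

```python
import csv

RUN_PREFIXES = ("SRR", "ERR", "DRR")


def run_accessions(lines):
    """Extract unique run accessions from SRA RunInfo CSV lines (column 'Run')."""
    rows = csv.DictReader(line for line in lines if line.strip())
    runs = (row["Run"] for row in rows)
    # Repeated header rows have Run == "Run" and are filtered out by the prefix check
    return list(dict.fromkeys(r for r in runs if r and r.startswith(RUN_PREFIXES)))
```

The result can be written one ID per line to `SRRid.list`.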

We performed quality control and taxonomic classification using the MetaSUB-CAMP pipeline (selected parameters are described in the Methods section).

A detailed pipeline is available here: https://github.com/MetaSUB-CAMP

We analyzed the Bracken output using the script metagenomicsanalysis.py, available in the main repository.
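As an illustration of the kind of query involved (this is not the code in metagenomicsanalysis.py), the sketch below reads one Bracken species report, a tab-separated table with `name` and `fraction_total_reads` columns, and returns the relative abundance of E. bugandensis in that sample. The function name is hypothetical.

```python
import csv


def species_fraction(lines, species="Enterobacter bugandensis"):
    """Return Bracken's fraction_total_reads for one species (0.0 if absent)."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["name"] == species:
            return float(row["fraction_total_reads"])
    return 0.0
```

Applying it to each sample's report gives a per-sample abundance profile of E. bugandensis across the ISS metagenomes.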


3. Metabolic interactions of Enterobacter bugandensis with co-existing communities

The metabolic models can be found here: https://narrative.kbase.us/narrative/151913

We predicted the minimal media for the models of the identified communities using MetQuest2 (https://github.com/RamanLab/metquest2):

from metquest import minimal_media_from_cobrapy

path = '/path/to/directory/with/models'
outputfilename = '/path/to/output.txt'
essential_mets = '/path/to/essential metabolites.txt'

minimal_media_from_cobrapy(path, outputfilename, essential_mets)

Note: the essential-metabolites argument is now obsolete.

We simulated the models to calculate the pairwise Metabolic Support Index (MSI) using MetQuest2, on the minimal medium supplemented with co-factors:

from metquest import calculate_pairwiseMSI

path = '/path/to/directory/with/models'
medium_file = '/path/to/medium.txt'

calculate_pairwiseMSI(path, medium_file)
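As we understand the definition from the MetQuest literature, the MSI of organism A in the presence of partner B is the fraction of A's stuck reactions (reactions that cannot fire when A grows alone) that become active once B is present: MSI = 1 - |stuck(A|AB)| / |stuck(A)|. The sketch below only restates that formula for given reaction sets; it is not MetQuest2's implementation, which derives the stuck reactions from the models.

```python
def metabolic_support_index(stuck_alone, stuck_in_pair):
    """MSI of organism A given partner B: 1 - |stuck(A|AB)| / |stuck(A)|.

    stuck_alone:   reactions of A that cannot fire when A grows alone
    stuck_in_pair: reactions of A that remain stuck when A grows with B
    """
    stuck_alone = set(stuck_alone)
    if not stuck_alone:
        return 0.0  # nothing is stuck, so the partner cannot add support
    # Intersect so that only reactions stuck in A alone are counted
    still_stuck = set(stuck_in_pair) & stuck_alone
    return 1.0 - len(still_stuck) / len(stuck_alone)
```

An MSI of 0 means B rescues none of A's stuck reactions; values closer to 1 indicate stronger metabolic support.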

We used SteadyCom (https://github.com/maranasgroup/SteadyCom) to characterize the nature of the microbial interactions between E. bugandensis and the other members of the ISS communities.

The SteadyCom code is available in the main repository as steadycom_interactions.m.
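A common way to name a pairwise interaction is to compare each organism's growth alone with its growth in co-culture and combine the signs of the two effects. The sketch below illustrates that general scheme on predicted growth rates; it is not a translation of steadycom_interactions.m, and the 10% relative tolerance and the function name are illustrative assumptions.

```python
def classify_interaction(mono_a, mono_b, co_a, co_b, tol=0.1):
    """Label a pairwise interaction from growth alone vs. in co-culture.

    Each organism's effect is +1 if its co-culture growth rises by more than
    `tol` (relative change), -1 if it falls by more than `tol`, else 0.
    """
    def effect(alone, together):
        if alone == 0:
            return 1 if together > 0 else 0
        change = (together - alone) / alone
        if change > tol:
            return 1
        if change < -tol:
            return -1
        return 0

    # Sort so that, e.g., (+, -) and (-, +) map to the same label
    signs = tuple(sorted((effect(mono_a, co_a), effect(mono_b, co_b)), reverse=True))
    labels = {
        (1, 1): "mutualism",
        (1, 0): "commensalism",
        (1, -1): "parasitism",
        (0, 0): "neutralism",
        (0, -1): "amensalism",
        (-1, -1): "competition",
    }
    return labels[signs]
```

Because the labels are symmetric, the function reports the interaction type only; which partner benefits can be read from the individual effects.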