Preprocessing steps for metagenomics reads

This is a Snakemake workflow to:

Check the quality of metagenomic reads using FastQC.
Trim low-quality bases and adapter sequences using Trimmomatic.
Assess the post-trimming quality of reads using FastQC.
Remove human DNA from the metagenomic dataset.

This workflow was used in:

Development of early life gut resistome and mobilome across gestational ages and microbiota-modifying treatments
Authors: Ahmed Bargheet, Claus Klingenberg, Eirin Esaiassen, Erik Hjerde, Jorunn Pauline Cavanagh, Johan Bengtsson-Palme, Veronika Kuchařová Pettersen
Dynamics of the Gut Resistome and Mobilome in Early Life: A Meta-Analysis
Authors: Ahmed Bargheet, Hanna Noordzij, Alise Ponsero, Ching Jian, Katri Korpela, Mireia Valles-Colomer, Justine Debelius, Alexander Kurilshikov, Veronika K. Pettersen

Installation and requirements

This pipeline requires Snakemake, FastQC v0.11.9, Trimmomatic v0.39, Bowtie 2 v2.4.5, SAMtools v1.17, and BEDTools v2.30.0.
If they are not already installed, run the following commands:

# Clone the repository
git clone https://github.com/Ahmedbargheet/Snakemake_short_reads_preprocessing.git
cd Snakemake_short_reads_preprocessing

# Snakemake installation in a conda environment
conda env create --file env/env_snakemake.yml

# Alternatively, create the environment manually:
conda create -n snakemake_env -c bioconda snakemake fastqc=0.11.9 trimmomatic=0.39 bowtie2=2.4.5 samtools=1.17 bedtools=2.30.0
conda activate snakemake_env
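
Before running the workflow, it can help to confirm that the required executables are visible on PATH inside the activated environment. The following is a minimal Python sketch and is not part of the repository. The function name `missing_tools` is hypothetical, and the executable names are assumed to match what the conda packages install.

```python
import shutil

# Executable names assumed to be provided by the conda packages listed above.
REQUIRED_TOOLS = ["snakemake", "fastqc", "trimmomatic", "bowtie2", "samtools", "bedtools"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out when checking the function itself; in normal use the default `shutil.which` is sufficient.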

Additionally, the human reference genome (GRCh38.p14) must be downloaded from NCBI and indexed with Bowtie 2.
Use the following steps to download the genome and build the index:

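# Download the GRCh38.p14 reference genome and build a Bowtie 2 index with the prefix "Human"
mkdir Human_database
cd Human_database
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz
gunzip GCF_000001405.40_GRCh38.p14_genomic.fna.gz
bowtie2-build GCF_000001405.40_GRCh38.p14_genomic.fna Human
# Return to the parent directory before continuing
cd ..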
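
The Bowtie 2 index with the prefix `Human` is what the human-read removal step aligns against. As a rough illustration of how such a step is commonly assembled from Bowtie 2, SAMtools, and BEDTools, the Python sketch below only builds the command lines and does not run them. It is not taken from the repository's Snakefile: the function name, the file names, and the choice of flags (keeping pairs in which both mates are unmapped via `samtools view -f 12 -F 256`) are assumptions.

```python
import shlex


def host_removal_commands(sample, index_prefix, r1, r2, threads=8):
    """Build (but do not run) commands that keep read pairs not mapping to the host.

    All output file names below are hypothetical; adapt them to your own layout.
    """
    sam = f"{sample}.sam"
    unmapped_bam = f"{sample}.unmapped.bam"
    sorted_bam = f"{sample}.unmapped.sorted.bam"
    return [
        # Align the trimmed read pairs against the human index.
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-1", r1, "-2", r2, "-S", sam],
        # -f 12: read and mate both unmapped; -F 256: drop secondary alignments.
        ["samtools", "view", "-b", "-f", "12", "-F", "256", "-o", unmapped_bam, sam],
        # Sort by read name so that mates are adjacent for bamtofastq.
        ["samtools", "sort", "-n", "-o", sorted_bam, unmapped_bam],
        # Write the remaining (non-human) pairs back to FASTQ.
        ["bedtools", "bamtofastq", "-i", sorted_bam,
         "-fq", f"{sample}_clean_1.fastq", "-fq2", f"{sample}_clean_2.fastq"],
    ]


if __name__ == "__main__":
    for cmd in host_removal_commands("sample_name", "Human_database/Human",
                                     "sample_name_1.fastq.gz", "sample_name_2.fastq.gz"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to compare them with the rules in the Snakefile before adapting anything.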

The Trimmomatic binary distribution, which includes the adapter files, must also be downloaded.
Use the following steps:

mkdir trimmomatic
cd trimmomatic
wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
unzip Trimmomatic-0.39.zip
# Return to the parent directory before running the pipeline
cd ..

Overview of the pipeline

How to run the Snakemake pipeline

The Snakefile defines a samples variable. Replace ["sample_name"] with your actual sample name(s). The pipeline expects paired-end files named {sample}_1.fastq.gz and {sample}_2.fastq.gz.
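
To avoid typing sample names by hand, the list can be derived from the FASTQ file names. The sketch below is one possible way to do this in Python (Snakefiles accept plain Python). The function name `paired_samples` is hypothetical and is not part of the repository. It keeps only samples for which both `{sample}_1.fastq.gz` and `{sample}_2.fastq.gz` are present.

```python
import re
from pathlib import Path

# Matches the naming convention {sample}_1.fastq.gz / {sample}_2.fastq.gz
_PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def paired_samples(filenames):
    """Return sorted sample names that have both mate files in `filenames`."""
    mates = {}
    for name in filenames:
        # Ignore any directory part and look only at the file name.
        match = _PAIR_PATTERN.match(Path(name).name)
        if match:
            mates.setdefault(match["sample"], set()).add(match["mate"])
    return sorted(s for s, found in mates.items() if found == {"1", "2"})


if __name__ == "__main__":
    # Lists FASTQ files in the current directory; adjust the folder as needed.
    print(paired_samples(p.name for p in Path(".").glob("*.fastq.gz")))
```

The returned list could then be used in place of `["sample_name"]`, assuming the input files sit where the Snakefile expects them.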

# Create the output directories, then run the pipeline
mkdir -p result/1.fastqc/
mkdir -p result/2.trimmomatic/
mkdir -p result/3.fastqc/
mkdir -p result/4.rm_dna/1.sort/
mkdir -p result/4.rm_dna/2.fq/
snakemake --cores 8

Note:

By default, the workflow removes TruSeq adapters. If your sequencing center used the Nextera kit instead, change this adapter path:

/trimmomatic/Trimmomatic-0.39/adapters/TruSeq3-PE.fa

to:

/trimmomatic/Trimmomatic-0.39/adapters/NexteraPE-PE.fa
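
In Trimmomatic, the adapter file is passed through the `ILLUMINACLIP` step, so switching kits amounts to changing the FASTA file named in that step. A small Python sketch of this choice follows. The helper name, the kit labels, and the `2:30:10` thresholds (seed mismatches, palindrome clip threshold, simple clip threshold) are assumptions for illustration and may differ from the values used in the Snakefile.

```python
# Adapter FASTA file names as referenced in the note above.
ADAPTER_FILES = {
    "truseq": "TruSeq3-PE.fa",
    "nextera": "NexteraPE-PE.fa",
}


def illuminaclip_step(kit, adapter_dir, thresholds="2:30:10"):
    """Build the ILLUMINACLIP argument for the given library kit.

    `adapter_dir` is the folder holding the adapter files; it depends on
    where Trimmomatic was unpacked. `thresholds` is an assumed default.
    """
    try:
        fasta = ADAPTER_FILES[kit.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown kit {kit!r}; expected one of {sorted(ADAPTER_FILES)}"
        ) from None
    return f"ILLUMINACLIP:{adapter_dir}/{fasta}:{thresholds}"


if __name__ == "__main__":
    print(illuminaclip_step("nextera", "trimmomatic/Trimmomatic-0.39/adapters"))
```

Keeping the kit-to-file mapping in one dictionary means a new kit only needs one extra entry, rather than an edit to the command itself.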
