Preprocessing of RNA-seq reads and transcript abundance quantification

This is a Snakemake workflow to:

  1. Check the quality of RNA-seq reads using FastQC.
  2. Trim low-quality bases and adapter sequences using Trim Galore.
  3. Assess the post-trimming quality of reads using Trim Galore (which can run FastQC on the trimmed reads).
  4. Quantify transcript abundances using Kallisto.
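
For orientation, the four steps above can be sketched as plain shell commands for a single paired-end sample. This is only an illustration, not the contents of the Snakefile: the helper names `run` and `process_sample`, the `DRY_RUN` switch, the input location, and the exact flags are assumptions, and the rules in the repository may differ. The trimmed file names follow Trim Galore's paired-end naming (`_val_1.fq.gz`, `_val_2.fq.gz`).

```shell
#!/usr/bin/env bash
# Hand-run sketch of the four workflow steps for one sample.
# Paths and flags are assumptions for illustration; the Snakefile may differ.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

process_sample() {
    local sample="$1" index="$2" threads="${3:-8}"
    local r1="${sample}_1.fastq.gz" r2="${sample}_2.fastq.gz"

    # Step 1: quality check of the raw reads
    run fastqc -o result/1.fastqc "$r1" "$r2"

    # Steps 2 and 3: trim adapters and low-quality bases,
    # then let Trim Galore run FastQC on the trimmed reads (--fastqc)
    run trim_galore --paired --fastqc -o result/2.trimming "$r1" "$r2"

    # Step 4: quantify transcript abundances from the trimmed reads
    run kallisto quant -i "$index" -o "result/3.kallisto/${sample}" -t "$threads" \
        "result/2.trimming/${sample}_1_val_1.fq.gz" \
        "result/2.trimming/${sample}_2_val_2.fq.gz"
}
```

Calling `DRY_RUN=1 process_sample sample_name cDNA/Homo_sapiens.GRCh38.cdna.idx` prints the three commands without running them, which is a cheap way to check paths before committing compute time.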

Installation and requirements

This pipeline requires Snakemake, FastQC v0.11.9, Trim Galore v0.6.10, and Kallisto v0.50.1.
If they are not already installed, run the following commands:

# Clone the repository
git clone https://github.com/Ahmedbargheet/Snakemake_RNA_seq.git
cd Snakemake_RNA_seq

# Install Snakemake and the required tools in a conda environment
conda env create --file envs/env_snakemake.yml

# Alternatively, create the environment manually
# (bioconda packages also need the conda-forge channel):
conda create -n snakemake_env -c conda-forge -c bioconda snakemake fastqc=0.11.9 trim-galore=0.6.10 kallisto=0.50.1
conda activate snakemake_env

Additionally, the human transcriptome (cDNA) must be downloaded from Ensembl and indexed with Kallisto.
Use the following steps to download and index it:

mkdir cDNA
cd cDNA
wget https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
gunzip Homo_sapiens.GRCh38.cdna.all.fa.gz
# Index the decompressed FASTA using paths relative to the cDNA directory
kallisto index -t 16 -i Homo_sapiens.GRCh38.cdna.idx Homo_sapiens.GRCh38.cdna.all.fa
# Return to the repository root
cd ..
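
A large download can be cut short without wget making it obvious, so it may be worth testing the archive before decompressing it. The sketch below uses `gzip -t`, which checks the integrity of a gzip file without extracting it; the function name `verify_gz` is a hypothetical helper and not part of the repository.

```shell
#!/usr/bin/env bash
# Return success only if the file exists and passes gzip's integrity test.
verify_gz() {
    local file="$1"
    [ -f "$file" ] && gzip -t "$file" 2>/dev/null
}

# Example (run inside cDNA/ after wget and before gunzip):
# verify_gz Homo_sapiens.GRCh38.cdna.all.fa.gz \
#     || echo "download looks incomplete, re-run wget" >&2
```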

Overview of the pipeline

[Figure: plot of the pipeline overview]

How to run the Snakemake pipeline

In the Snakefile, you will find a samples variable. Replace ["sample_name"] with your actual sample names. The pipeline is designed to work with paired-end files named {sample}_1.fastq.gz and {sample}_2.fastq.gz.
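
Because the workflow depends on this naming scheme, a quick check that both mates exist for every sample can save a failed run. The following is a minimal sketch; `check_pairs` is a hypothetical helper, and the directory argument is an assumption because the location of the FASTQ files is set in the Snakefile.

```shell
#!/usr/bin/env bash
# Verify that each sample has both {sample}_1.fastq.gz and {sample}_2.fastq.gz.
# Usage: check_pairs <fastq_dir> <sample> [<sample> ...]
check_pairs() {
    local dir="$1"
    shift
    local missing=0
    local sample mate
    for sample in "$@"; do
        for mate in 1 2; do
            if [ ! -f "${dir}/${sample}_${mate}.fastq.gz" ]; then
                echo "missing: ${dir}/${sample}_${mate}.fastq.gz" >&2
                missing=1
            fi
        done
    done
    # Exit status 0 when every pair is complete, 1 otherwise
    return "$missing"
}

# Example: check_pairs . sample_name
```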

# Create the output directories, then run the pipeline
mkdir -p result/1.fastqc/
mkdir -p result/2.trimming/
mkdir -p result/3.kallisto/
snakemake --cores 8
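
Before the full run, a dry run shows which jobs Snakemake would schedule without executing anything. The sketch below wraps the directory setup and a dry run in one function; `preview_pipeline` is a hypothetical name, while `-n` (dry run) and `-p` (print shell commands) are standard Snakemake options.

```shell
#!/usr/bin/env bash
# Pre-flight: create the output directories, then ask Snakemake for a dry run.
# Usage: preview_pipeline [cores]   (defaults to 8 cores)
preview_pipeline() {
    local cores="${1:-8}"
    mkdir -p result/1.fastqc result/2.trimming result/3.kallisto
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found; activate the conda environment first" >&2
        return 127
    fi
    # -n lists the planned jobs without running them; -p prints their shell commands
    snakemake -n -p --cores "$cores"
}
```

If the listed jobs and file names look right, the same command without `-n` starts the actual run.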