- Check the quality of RNA-seq reads using FastQC.
- Trim low-quality bases and adapter sequences using Trim Galore.
- Assess the quality of the trimmed reads using Trim Galore, which can run FastQC on its output.
- Quantify transcript abundances using Kallisto.
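The four steps map roughly onto the commands below for one paired-end sample. This is a sketch, not the repository's Snakefile. The `print_steps` helper is hypothetical and only prints the commands instead of running them. The exact flags and the per-sample output folders are also assumptions about how the rules are written.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run sketch: prints (does not run) one plausible command
# per pipeline step for a single paired-end sample. The flags used by the
# real Snakefile may differ.
print_steps() {
  local sample="$1"
  local index="$2"                      # path to the kallisto index (.idx)
  local r1="${sample}_1.fastq.gz" r2="${sample}_2.fastq.gz"
  local trim=result/2.trimming

  # Step 1: quality report on the raw reads
  echo "fastqc -o result/1.fastqc $r1 $r2"
  # Steps 2-3: adapter/quality trimming; --fastqc makes Trim Galore run
  # FastQC on the trimmed reads
  echo "trim_galore --paired --fastqc -o $trim $r1 $r2"
  # Step 4: quantify the trimmed pair (Trim Galore names paired outputs
  # <input>_val_1.fq.gz and <input>_val_2.fq.gz)
  echo "kallisto quant -i $index -o result/3.kallisto/$sample $trim/${sample}_1_val_1.fq.gz $trim/${sample}_2_val_2.fq.gz"
}
```

For example, `print_steps sample_name cDNA/Homo_sapiens.GRCh38.cdna.idx` prints three command lines. FastQC's `-o` option expects the output directory to exist already.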
This pipeline requires Snakemake, FastQC v0.11.9, Trim Galore v0.6.10 and Kallisto v0.50.1.
If these tools are not already installed, run the following commands:
```bash
# Clone the repository
git clone https://github.com/Ahmedbargheet/Snakemake_RNA_seq.git
cd Snakemake_RNA_seq

# Option 1: create the Snakemake environment from the provided file
conda env create --file envs/env_snakemake.yml

# Option 2: create the environment manually
# (Bioconda packages also need the conda-forge channel)
conda create -n snakemake_env -c conda-forge -c bioconda snakemake fastqc=0.11.9 trim-galore=0.6.10 kallisto=0.50.1

# Activate the environment (with Option 1, use the name defined in envs/env_snakemake.yml)
conda activate snakemake_env
```
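After activating the environment, you can confirm that every tool resolved by looking each executable up on `PATH`. The `check_tools` function below is a hypothetical helper and is not part of the repository. The Trim Galore executable is `trim_galore`, even though the conda package is named `trim-galore`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each tool the pipeline needs is on
# PATH. Prints one line per tool and returns the number of missing tools.
check_tools() {
  local missing=0 tool
  for tool in snakemake fastqc trim_galore kallisto; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found   $tool -> $(command -v "$tool")"
    else
      echo "MISSING $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Because the return value is the number of missing tools, `check_tools || echo "install the missing tools first"` also works in scripts. To compare against the versions listed above, use `fastqc --version`, `trim_galore --version` and `kallisto version`.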
The pipeline also needs the human transcriptome (cDNA sequences) from Ensembl.
Download and index it as follows:
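```bash
mkdir -p cDNA
cd cDNA
wget https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
# Optional: kallisto index also accepts the .fa.gz file directly
gunzip Homo_sapiens.GRCh38.cdna.all.fa.gz
# Build the index inside cDNA/ with 16 threads; make sure the index path in the Snakefile points here
kallisto index -i Homo_sapiens.GRCh38.cdna.idx -t 16 Homo_sapiens.GRCh38.cdna.all.fa
cd ..
```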
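You can sanity-check the download before or after indexing by counting FASTA records, meaning header lines that start with `>`. `count_transcripts` is a hypothetical helper. It handles both the compressed `.fa.gz` file and the decompressed `.fa`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records (lines starting with '>') in a
# plain or gzip-compressed FASTA file.
count_transcripts() {
  local fasta="$1"
  case "$fasta" in
    *.gz) gzip -dc -- "$fasta" | grep -c '^>' ;;
    *)    grep -c '^>' -- "$fasta" ;;
  esac
}
```

For example: `count_transcripts cDNA/Homo_sapiens.GRCh38.cdna.all.fa`.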
The Snakefile contains a samples variable; replace `["sample_name"]` with your actual sample names. The pipeline expects paired-end reads in two files per sample, `{sample}_1.fastq.gz` and `{sample}_2.fastq.gz`.
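If your FASTQ files follow this naming scheme, you can derive the sample names from the file names instead of typing them by hand. `list_samples` is a hypothetical helper that takes the folder holding the FASTQ files as its argument. It keeps only names that have both mates and prints a list you could paste in place of `["sample_name"]`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find {sample}_1.fastq.gz files in a folder, keep the
# samples that also have {sample}_2.fastq.gz, and print them as a list.
list_samples() {
  local dir="${1:-.}"
  local names=() r1 name n out=""
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue              # glob matched nothing
    name="$(basename "$r1" _1.fastq.gz)"
    if [ -e "$dir/${name}_2.fastq.gz" ]; then
      names+=("$name")
    else
      echo "warning: no mate file for $name" >&2
    fi
  done
  # Join as "a", "b", ... (the ", " separator is added only after the first item)
  for n in "${names[@]}"; do
    out+="${out:+, }\"$n\""
  done
  printf '[%s]\n' "$out"
}
```

For example, with `A_1/A_2`, `B_1/B_2` and an unpaired `C_1` file, `list_samples fastq_dir` prints `["A", "B"]` and warns about `C`.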
```bash
# Create the output directories, then run the pipeline on 8 cores
mkdir -p result/1.fastqc/ result/2.trimming/ result/3.kallisto/
snakemake --cores 8
```
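When the run finishes, each Kallisto quantification writes an `abundance.tsv` with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. The sketch below offers a quick check: it counts the transcripts and sums the estimated counts and TPM values. TPM should sum to roughly 1,000,000. `summarize_abundance` is a hypothetical helper. The per-sample layout `result/3.kallisto/{sample}/abundance.tsv` is an assumption that the Snakefile does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize one kallisto abundance.tsv
# (columns: target_id, length, eff_length, est_counts, tpm).
summarize_abundance() {
  local tsv="$1"
  # Skip the header row; sum column 4 (est_counts) and column 5 (tpm)
  LC_ALL=C awk -F'\t' 'NR > 1 { n++; counts += $4; tpm += $5 }
    END { printf "transcripts=%d est_counts=%.1f tpm_sum=%.1f\n", n, counts, tpm }' "$tsv"
}
```

To check every sample at once under the assumed layout: `for f in result/3.kallisto/*/abundance.tsv; do echo "$f: $(summarize_abundance "$f")"; done`.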