This repository contains the scripts to analyze (nanoporemet.py) and
visualize (app.R, coverage.py) metagenomic sequencing data generated by
Oxford Nanopore Technologies sequencing devices. Both viral and
bacterial analyses are possible.
nanoporemet.py analyzes metagenomic sequencing reads with kraken2.
Whether only viral or also bacterial analysis is performed is determined
by the selected kraken2 database. nanoporemet.py first concatenates all
.fastq.gz files of each barcode within /fastq_pass, then runs kraken2 on
each barcode individually, and finally combines the kraken2 output files
of all barcodes into one file, either virus.kraken.txt or
virus_bacteria.kraken.txt (depending on the selected database). If
nanoporemet.py is run after the sequencing run has finished and the
sequencing_summary_*.txt file is available, it also creates
sequencing_summary.pdf, which contains histograms of the mean Q scores
and read lengths for all reads and for the reads passing the quality
filter.
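The per-barcode concatenation step can be sketched as follows. This is a minimal illustration, not the code in nanoporemet.py: the function name `concatenate_barcodes` and the output layout (one merged file per barcode in a separate directory) are assumptions. It relies on the fact that gzip files can be appended byte for byte and still decompress as one stream.

```python
import shutil
from pathlib import Path


def concatenate_barcodes(fastq_pass: Path, out_dir: Path) -> dict[str, Path]:
    """Merge all .fastq.gz files of each barcode directory into one file.

    Hypothetical helper: names and output layout are illustrative only.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        parts = sorted(barcode_dir.glob("*.fastq.gz"))
        if not parts:
            continue  # no reads were written for this barcode
        target = out_dir / f"{barcode_dir.name}.fastq.gz"
        with open(target, "wb") as dst:
            for part in parts:
                # Gzip members can be appended without decompressing them.
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, dst)
        merged[barcode_dir.name] = target
    return merged
```

Calling `concatenate_barcodes(Path("fastq_pass"), Path("merged"))` would return a mapping from barcode name to the merged file, which can then be passed to kraken2 one barcode at a time.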
- Enter timavo.
ssh timavo
- Activate kraken2.
conda activate kraken
- Move into the sequencing output directory, i.e., the one that contains
the fastq_pass subdirectory and, at the end of the sequencing run, the
sequencing_summary_*.txt file.
cd /data/GridION/GridIONOutput/<experiment>/<sample>/<flowcell>/
- Run the python script.
python <path to script>/nanoporemet.py
- The script asks whether you want to analyze bacterial reads in addition to viral reads.
Reply with either yes/y or no/n.
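The yes/no prompt in the last step could be handled along these lines. This is a sketch only: the function names, the prompt wording, and the case-insensitive matching are assumptions, not taken from nanoporemet.py.

```python
def parse_yes_no(answer: str) -> bool:
    """Map yes/y to True and no/n to False (case-insensitive by assumption)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "y"}:
        return True
    if normalized in {"no", "n"}:
        return False
    raise ValueError(f"Please reply with yes/y or no/n, got: {answer!r}")


def ask_include_bacteria(prompt: str = "Analyze bacterial reads as well? [yes/no] ") -> bool:
    """Keep asking until the reply is one of the accepted answers."""
    while True:
        try:
            return parse_yes_no(input(prompt))
        except ValueError as err:
            print(err)
```

Separating the parsing from the `input()` call keeps the accepted answers easy to check without an interactive session.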
Within the sequencing output directory, the script looks for the
/fastq_pass subdirectory and analyzes all .fastq.gz files.
nanoporemet.py uses one of two kraken2 databases to analyze the reads.
The paths to these databases are defined in the script and can be
adjusted there. The current databases are as follows:
- viral database: k2_human-viral_20240111
- viral + bacterial database: k2_human-viral_20240111
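As an illustration of how the database choice feeds into the kraken2 call, the sketch below builds a command line for one barcode. The database paths, the per-barcode output file names, and the thread count are placeholders and not the values used in nanoporemet.py; the flags `--db`, `--threads`, `--gzip-compressed`, `--report`, and `--output` are standard kraken2 options.

```python
from pathlib import Path

# Placeholder locations; the real paths are set inside nanoporemet.py.
DATABASES = {
    False: Path("/path/to/viral_db"),
    True: Path("/path/to/viral_bacterial_db"),
}


def kraken2_command(reads: Path, include_bacteria: bool, threads: int = 8) -> list[str]:
    """Build (but do not run) a kraken2 command for one merged barcode file."""
    db = DATABASES[include_bacteria]
    stem = reads.name.removesuffix(".fastq.gz")
    return [
        "kraken2",
        "--db", str(db),
        "--threads", str(threads),
        "--gzip-compressed",
        # Hypothetical per-barcode file names, combined in a later step.
        "--report", f"{stem}.kraken.report.txt",
        "--output", f"{stem}.kraken.out.txt",
        str(reads),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths point to real databases.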
For the creation of the histogram plots, the script looks for
sequencing_summary_*.txt within the sequencing output directory. If it
is not available yet, this step is simply skipped.
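The "skip if missing" behaviour can be sketched with a glob lookup. The helper name `find_sequencing_summary` is hypothetical, and picking the first match when several files exist is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_sequencing_summary(run_dir: Path) -> Optional[Path]:
    """Return the sequencing summary file, or None while the run is ongoing."""
    matches = sorted(run_dir.glob("sequencing_summary_*.txt"))
    return matches[0] if matches else None
```

A caller would only create the histograms when the function returns a path, and otherwise continue without them.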
The kraken2 report with the analysis of all barcodes is saved in the
sequencing output directory. Depending on the selection of the kraken2
database, the report is saved as virus.kraken.txt or
virus_bacteria.kraken.txt.
The histogram plots of the mean Q scores and read lengths of all reads
as well as the reads passing the quality filter are all saved in
sequencing_summary.pdf, which is also found within the sequencing
output directory.
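The data behind such histograms can be derived from the sequencing summary with the standard library alone. The sketch below counts reads per bin and leaves out plotting. It assumes a tab-separated file with the columns `mean_qscore_template`, `sequence_length_template`, and `passes_filtering`; check these names against your own summary file, as they may differ between software versions. The function name `binned_counts` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def binned_counts(summary: Path, column: str, bin_width: float,
                  passed_only: bool = False) -> Counter:
    """Count reads per bin of `column`; keys are the lower bin edges."""
    counts = Counter()
    with open(summary, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Assumed column: TRUE/FALSE flag set by the quality filter.
            if passed_only and row["passes_filtering"].upper() != "TRUE":
                continue
            value = float(row[column])
            counts[int(value // bin_width) * bin_width] += 1
    return counts
```

For example, `binned_counts(path, "mean_qscore_template", 1)` gives reads per integer Q score, and `binned_counts(path, "sequence_length_template", 500, passed_only=True)` gives read lengths in 500-base bins for reads passing the filter.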
The app.R script is a Shiny app which serves to visualize the kraken2
report as generated by nanoporemet.py. Simply upload virus.kraken.txt or
virus_bacteria.kraken.txt to the app, select a barcode and choose
whether you want to analyze viral or bacterial reads, on either species
or genus level. Endogenous retroviruses and phages as well as
blocklisted viruses can be hidden from the output (the blocklist can be
updated within app.R).
The Shiny app shows the taxonomic distribution of the reads in a barplot as well as a list of all viral or bacterial species or genera found in the sample (per barcode).
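The species or genus selection can be illustrated outside of Shiny with a small Python sketch. It assumes the standard six-column, tab-separated kraken2 report layout (percentage, clade reads, direct reads, rank code, taxonomy ID, indented name). The combined file written by nanoporemet.py may carry additional per-barcode information, so treat this as an illustration of the filtering idea and not as a parser for that file. `Taxon` and `taxa_at_rank` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Taxon(NamedTuple):
    name: str
    taxid: int
    clade_reads: int


def taxa_at_rank(report_lines: Iterable[str], rank_code: str = "S") -> list[Taxon]:
    """Keep entries at one rank ("S" = species, "G" = genus), most reads first."""
    taxa = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[3] != rank_code:
            continue  # skip other ranks and malformed lines
        taxa.append(Taxon(fields[5].strip(), int(fields[4]), int(fields[1])))
    return sorted(taxa, key=lambda t: t.clade_reads, reverse=True)
```

A blocklist could be applied afterwards by dropping every `Taxon` whose name appears in a set of unwanted names.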
The coverage.py script automates coverage plot generation for Oxford
Nanopore Technologies reads. First, it concatenates all reads within
/fastq_pass and then maps those reads to a desired reference sequence
(indexed .fasta file) using minimap2.
- Enter timavo.
ssh timavo
- Activate minimap2.
conda activate minimap2
- Move into the sequencing output directory, i.e., the one that contains
the fastq_pass subdirectory.
cd /data/GridION/GridIONOutput/<experiment>/<sample>/<flowcell>/
- Run the python script.
python <path to script>/coverage.py
- You will be asked to enter the path to the indexed reference sequence.
Within the sequencing output directory, the script looks for the
/fastq_pass subdirectory and analyzes all .fastq.gz files.
The path to the reference sequence is provided by the user upon running the script. Make sure the reference sequence is indexed and stored in /analyses/ONT_analyses/bwa/references/<virus/bacteria>/<name>/.
To index the reference .fasta file, move into /analyses/ONT_analyses/bwa/ and run:
./bwa index ./references/<virus/bacteria>/<name>/*.fasta
Within the sequencing output directory, you will find a new subdirectory
with the name of the reference sequence. Next to the coverage plot
(PDF), it also contains the .sam, .bam, and .coverage files.
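One plausible way to arrive at these three files is sketched below. Only the use of minimap2 is stated above; the use of samtools for sorting and depth calculation, the `map-ont` preset, and naming the subdirectory after the reference file stem are assumptions, and coverage.py may proceed differently. The function only builds the commands and does not run them.

```python
from pathlib import Path


def coverage_commands(reference: Path, reads: Path, run_dir: Path) -> list[list[str]]:
    """Build the mapping, sorting, and depth commands for one reference."""
    out_dir = run_dir / reference.stem          # assumed naming of the subdirectory
    prefix = out_dir / reference.stem
    sam, bam, cov = (f"{prefix}{ext}" for ext in (".sam", ".bam", ".coverage"))
    return [
        # Map the Nanopore reads and write SAM output.
        ["minimap2", "-a", "-x", "map-ont", "-o", sam, str(reference), str(reads)],
        # Sort alignments by coordinate and convert to BAM (assumed samtools step).
        ["samtools", "sort", "-o", bam, sam],
        # Report depth at every position, including zero-coverage ones.
        ["samtools", "depth", "-a", "-o", cov, bam],
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` after creating the output subdirectory; the resulting .coverage table (reference name, position, depth) is what a coverage plot would be drawn from.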