Snakemake workflow: `Nanopore Basecalling`

About

This Snakemake workflow simplifies the basecalling of raw Oxford Nanopore data with either Guppy or Dorado, specifically the task of Duplex basecalling multiplexed data.

Currently, neither Guppy nor Dorado can do this in a single step or command, which is why I set up this workflow. It adjusts automatically to however many barcodes were used and produces the corresponding output.

Workflow

Depending on which basecaller is specified in the runs.csv file, the data goes through either the Guppy or the Dorado pipeline.
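
As a rough illustration of this branching, the sketch below reads a sample sheet and groups the runs by basecaller. The column names `run_id` and `basecaller` and the example rows are assumptions made for illustration only; the actual layout of config/runs.csv may differ, and this is not the workflow's own code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample sheet; the real column names in config/runs.csv may differ.
EXAMPLE_RUNS_CSV = """run_id,basecaller
run_a,guppy
run_b,dorado
run_c,dorado
"""


def group_runs_by_basecaller(csv_text):
    """Return {basecaller: [run_id, ...]} for every row in the sample sheet."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        basecaller = row["basecaller"].strip().lower()
        # Only the two basecallers named in this README are accepted.
        if basecaller not in ("guppy", "dorado"):
            raise ValueError(f"Unknown basecaller: {basecaller!r}")
        groups[basecaller].append(row["run_id"].strip())
    return dict(groups)


runs_by_basecaller = group_runs_by_basecaller(EXAMPLE_RUNS_CSV)
print(runs_by_basecaller)
```

Each group would then be handed to the matching set of rules (Guppy or Dorado).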

Guppy

For Guppy, the following pipeline based on best practices from Nanopore and members of the community is used:

Basecall with Guppy in simplex mode with demultiplexing
Use duplex_tools pairs_from_summary to generate candidate read pairs
Use duplex_tools filter_pairs to check for similarity
Use the filtered pairs to basecall with Guppy in duplex mode, again with demultiplexing
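
To make the idea of "candidate read pairs" more concrete, here is a toy sketch of the pairing step. It assumes that candidates are consecutive reads from the same channel separated by a short time gap; this is a simplified illustration, not the actual implementation of duplex_tools, and the `Read` type, the `candidate_pairs` function and the `max_gap` threshold are all hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    read_id: str
    channel: int
    start_time: float  # seconds since run start
    duration: float    # seconds


def candidate_pairs(reads, max_gap=5.0):
    """Pair consecutive reads on the same channel whose gap is at most max_gap seconds."""
    pairs = []
    ordered = sorted(reads, key=lambda r: (r.channel, r.start_time))
    for first, second in zip(ordered, ordered[1:]):
        if first.channel != second.channel:
            continue  # neighbours in the sorted list, but from different channels
        gap = second.start_time - (first.start_time + first.duration)
        if 0 <= gap <= max_gap:
            pairs.append((first.read_id, second.read_id))
    return pairs


reads = [
    Read("r1", 1, 0.0, 10.0),
    Read("r2", 1, 11.0, 9.5),  # starts 1 s after r1 ends
    Read("r3", 1, 60.0, 8.0),  # starts long after r2 ends
    Read("r4", 2, 12.0, 7.0),  # different channel
]
pairs = candidate_pairs(reads)
print(pairs)
```

In the real pipeline, the candidates are then checked for sequence similarity before duplex basecalling, which this sketch leaves out.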

Dorado

For Dorado, a pipeline based on recommendations from Nanopore is used:

Basecall with Dorado in simplex mode with demultiplexing enabled
Extract a list of ReadIDs for each barcode
Basecall using Dorado in duplex mode, restricted to the reads of each barcode separately
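
The second step, extracting read IDs per barcode, can be sketched as follows. The input is assumed to be a list of `(read_id, barcode)` tuples obtained from the demultiplexed simplex output; the function names and the `<barcode>_read_ids.txt` file naming are hypothetical and not taken from the workflow itself.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def read_ids_by_barcode(assignments):
    """Group read IDs by barcode from (read_id, barcode) tuples."""
    groups = defaultdict(list)
    for read_id, barcode in assignments:
        groups[barcode].append(read_id)
    return dict(groups)


def write_read_id_lists(groups, out_dir):
    """Write one text file per barcode with one read ID per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for barcode, ids in sorted(groups.items()):
        path = out_dir / f"{barcode}_read_ids.txt"  # hypothetical naming scheme
        path.write_text("\n".join(ids) + "\n")
        paths[barcode] = path
    return paths


# Made-up example assignments for illustration.
assignments = [
    ("read-001", "barcode01"),
    ("read-002", "barcode02"),
    ("read-003", "barcode01"),
]
groups = read_ids_by_barcode(assignments)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_read_id_lists(groups, tmp)
    barcode01_content = paths["barcode01"].read_text()

print(groups)
```

Each resulting list can then be used to constrain one duplex basecalling run to the reads of a single barcode.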

Usage

Check out the usage instructions in the snakemake workflow catalog

But here is a rough overview:

Install conda (mamba or miniconda is fine).
Install snakemake with:

conda install -c conda-forge -c bioconda snakemake

Install Guppy (see these instructions) and/or Dorado (see these instructions), depending on what you want to use
Download the latest release from this repo and cd into it
Edit the config/config.yaml to provide the paths to your results/logs directories, and the paths to Dorado and/or Guppy, as well as any parameters you might want to change. You can test the setup by using the config/runs_test.csv sample sheet.
Edit the config/runs.csv file with the specific details for each run. Depending on what you enter here, the pipeline will automatically adjust what will be done.
Open a terminal in the main directory and start a dry-run of the pipeline with the following command. This will download and install all the dependencies for the pipeline (this step may take some time) and show you whether you set up the paths correctly:

snakemake --use-conda -n --cores

Run the pipeline with:

snakemake --use-conda --cores
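
If you prefer to script these two steps, the sketch below builds the same two commands and runs the dry-run before the real run. The helper names `build_snakemake_cmd` and `run_pipeline` are hypothetical; only the `--use-conda`, `-n` and `--cores` flags shown above are used.

```python
import shutil
import subprocess


def build_snakemake_cmd(dry_run, cores=None):
    """Build the snakemake command from this README as an argument list."""
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        cmd.append("-n")
    cmd.append("--cores")
    if cores is not None:
        cmd.append(str(cores))  # optional explicit core count
    return cmd


def run_pipeline(cores=None):
    """Run a dry-run first and start the real run only if the dry-run succeeds."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake was not found on PATH")
    subprocess.run(build_snakemake_cmd(dry_run=True, cores=cores), check=True)
    subprocess.run(build_snakemake_cmd(dry_run=False, cores=cores), check=True)


print(" ".join(build_snakemake_cmd(dry_run=True)))
```

Calling `run_pipeline()` from the main directory of the repository would then execute both steps; `check=True` stops before the real run if the dry-run fails.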

TODO and planned features

basecall qc (pycoqc? fastp?)
choose dorado duplex output (instead of fastq)
somehow automate Dorado installation?

Copyright Richard Stöckl 2024.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE or copy at 
https://www.boost.org/LICENSE_1_0.txt)

