Snakemake Pipeline to automatically basecall Nanopore sequencing data with hybrid approach of simplex and duplex basecalling

richardstoeckl/basecallNanopore

Snakemake workflow: Nanopore Basecalling

Author: [email protected]

About

This Snakemake workflow simplifies basecalling raw Oxford Nanopore data with either Guppy or Dorado, in particular the duplex basecalling of multiplexed data.

Currently, neither Guppy nor Dorado can do this in a single step or command, which is why I set up this workflow. Regardless of how many barcodes were used, the workflow automatically adjusts itself to produce the output.

Workflow

Depending on which basecaller is specified in the runs.csv file, the data goes through either the Guppy or the Dorado pipeline.
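To illustrate the dispatch idea, here is a minimal Python sketch that splits runs between the two pipelines. It is not the workflow's own code: it assumes a hypothetical runs.csv layout with `run_name` and `basecaller` columns, while the real sample sheet (see config/runs_test.csv) may use different columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical runs.csv content: the column names "run_name" and
# "basecaller" are assumptions for illustration only.
EXAMPLE_RUNS_CSV = """run_name,basecaller
run01,guppy
run02,dorado
run03,Dorado
"""

SUPPORTED = {"guppy", "dorado"}


def split_runs_by_basecaller(csv_text: str) -> dict:
    """Group run names by the (case-insensitive) basecaller they request."""
    pipelines = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        caller = row["basecaller"].strip().lower()
        if caller not in SUPPORTED:
            # Fail early instead of silently skipping a misconfigured run.
            raise ValueError(f"{row['run_name']}: unknown basecaller {caller!r}")
        pipelines[caller].append(row["run_name"])
    return dict(pipelines)


if __name__ == "__main__":
    print(split_runs_by_basecaller(EXAMPLE_RUNS_CSV))
    # {'guppy': ['run01'], 'dorado': ['run02', 'run03']}
```

Raising on an unknown basecaller is a design choice of this sketch: a typo in the sample sheet then stops the run instead of quietly dropping data.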

Guppy

For Guppy, the following pipeline based on best practices from Nanopore and members of the community is used:

  1. Basecall with Guppy in simplex mode with demultiplexing enabled
  2. Use duplex_tools pairs_from_summary to generate candidate read pairs
  3. Use duplex_tools filter_pairs to filter the candidate pairs by sequence similarity
  4. Basecall the filtered pairs with Guppy in duplex mode, again with demultiplexing enabled
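Step 2 relies on the fact that the complement strand of a duplex molecule usually passes through the same pore shortly after the template strand. The following Python sketch is a simplified illustration of that idea only, not the actual duplex_tools implementation (which applies further checks). It uses the `read_id`, `channel`, `start_time` and `duration` columns of a Guppy sequencing summary; the values are made up and `max_gap_s` is an arbitrary, hypothetical threshold rather than a duplex_tools default.

```python
import csv
import io
from collections import defaultdict

# Toy sequencing summary with made-up values.
EXAMPLE_SUMMARY = """read_id\tchannel\tstart_time\tduration
r1\t5\t100.0\t10.0
r2\t5\t111.0\t9.5
r3\t5\t300.0\t4.0
r4\t7\t50.0\t20.0
r5\t7\t90.0\t5.0
"""


def candidate_pairs(summary_tsv: str, max_gap_s: float = 5.0) -> list:
    """Pair each read with the next read on the same channel if the second
    read starts within max_gap_s seconds after the first one ends."""
    by_channel = defaultdict(list)
    for row in csv.DictReader(io.StringIO(summary_tsv), delimiter="\t"):
        start = float(row["start_time"])
        end = start + float(row["duration"])
        by_channel[row["channel"]].append((start, end, row["read_id"]))

    pairs = []
    for reads in by_channel.values():
        reads.sort()  # order reads by start time within each channel
        # Compare each read with its direct successor in the same pore.
        for (_, end, template), (next_start, _, complement) in zip(reads, reads[1:]):
            if 0 <= next_start - end <= max_gap_s:
                pairs.append((template, complement))
    return pairs


if __name__ == "__main__":
    print(candidate_pairs(EXAMPLE_SUMMARY))  # [('r1', 'r2')]
```

Such time-based candidates can include unrelated consecutive reads, which is why the pipeline follows pairing with a similarity filter (step 3).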

Dorado

For Dorado, a pipeline based on recommendations from Nanopore is used:

  1. Basecall with Dorado in simplex mode with demultiplexing enabled
  2. Extract a list of read IDs for each barcode
  3. Basecall with Dorado in duplex mode, restricted to the reads of each barcode separately
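Step 2 can be sketched in Python as grouping read IDs by barcode and writing one newline-delimited file per barcode. This is an assumption-laden illustration, not the workflow's code: the input is a hypothetical tab-separated table with `read_id` and `barcode` columns (not a documented Dorado output format), and reads without a barcode are assumed to be labelled `unclassified` and skipped. Dorado's duplex mode can be given such a read-ID list; check `dorado duplex --help` for the exact option name in your version.

```python
import csv
import io
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical per-read table; column names are assumptions.
EXAMPLE_TABLE = """read_id\tbarcode
a1\tbarcode01
a2\tbarcode02
a3\tbarcode01
a4\tunclassified
"""


def write_read_id_lists(table_tsv: str, out_dir: Path, skip=("unclassified",)) -> dict:
    """Write one newline-delimited read-ID file per barcode; return their paths."""
    ids_by_barcode = defaultdict(list)
    for row in csv.DictReader(io.StringIO(table_tsv), delimiter="\t"):
        if row["barcode"] not in skip:
            ids_by_barcode[row["barcode"]].append(row["read_id"])

    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for barcode, ids in ids_by_barcode.items():
        path = out_dir / f"{barcode}_read_ids.txt"  # hypothetical file naming
        path.write_text("\n".join(ids) + "\n")
        paths[barcode] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for barcode, path in write_read_id_lists(EXAMPLE_TABLE, Path(tmp)).items():
            print(barcode, path.read_text().split())
    # barcode01 ['a1', 'a3']
    # barcode02 ['a2']
```

Restricting each duplex run to one barcode's reads keeps the duplex output separated by barcode, which a single duplex call over all reads would not.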

Usage

Check out the usage instructions in the Snakemake workflow catalog.

But here is a rough overview:

  1. Install conda (mamba or miniconda is fine).
  2. Install snakemake with:
     conda install -c conda-forge -c bioconda snakemake
  3. Install Guppy (see these instructions) and/or Dorado (see these instructions), depending on which basecaller you want to use.
  4. Download the latest release from this repo and cd into it.
  5. Edit config/config.yaml to provide the paths to your results/logs directories and the paths to Dorado and/or Guppy, as well as any parameters you might want to change. You can test the setup by using the config/runs_test.csv sample sheet.
  6. Edit the config/runs.csv file with the specific details for each run. Depending on what you enter here, the pipeline will automatically adjust what will be done.
  7. Open a terminal in the main directory and start a dry-run of the pipeline with the following command. This will download and install all the dependencies for the pipeline (this may take some time) and show you whether the paths are set up correctly:
     snakemake --use-conda -n --cores
  8. Run the pipeline with:
     snakemake --use-conda --cores
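If you want to script the last two steps, a small Python wrapper can run the dry-run first and only start the real run if the dry-run succeeds. This is a hypothetical convenience sketch, not part of the workflow; it must be started from the main workflow directory, and passing no core count mirrors the bare `--cores` flag used above.

```python
import shutil
import subprocess
from typing import List, Optional


def snakemake_command(dry_run: bool, cores: Optional[str] = None) -> List[str]:
    """Build the snakemake command from the usage steps above."""
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        cmd.append("-n")
    cmd.append("--cores")
    if cores is not None:  # e.g. "8"; None leaves --cores without a value
        cmd.append(cores)
    return cmd


def run_pipeline(cores: Optional[str] = None) -> None:
    """Dry-run first; check=True raises and stops if either call fails."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake not found on PATH; install it first")
    subprocess.run(snakemake_command(dry_run=True, cores=cores), check=True)
    subprocess.run(snakemake_command(dry_run=False, cores=cores), check=True)


if __name__ == "__main__":
    print(" ".join(snakemake_command(dry_run=True)))  # snakemake --use-conda -n --cores
```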

TODO and planned features

  • basecall QC (pycoQC? fastp?)
  • choose the Dorado duplex output format (instead of FASTQ)
  • somehow automate Dorado installation?

Copyright Richard Stöckl 2024.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE or copy at 
https://www.boost.org/LICENSE_1_0.txt)