Author: [email protected]
This Snakemake workflow simplifies the basecalling of raw Oxford Nanopore data with either Guppy or Dorado, specifically the task of Duplex basecalling multiplexed data.
Currently, neither Guppy nor Dorado can do this in one simple step/command, which is why I set up this workflow. Regardless of how many barcodes were used, the workflow automatically adjusts itself to produce the output.
Depending on which basecaller is specified in the runs.csv file, the data goes through either the Guppy or the Dorado pipeline.
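The routing by basecaller can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual code: the column names run_name and basecaller are hypothetical and may not match the real runs.csv header.

```python
import csv
import io

# Hypothetical header names; the real runs.csv may use different columns.
BASECALLER_COLUMN = "basecaller"
SUPPORTED_BASECALLERS = ("guppy", "dorado")


def split_runs_by_basecaller(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group sample-sheet rows by the basecaller pipeline they should go through."""
    groups: dict[str, list[dict[str, str]]] = {name: [] for name in SUPPORTED_BASECALLERS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        caller = row[BASECALLER_COLUMN].strip().lower()
        if caller not in groups:
            raise ValueError(f"unsupported basecaller {caller!r} for run {row.get('run_name')!r}")
        groups[caller].append(row)
    return groups


# Made-up example sheet with one run per basecaller.
example_sheet = """run_name,basecaller
run01,Guppy
run02,dorado
"""
groups = split_runs_by_basecaller(example_sheet)
print([row["run_name"] for row in groups["guppy"]])   # ['run01']
print([row["run_name"] for row in groups["dorado"]])  # ['run02']
```

Rejecting unknown basecaller names early gives a clear error instead of a run silently being skipped.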
For Guppy, the following pipeline based on best practices from Nanopore and members of the community is used:
- Basecall with Guppy in simplex mode with demultiplexing
- Use duplex_tools pairs_from_summary to generate candidate read pairs
- Use duplex_tools filter_pairs to check the candidate pairs for similarity
- Use the filtered pairs to basecall with Guppy in duplex mode with demultiplexing
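To make the pairing step more concrete, here is a simplified Python illustration of the idea behind candidate pairs: the complement strand of a duplex molecule is typically read on the same channel shortly after its template strand, so consecutive reads on one channel with a short gap between them are candidates. This is not duplex_tools' actual implementation (which may apply further criteria); the field names follow the read_id/channel/start_time/duration columns of a sequencing summary, and the 20-second threshold is an arbitrary illustrative value.

```python
from collections import defaultdict


def candidate_pairs(reads, max_gap=20.0):
    """Pair each read with the next read on the same channel if the gap is short.

    reads: iterable of dicts with 'read_id', 'channel', 'start_time' and
    'duration' (times in seconds). Returns (first_read_id, next_read_id) tuples.
    """
    by_channel = defaultdict(list)
    for read in reads:
        by_channel[read["channel"]].append(read)
    pairs = []
    for channel_reads in by_channel.values():
        channel_reads.sort(key=lambda r: r["start_time"])
        # Compare every read with its direct successor on the same channel.
        for first, second in zip(channel_reads, channel_reads[1:]):
            gap = second["start_time"] - (first["start_time"] + first["duration"])
            if 0 <= gap <= max_gap:
                pairs.append((first["read_id"], second["read_id"]))
    return pairs


# Made-up reads: "a" and "b" follow each other closely on channel 1.
reads = [
    {"read_id": "a", "channel": 1, "start_time": 0.0, "duration": 10.0},
    {"read_id": "b", "channel": 1, "start_time": 12.0, "duration": 9.0},
    {"read_id": "c", "channel": 1, "start_time": 100.0, "duration": 5.0},
    {"read_id": "d", "channel": 2, "start_time": 1.0, "duration": 4.0},
]
print(candidate_pairs(reads))  # [('a', 'b')]
```

Timing alone cannot tell whether two reads really come from the same molecule, which is why the pipeline follows this step with filter_pairs to check the candidates for sequence similarity.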
For Dorado, a pipeline based on recommendations from Nanopore is used:
- Basecall with Dorado in simplex mode with demultiplexing enabled
- Extract a list of ReadIDs for each barcode
- Basecall with Dorado in duplex mode, constrained to the reads of each barcode separately
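The per-barcode read-ID extraction could look roughly like this minimal sketch. It assumes a tab-separated summary table with read_id and barcode columns (verify the column names produced by your Dorado version) and treats "unclassified" as the label for reads without a barcode. The output is one newline-delimited read-ID file per barcode, the kind of list Dorado's --read-ids option takes; the file naming scheme is hypothetical.

```python
import csv
import io
import tempfile
from collections import defaultdict
from pathlib import Path


def write_read_id_lists(summary_tsv: str, out_dir: Path,
                        skip: tuple[str, ...] = ("unclassified",)) -> dict[str, Path]:
    """Write one newline-delimited read-ID file per barcode; return barcode -> file path."""
    ids_per_barcode: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(summary_tsv), delimiter="\t"):
        if row["barcode"] not in skip:
            ids_per_barcode[row["barcode"]].append(row["read_id"])
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for barcode, read_ids in sorted(ids_per_barcode.items()):
        # Hypothetical naming scheme: <barcode>_read_ids.txt
        path = out_dir / f"{barcode}_read_ids.txt"
        path.write_text("\n".join(read_ids) + "\n")
        written[barcode] = path
    return written


# Made-up summary with two barcodes and one unclassified read.
example = "read_id\tbarcode\nr1\tbarcode01\nr2\tbarcode02\nr3\tbarcode01\nr4\tunclassified\n"
with tempfile.TemporaryDirectory() as tmp:
    files = write_read_id_lists(example, Path(tmp))
    barcodes = sorted(files)
    barcode01_ids = files["barcode01"].read_text().split()
print(barcodes)       # ['barcode01', 'barcode02']
print(barcode01_ids)  # ['r1', 'r3']
```

Because the files are generated from whatever barcodes appear in the data, the number of duplex jobs follows the number of barcodes without any manual configuration.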
Check out the usage instructions in the Snakemake workflow catalog.
Here is a rough overview:
- Install conda (mamba or miniconda is fine).
- Install Snakemake with:
conda install -c conda-forge -c bioconda snakemake
- Install Guppy (see these instructions) and/or Dorado (see these instructions), depending on which basecaller you want to use
- Download the latest release from this repo and cd into it
- Edit config/config.yaml to provide the paths to your results/logs directories and to Dorado and/or Guppy, as well as any parameters you might want to change. You can test the setup with the config/runs_test.csv sample sheet.
- Edit the config/runs.csv file with the specific details for each run. Depending on what you enter here, the pipeline automatically adjusts what will be done.
- Open a terminal in the main directory and start a dry run of the pipeline with the following command. This will download and install all the dependencies for the pipeline (this step may take some time) and show you whether you set up the paths correctly:
snakemake --use-conda -n --cores
- Run the pipeline with
snakemake --use-conda --cores
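Besides the dry run, a quick stand-alone pre-flight check of the configuration can catch path mistakes early. A minimal sketch, assuming config/config.yaml has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the key names results_dir, logs_dir, guppy_path and dorado_path are hypothetical and may differ from the real config file.

```python
from pathlib import Path

# Hypothetical config keys; adjust to the names used in the real config/config.yaml.
TOOL_PATH_KEYS = {"guppy": "guppy_path", "dorado": "dorado_path"}
OUTPUT_DIR_KEYS = ("results_dir", "logs_dir")


def preflight_problems(config: dict, basecallers_in_use: set[str]) -> list[str]:
    """Return a list of configuration problems; an empty list means all checks passed."""
    problems = []
    # The results/logs directories only need to be set; Snakemake creates output dirs.
    for key in OUTPUT_DIR_KEYS:
        if not config.get(key):
            problems.append(f"missing '{key}' in config")
    # Only the basecallers actually requested in the sample sheet must be installed.
    for caller in sorted(basecallers_in_use):
        key = TOOL_PATH_KEYS.get(caller)
        if key is None:
            problems.append(f"unknown basecaller '{caller}'")
        elif not config.get(key) or not Path(config[key]).exists():
            problems.append(f"'{key}' for {caller} is not set or does not exist")
    return problems


config = {"results_dir": "results", "logs_dir": "logs", "dorado_path": "/nonexistent/dorado"}
print(preflight_problems(config, {"dorado"}))
print(preflight_problems({**config, "dorado_path": "."}, {"dorado"}))  # []
```

Checking only the basecallers that actually appear in the sample sheet means a Dorado-only setup is not rejected for a missing Guppy installation.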
TODO:
- basecall QC (pycoQC? fastp?)
- choose Dorado duplex output (instead of FASTQ)
- somehow automate the Dorado installation?
Copyright Richard Stöckl 2024.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE or copy at
https://www.boost.org/LICENSE_1_0.txt)