nf-core/kmermaid is a bioinformatics pipeline that performs comparative analysis of *omes using k-mer based methods. It supports various reference and sequencing input formats, and provides statistics files along with a MultiQC report as output. It provides pre-processing methods for reads and alignments.
In the outline below, every step except for the main analysis is optional and might be input-dependent.
Optional – BAM preprocessing
Extract BAM from 10X archive
Extract FASTQ reads
Split reads per cell
Count UMIs per cell
Download SRA experiment [optional]
Optional – read preprocessing
k-mer analysis per method
Create sketch
Calculate distances
Present the results
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
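A smoke test with the test profile might look like the sketch below. The `docker` container profile and the `results_test` output directory are assumptions chosen for illustration; substitute whichever container profile your system supports. The helper `kmermaid_test_cmd` is hypothetical and only prints the command, so it can be reviewed before anything is launched.

```shell
# Sketch: build the test-profile command and print it for review.
# Assumptions: "docker" and "results_test" are placeholder defaults,
# not values taken from the pipeline documentation.
kmermaid_test_cmd() {
  # $1: container profile (defaults to docker)
  # $2: output directory (defaults to results_test)
  printf 'nextflow run nf-core/kmermaid -profile test,%s --outdir %s\n' \
    "${1:-docker}" "${2:-results_test}"
}

# Print the command without running it.
kmermaid_test_cmd
```

To launch the test, copy the printed line into your terminal once it looks right.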
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --samples samples.csv
nextflow run nf-core/kmermaid --outdir s3://olgabot-maca/nf-kmer-similarity/ \
--read_pairs 's3://bucket/sub-bucket/*{R1,R2}*.fastq.gz,s3://bucket/sub-bucket2/*{1,2}.fastq.gz'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --sra SRP016501
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket \
--fastas '*.fasta'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket \
--bam 'possorted_genome_bam.bam'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --samples samples.csv --split_kmer --subsample 1000
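The commands above differ mainly in which input flag they pass. As a convenience, a small wrapper could pick the flag from the file extension. This is only a sketch: the helper name `kmermaid_cmd` and the extension-to-flag mapping are assumptions made for illustration; only the flags themselves (`--samples`, `--bam`, `--fastas`) come from the examples above. It prints the command rather than running it.

```shell
# Sketch: choose the input flag from the file extension and print the command.
# kmermaid_cmd is a hypothetical helper, not part of the pipeline.
kmermaid_cmd() {
  input="$1"   # input file or quoted glob, e.g. samples.csv or '*.fasta'
  outdir="$2"  # output directory or S3 prefix
  case "$input" in
    *.csv)   flag="--samples" ;;
    *.bam)   flag="--bam" ;;
    *.fasta) flag="--fastas" ;;
    *)
      echo "unsupported input: $input" >&2
      return 1
      ;;
  esac
  # Quote the input so globs reach Nextflow unexpanded.
  printf "nextflow run nf-core/kmermaid --outdir %s %s '%s'\n" \
    "$outdir" "$flag" "$input"
}

kmermaid_cmd possorted_genome_bam.bam s3://bucket/sub-bucket
```

Inputs given through `--read_pairs` or `--sra` are not covered by this mapping and would need to be passed explicitly.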
nf-core/kmermaid was originally written by Olga Botvinnik. The DSL2 port was done by Igor Trujnara.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #kmermaid channel (you can join with this invite).
An extensive list of references for the tools used by the pipeline can be found in the
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.