nf-core/kmermaid is a bioinformatics pipeline that performs comparative analysis of *omes using k-mer-based methods. It supports various reference and sequencing input formats, and outputs statistics files along with a MultiQC report. It also provides preprocessing methods for reads and alignments.
In the outline below, every step except for the main analysis is optional and might be input-dependent.
Optional – BAM preprocessing
- Extract BAM from 10X archive (tar)
- Extract FASTQ reads (samtools)
- Split reads per cell (grep)
- Count UMIs per cell (pbtk)
- Download SRA experiment () [optional]
Optional – read preprocessing
k-mer analysis per method
- Create sketch
- Calculate distances
- Present the results (MultiQC)
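As a rough illustration of the "create sketch" and "calculate distances" steps, the following shell sketch lists the distinct k-mers of two toy sequences and computes a Jaccard distance between the two sets. It is not the pipeline's implementation: the sequences, the choice of k = 3, and the helper function `kmers` are made up for this example, and it compares the full k-mer sets exactly, whereas sketching tools typically compare compact summaries of those sets.

```shell
# Hypothetical helper: print the distinct k-mers of sequence $1 (k = $2), sorted.
kmers() {
    printf '%s\n' "$1" |
        awk -v k="$2" '{ for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k) }' |
        LC_ALL=C sort -u
}

k=3
tmp=$(mktemp -d)

# Toy sequences, chosen only for illustration.
kmers "ACGTACGT" "$k" > "$tmp/a.txt"
kmers "ACGTTCGT" "$k" > "$tmp/b.txt"

# Size of the intersection and of the union of the two k-mer sets.
shared=$(LC_ALL=C comm -12 "$tmp/a.txt" "$tmp/b.txt" | wc -l | tr -d ' ')
total=$(LC_ALL=C sort -u "$tmp/a.txt" "$tmp/b.txt" | wc -l | tr -d ' ')

# Jaccard distance = 1 - |A and B| / |A or B|
distance=$(awk -v s="$shared" -v t="$total" 'BEGIN { printf "%.3f", 1 - s / t }')
echo "shared=$shared total=$total distance=$distance"

rm -rf "$tmp"
```

A distance of 0 means the two sequences have identical k-mer sets, and 1 means they share none.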
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
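A test run along the lines of the note could look like the sketch below. It only checks whether `nextflow` is on the PATH and prints the command rather than running it. The `docker` profile and the `results` output directory are assumptions taken from common nf-core conventions, not from this README, so substitute whatever container engine and output path apply to your setup.

```shell
# Assumed values: "docker" is a commonly available nf-core profile,
# and "results" is a hypothetical local output directory.
profile="test,docker"
outdir="results"
test_cmd="nextflow run nf-core/kmermaid -profile $profile --outdir $outdir"

# Pre-flight check: is Nextflow installed and on PATH?
if command -v nextflow >/dev/null 2>&1; then
    echo "Nextflow found. Test run command:"
else
    echo "Nextflow not found on PATH. Install it first, then run:"
fi
echo "$test_cmd"
```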
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --samples samples.csv
nextflow run nf-core/kmermaid --outdir s3://olgabot-maca/nf-kmer-similarity/ \
--read_pairs 's3://bucket/sub-bucket/*{R1,R2}*.fastq.gz,s3://bucket/sub-bucket2/*{1,2}.fastq.gz'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --sra SRP016501
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket \
--fastas '*.fasta'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket \
--bam 'possorted_genome_bam.bam'
nextflow run nf-core/kmermaid --outdir s3://bucket/sub-bucket --samples samples.csv --split_kmer --subsample 1000
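The glob arguments in the commands above (for example `'*.fasta'`) are single-quoted so that the pattern reaches Nextflow intact instead of being expanded by the shell first. The following sketch shows the difference using a temporary directory and two empty, hypothetical FASTA files. It does not call the pipeline.

```shell
# Create a scratch directory with two placeholder files (names are made up).
workdir=$(mktemp -d)
touch "$workdir/a.fasta" "$workdir/b.fasta"

# Unquoted glob: the shell expands it into one argument per matching file.
set -- "$workdir"/*.fasta
unquoted_args=$#

# Quoted glob: the pattern is passed through as a single literal argument.
set -- "$workdir/*.fasta"
quoted_args=$#

echo "unquoted: $unquoted_args argument(s); quoted: $quoted_args argument(s)"

rm -rf "$workdir"
```

With the unquoted form, a command would receive only the first file as the option's value and treat the rest as stray arguments, which is why the quoted form is used throughout.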
nf-core/kmermaid was originally written by Olga Botvinnik. The DSL2 port is done by Igor Trujnara.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #kmermaid channel (you can join with this invite).
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.