Releases: kishwarshafin/pepper
PEPPER-Margin-DeepVariant r0.8 update
In this release:
- Separate SNP and INDEL calling with DeepVariant: SNP calling with "none" and INDEL calling with "rows".
- Parameters to select manual SNP and INDEL models for DeepVariant.
- Parameter separation to handle candidates in repeat regions within PEPPER.
- Update and fix training documentation.
- Update to DeepVariant version 1.3.0.
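As a rough illustration of what separating SNP and INDEL candidates means, the sketch below classifies a candidate by the lengths of its reference and alternate alleles and routes it to a mode name. The function names and the routing table are hypothetical and only mirror the "none"/"rows" split described above; this is not PEPPER's or DeepVariant's actual code.

```python
from typing import List

# Hypothetical routing table mirroring the release note:
# SNP candidates use "none", INDEL candidates use "rows".
MODE_BY_TYPE = {"SNP": "none", "INDEL": "rows"}


def candidate_type(ref: str, alts: List[str]) -> str:
    """Return "SNP" if the reference and every alternate allele are a
    single base, otherwise "INDEL"."""
    if len(ref) == 1 and all(len(alt) == 1 for alt in alts):
        return "SNP"
    return "INDEL"


def route(ref: str, alts: List[str]) -> str:
    """Pick the (assumed) mode name for a candidate."""
    return MODE_BY_TYPE[candidate_type(ref, alts)]
```

For example, `route("A", ["G"])` selects "none", while a deletion such as `route("AT", ["A"])` selects "rows". A site that mixes a substitution with an insertion is treated as an INDEL in this sketch.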
PEPPER-Margin-DeepVariant r0.7 update
Version r0.7 update
- Detailed explanation of methods.
- Detailed performance evaluation on ONT and PacBio-HiFi data.
- Included training documentation for PEPPER-Margin-DeepVariant.
- Examples on how to tune parameters to balance speed and accuracy.
- State-of-the-art results for all nanopore chemistries.
PEPPER-Margin-DeepVariant r0.6
Release 0.6 comes with these updates:
- At least 3x runtime acceleration for the Oxford Nanopore variant calling pipeline and 2x acceleration for the PacBio-HiFi variant calling pipeline.
- Support for R10.4 Q20 variant calling.
- A wide range of parameters for tuning PEPPER-DeepVariant to the user's mode of usage.
- Ability to provide customized models for PEPPER-DeepVariant.
To be added to this release shortly:
- Full training documentation on how to train a model end-to-end.
- Documentation and explanation of the available parameters and their downstream effect in variant calling.
r0.5
Updates to Oxford Nanopore variant calling in v0.5:
- Reduce the search space of PEPPER by only predicting on sites with variants.
- Add CNN layers on top of the RNNs to improve predictions.
- Remove PEPPER HP from the pipeline.
- Support the "rows" model for DeepVariant, which uses alt alignment and significantly improves INDEL performance.
- Support for Guppy 5.0.7 and the high-accuracy mode of Guppy.
PEPPER v0.4 release for Zenodo
Archived release of v0.4. No updates.
PEPPER-Margin-DeepVariant release
This is the official release of PEPPER-Margin-DeepVariant. It supports the Nanopore and PacBio HiFi variant calling and assembly polishing pipelines.
Key highlights and improvements:
- Candidate finding with PEPPER HP is implemented in a manner that is best suited for DeepVariant's image generation. You will not see any called variant with allele frequency 0, as the candidate finding is now synonymous with DeepVariant's candidate finding, with RNN predictions used to rank the candidates.
- PEPPER-DeepVariant can be used as a standard small variant calling tool.
- PEPPER SNP is improved and tuned to work with Margin so that PEPPER-Margin produces the best haplotyping results for Oxford Nanopore and PacBio HiFi data.
- Overall we see a 30x improvement in the total runtime of the pipeline, and PEPPER itself is 20x faster compared to r0.1. We are expecting more runtime improvements in the future.
Discontinued features:
- PEPPER is not supported as a standalone assembly polishing tool. We believe the PEPPER-Margin-DeepVariant pipeline is much more sensitive to the structure of the assembly and provides a way to avoid over-polishing an assembly by filtering the VCF. This feature in itself is missing from PEPPER, so we dropped haploid assembly polishing and overall assembly polishing with PEPPER alone.
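To make the idea of avoiding over-polishing by filtering the VCF concrete, here is a minimal sketch that keeps only records whose QUAL value meets a threshold before the edits are applied to an assembly. The function name and the threshold of 10.0 are assumptions chosen for illustration; the release notes do not specify which filter the pipeline uses.

```python
from typing import Iterable, Iterator


def filter_vcf_lines(lines: Iterable[str], min_qual: float = 10.0) -> Iterator[str]:
    """Yield VCF header lines unchanged and only those records whose
    QUAL (6th tab-separated column) is at least min_qual.

    min_qual is a hypothetical cutoff, not a value from the PEPPER docs.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # columns: CHROM POS ID REF ALT QUAL FILTER INFO
        if qual != "." and float(qual) >= min_qual:
            yield line


# Tiny in-memory example with made-up records.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "contig_1\t100\t.\tA\tG\t35.2\tPASS\t.\n",
    "contig_1\t250\t.\tAT\tA\t3.1\tPASS\t.\n",
]
kept = list(filter_vcf_lines(vcf, min_qual=10.0))
```

In this example the low-quality deletion at position 250 is dropped, so only the confident substitution would be applied to the assembly. Records with a missing QUAL (".") are also dropped in this sketch, which is a conservative choice.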
PEPPER v0.1 release
PEPPER v0.1 release notes (haploid assembly polisher)
PEPPER is a recurrent neural network-based haploid genome assembly polisher. This is the first release of the haploid assembly polishing component of PEPPER. We tested PEPPER's performance on several human genome samples, Zymo microbial community samples, and non-model organisms. The performance of PEPPER suggests that we can achieve highly accurate genome assemblies using ONT reads only.
Installation
PEPPER can be installed with pip.
python3 -m pip install pepper-polish
# if you get permission error, then try:
python3 -m pip install --user pepper-polish
python3 -m pepper.pepper --help
python3 -m pepper.pepper polish --help
# Expected output: PEPPER VERSION: 0.1.1
Models
The model files are available here: https://github.com/kishwarshafin/pepper/tree/r0.1/models
MinION_r10_native_microbial.pkl : For R10.3 guppy 3.4.8 (Microbial)
MinION_r10_pcr_microbial.pkl : For R10.3 guppy 3.4.8 (Microbial)
PEPPER_polish_haploid_guppy360.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Large genomes; trained to be sensitive to the heterozygosity of the genome, can be used in phase-aware polishing)
PromethION_r941_guppy305_HAC_human.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Large genomes)
PromethION_r941_guppy305_HAC_microbial.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Microbial)
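The list above can also be kept as a small lookup table when scripting a polishing run. The sketch below copies the file names and notes from the list; the dictionary and the helper `models_for` are hypothetical conveniences and not part of PEPPER.

```python
# File names and notes copied from the model list above.
MODELS = {
    "MinION_r10_native_microbial.pkl": "R10.3 guppy 3.4.8 (Microbial)",
    "MinION_r10_pcr_microbial.pkl": "R10.3 guppy 3.4.8 (Microbial)",
    "PEPPER_polish_haploid_guppy360.pkl": "Guppy 3.0.5 to Guppy 4+ (Large genomes, heterozygosity-aware)",
    "PromethION_r941_guppy305_HAC_human.pkl": "Guppy 3.0.5 to Guppy 4+ (Large genomes)",
    "PromethION_r941_guppy305_HAC_microbial.pkl": "Guppy 3.0.5 to Guppy 4+ (Microbial)",
}


def models_for(keyword: str) -> list:
    """Return the sorted model file names whose note mentions keyword
    (case-insensitive substring match)."""
    needle = keyword.lower()
    return sorted(name for name, note in MODELS.items() if needle in note.lower())
```

For example, `models_for("microbial")` returns the three microbial models, and `models_for("large genomes")` returns the two models intended for large genomes.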
Motivation
Assemblies generated using ONT data usually have low base-level quality and require further polishing. Existing polishers like Racon-Medaka can improve the base-level quality of an assembly but perform poorly in transcriptome completeness. Previously, we introduced a new polisher suite, MarginPolish-HELEN, with superior performance in transcriptome completeness and base-level accuracy. However, MarginPolish-HELEN has runtime and cost overhead. To overcome this issue, we developed PEPPER, which uses local realignment of reads to the assembly to produce highly accurate polished genome assemblies while being sensitive to the structural integrity of the assembly. PEPPER can be paired with Shasta, Flye, Canu, or any other ONT-based assembler. The performance of PEPPER as a standalone assembly polisher is superior to that of any other existing ONT assembly polisher, including MarginPolish-HELEN.
We participated in the HPRC assembly bakeoff, where the Shasta-PEPPER HG002 assembly achieved Q35 in assembly quality while having transcriptome completeness similar to that reported in the Shasta-MarginPolish-HELEN paper.
Extension to variant calling
In collaboration with Google Health, we used a modified version of the haploid assembly polisher mode of PEPPER and paired it with DeepVariant to achieve state-of-the-art performance in reference-based small variant calling with ONT reads. Our effort has been recognized by the PrecisionFDA Truth Challenge V2, where PEPPER-DeepVariant achieved top awards in the ONT category. This work is still in development, and future releases will include details about the modules that we are developing to enable ONT-based variant calling.
Collaboration with the Darwin Tree of Life project and other projects
The Darwin Tree of Life project plans to sequence and assemble all known species of animals, plants, fungi and protists in Britain and Ireland. The project picked Shasta to generate de novo ONT assemblies efficiently and, after evaluating multiple existing assembly polishers, picked PEPPER to polish the assemblies. We are collaborating with Ksenia Krasheninnikova from the Wellcome Sanger Institute, who is actively evaluating PEPPER on non-model vertebrate genomes and helping us to improve our methods.
We are also collaborating with several other groups to use PEPPER to polish ONT-based genome assemblies. We have applied PEPPER to polish tomato genomes, non-human vertebrate genomes, highly heterozygous plant genomes and microbial genomes. In all cases, we saw better performance than existing polishing tools in both the structural integrity of the genome assembly and base-level quality.
Future direction
PEPPER builds a foundation upon which we plan to develop a set of next-generation genome inference tools for ONT reads. In collaboration with Google Health, we were able to use PEPPER as a primary candidate finder that enabled DeepVariant to identify variants from ONT reads accurately. We plan to keep improving the variant-calling pipeline. Moreover, Shasta is now producing haplotype-resolved genome assemblies, and we plan to deploy a diploid assembly polishing pipeline soon.