Notes on v0.1-r11

v0.1-r11 focuses on speed. We tried several techniques; those that worked are listed below.

C implementation for pileup and full-alignment feature generation. Before r11, feature generation (tensor creation) in Clair3 ran as Python code under PyPy, which was ~10x faster than native Python. That approach balanced speed and ease of coding during the development stage of Clair3. In r11, we added a C implementation that is another ~2-3x faster than PyPy. The C code is integrated with the remaining Python code through CFFI (C Foreign Function Interface). The variants called with the new C implementation are identical to those called by the previous version. Thanks to co-contributors @cjw85, @ftostevin-ont, and @EpiSlim.
Use longphase for phasing. longphase by Lin et al. is an ultra-fast chromosome-scale phasing algorithm for small and large variants. In our experiments, longphase took ~3 minutes to phase 69x Q20 ONT WGS data with 24 CPU cores and no I/O bottleneck, compared with 52 minutes for whatshap. To use longphase for phasing, pass the --longphase_for_phasing option. Our suggestions on when to enable longphase are given in the section below.
Haplotagging on the fly. Previously, whatshap haplotag was used to add an HP tag to each read after phasing. That step writes out a new BAM, which is I/O intensive and, in fact, unnecessary. In r11, we implemented haplotagging inside Clair3 so that tagged reads are fed directly to full-alignment calling, using the exact logic implemented in whatshap's haplotag module. Whether whatshap or longphase is used for phasing, this saves 10-20 minutes or more that would otherwise be spent compressing, writing, and reading a new BAM.
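
The idea of tagging reads in memory instead of writing a tagged BAM can be illustrated with a minimal Python sketch. This is a simplified majority-vote stand-in and not the whatshap haplotag logic that Clair3 reuses; the function name `assign_haplotype`, the dictionary layout, and the tie handling are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def assign_haplotype(
    read_alleles: Dict[int, str],
    phased_variants: Dict[int, Tuple[str, str]],
) -> int:
    """Return 1 or 2 for the haplotype a read supports, or 0 if undecided.

    read_alleles: position -> base observed in the read (hypothetical layout).
    phased_variants: position -> (allele on haplotype 1, allele on haplotype 2).
    """
    votes = {1: 0, 2: 0}
    for pos, base in read_alleles.items():
        if pos not in phased_variants:
            continue  # this read base does not overlap a phased heterozygous site
        hap1_allele, hap2_allele = phased_variants[pos]
        if base == hap1_allele:
            votes[1] += 1
        elif base == hap2_allele:
            votes[2] += 1
    if votes[1] == votes[2]:
        return 0  # no evidence or a tie: leave the read untagged
    return 1 if votes[1] > votes[2] else 2


# Toy data: tag reads in memory and pass the tags on, with no intermediate BAM.
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "read_a": {100: "A", 250: "C"},
    "read_b": {100: "G", 400: "A"},
    "read_c": {50: "T"},
}
tags = {name: assign_haplotype(alleles, phased) for name, alleles in reads.items()}
print(tags)
```

In this toy setup the tags stay in a dictionary that a downstream step could read directly, which mirrors why skipping the BAM round trip avoids the compression and I/O cost.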

We benchmarked r11 against r10 on 69x Q20 ONT HG002 data, using 24 CPU cores with minimal I/O constraints. The results are as follows. With the C implementation and longphase enabled, the total runtime dropped from 234 to 101 minutes.

| Implementation | Sample | CPU cores | Inference hardware | Total runtime | Pileup runtime | Phasing runtime | Full-alignment runtime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| c_impl, longphase | HG002 WGS Q20 69x | 24 | CPU | 101m | 38m | 3m | 56m |
| v0.1-r10, whatshap | HG002 WGS Q20 69x | 24 | CPU | 234m | 57m | 52m | 118m |
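
To make the per-stage gains explicit, the short Python snippet below derives speedup factors from the runtimes in the benchmark table. The dictionary names and stage keys are chosen here for illustration only.

```python
# Runtimes in minutes, copied from the benchmark table above.
r10 = {"total": 234, "pileup": 57, "phasing": 52, "full_alignment": 118}
r11 = {"total": 101, "pileup": 38, "phasing": 3, "full_alignment": 56}

# Speedup factor per stage: old runtime divided by new runtime.
speedup = {stage: r10[stage] / r11[stage] for stage in r10}

for stage, factor in speedup.items():
    print(f"{stage}: {r10[stage]}m -> {r11[stage]}m ({factor:.1f}x)")
```

By this arithmetic the overall run is about 2.3x faster, and phasing shows the largest relative gain at roughly 17x.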

When to use `longphase` (to replace `whatshap`)

longphase is not enabled by default. We suggest enabling longphase through the --longphase_for_phasing option when calling variants in human samples with ≥20x depth. Use whatshap for non-human samples or when the depth is below 20x.
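
The recommendation above can be encoded as a small decision helper, for example in a wrapper script that assembles the command line. This is a sketch under assumptions: the helper `phasing_options` and its parameters are hypothetical and not part of Clair3; only the --longphase_for_phasing option and the 20x threshold come from the text.

```python
from typing import List

MIN_DEPTH_FOR_LONGPHASE = 20  # depth threshold taken from the recommendation above


def phasing_options(is_human: bool, depth: float) -> List[str]:
    """Return extra command-line options for phasing (hypothetical helper)."""
    if is_human and depth >= MIN_DEPTH_FOR_LONGPHASE:
        return ["--longphase_for_phasing"]
    return []  # no extra option: whatshap stays in use by default


print(phasing_options(is_human=True, depth=30))
print(phasing_options(is_human=True, depth=10))
print(phasing_options(is_human=False, depth=50))
```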

Benchmarks comparing longphase and whatshap on HG003 WGS ONT Guppy5 data at five depths from 10x to 50x are as follows.

| Phasing algorithm | Depth | SNP-Precision | SNP-Recall | SNP-F1 | Indel-Precision | Indel-Recall | Indel-F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| longphase | 10x | 96.75% | 93.94% | 95.32% | 82.86% | 47.30% | 60.22% |
| whatshap | 10x | 95.87% | 96.64% | 96.26% | 83.37% | 47.50% | 60.52% |
| longphase | 20x | 99.22% | 99.27% | 99.25% | 88.49% | 62.22% | 73.07% |
| whatshap | 20x | 99.21% | 99.36% | 99.28% | 88.75% | 60.47% | 71.93% |
| longphase | 30x | 99.50% | 99.60% | 99.55% | 90.63% | 68.39% | 77.96% |
| whatshap | 30x | 99.50% | 99.61% | 99.56% | 90.61% | 66.52% | 76.72% |
| longphase | 40x | 99.59% | 99.67% | 99.63% | 91.69% | 72.34% | 80.87% |
| whatshap | 40x | 99.60% | 99.70% | 99.65% | 91.71% | 72.39% | 80.91% |
| longphase | 50x | 99.63% | 99.70% | 99.66% | 92.17% | 75.29% | 82.88% |
| whatshap | 50x | 99.62% | 99.70% | 99.66% | 91.59% | 73.66% | 81.65% |
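
For readers who want to relate the columns, F1 is the harmonic mean of precision and recall. The Python sketch below recomputes Indel-F1 for the 10x and 20x rows from the rounded precision and recall values in the table; because the inputs are already rounded, small differences in the last digit are possible. The function and variable names are chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (phaser, depth) -> (Indel precision, Indel recall, reported Indel F1),
# copied from the table above.
indel_rows = {
    ("longphase", "10x"): (82.86, 47.30, 60.22),
    ("whatshap", "10x"): (83.37, 47.50, 60.52),
    ("longphase", "20x"): (88.49, 62.22, 73.07),
    ("whatshap", "20x"): (88.75, 60.47, 71.93),
}

for (phaser, depth), (precision, recall, reported) in indel_rows.items():
    computed = f1_score(precision, recall)
    print(f"{phaser} {depth}: computed {computed:.2f}%, reported {reported:.2f}%")
```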

Use the old python-based feature generation code (to disable the new C implementation)

The new C implementation generates results identical to those of the previous version. However, we have retained the old Python-based feature generation code for benchmarking and backward compatibility. To use it, pass the --disable_c_impl option.
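
When benchmarking the two code paths, one way to confirm that their calls match is to compare the variant records of the two output VCFs. The Python sketch below is an illustration under assumptions: it expects single-sample VCF text with a GT field, the helper `variant_keys` is hypothetical, and the records are made-up toy data rather than real Clair3 output.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT, genotype) from VCF lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # FORMAT is the 9th column; the single sample is the 10th.
        fmt, sample = fields[8].split(":"), fields[9].split(":")
        genotype = sample[fmt.index("GT")]
        keys.add((chrom, pos, ref, alt, genotype))
    return keys


# Toy records standing in for the outputs of the two code paths.
c_impl_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t1000\t.\tA\tG\t30.1\tPASS\t.\tGT:GQ\t0/1:30",
    "chr1\t2000\t.\tCT\tC\t22.4\tPASS\t.\tGT:GQ\t1/1:22",
]
python_impl_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t2000\t.\tCT\tC\t22.4\tPASS\t.\tGT:GQ\t1/1:22",
    "chr1\t1000\t.\tA\tG\t30.1\tPASS\t.\tGT:GQ\t0/1:30",
]

identical = variant_keys(c_impl_vcf) == variant_keys(python_impl_vcf)
print(identical)
```

Comparing sets of keys ignores record order and quality fields; a stricter check could also compare QUAL or the full record if exact output equality is the goal.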
