Skip to content

Latest commit

 

History

History
41 lines (29 loc) · 4.75 KB

v0.1_r11_speedup.md

File metadata and controls

41 lines (29 loc) · 4.75 KB

Notes on v0.1-r11

We focused on speedup in v0.1-r11. We tried a few techniques and listed those that worked as follows.

  1. C implementation for pileup and full-alignment feature generation. Before r11, feature generation (tensor creation) in Clair3 was sped up using pypy on python code. The speedup was ~10x over native python. The practice balanced speed and ease of coding in the developmental stage of Clair3. In r11, we added C implementation, bringing another ~2-3 times speedup over pypy. The C code is integrated with the other python parts using CFFI (C Foreign Function Interface). The variants called with the new C implementation are identical to the previous version. Thanks to co-contributors @cjw85, @ftostevin-ont, and @EpiSlim.
  2. Use longphase for phasing. longphase by Lin et al. is an ultra-fast chromosome-scale phasing algorithm for small and large variants. In our experiments, longphase took ~3 minutes to phase 69x Q20 ONT WGS with 24 CPU cores and no I/O bound, faster than whatshap that took 52 minutes. To enable using longphase for phasing, please use the --longphase_for_phasing option. Our suggestions on when to enable longphase are shown in the section below.
  3. Haplotagging on the fly. Whatshap haplotag was used to add an HP tag to each read after phasing. This process writes out a new BAM, which is I/O intensive and in fact, unnecessary. In r11, we implemented haplotagging to feed tagged read directly to full-alignment calling. We used the exact logic that was implemented in whatshap's haplotag module. This technique, no matter whatshap or longphase was used, saves more than 10-20 minutes on compressing, writing and reading a new BAM.

We benchmarked r11 against r10 with 69x Q20 ONT HG002 data. 24 CPU cores with minimal I/O speed limit were used. The results are as follows. With C implementation and longphase enabled, the total runtime reduced from 234 to 101 minutes.

Implementation Sample CPU cores Inference hardware Total runtime Pileup runtime Phasing runtime Full-alignment runtime
c_impl, longphase HG002 WGS Q20 69x 24 CPU 101m 38m 3m 56m
v0.1-r10, whatshap HG002 WGS Q20 69x 24 CPU 234m 57m 52m 118m

When to use longphase (to replace whatshap)

longphase is not enabled by default. We suggest enabling longphase through the --longphase_for_phasing option when calling variants in human with ≥20x of data. Use whatshap with non-human samples or insufficient depth.

Benchmarks between using longphase and whatshap on HG003 WGS ONT Guppy5 with five depths from 10x to 50x are as follows.

Phasing algorithm Depth SNP-Precision SNP-Recall SNP-F1 Indel-Precision Indel-Recall Indel-F1
longphase 10x 96.75% 93.94% 95.32% 82.86% 47.30% 60.22%
whatshap 10x 95.87% 96.64% 96.26% 83.37% 47.50% 60.52%
longphase 20x 99.22% 99.27% 99.25% 88.49% 62.22% 73.07%
whatshap 20x 99.21% 99.36% 99.28% 88.75% 60.47% 71.93%
longphase 30x 99.50% 99.60% 99.55% 90.63% 68.39% 77.96%
whatshap 30x 99.50% 99.61% 99.56% 90.61% 66.52% 76.72%
longphase 40x 99.59% 99.67% 99.63% 91.69% 72.34% 80.87%
whatshap 40x 99.60% 99.70% 99.65% 91.71% 72.39% 80.91%
longphase 50x 99.63% 99.70% 99.66% 92.17% 75.29% 82.88%
whatshap 50x 99.62% 99.70% 99.66% 91.59% 73.66% 81.65%

Use the old python-based feature generation code (to disable the new C implementation)

The new C implementation generates results identical to the previous version. However, we retained the old python-based feature generation code for benchmarking or back-compatibility purposes. Users can use it through the --disable_c_impl option.