We focused on speedup in v0.1-r11
. We tried a few techniques and listed those that worked as follows.
- C implementation for pileup and full-alignment feature generation. Before r11, feature generation (tensor creation) in Clair3 was sped up using pypy on python code. The speedup was ~10x over native python. The practice balanced speed and ease of coding in the developmental stage of Clair3. In r11, we added C implementation, bringing another ~2-3 times speedup over pypy. The C code is integrated with the other python parts using CFFI (C Foreign Function Interface). The variants called with the new C implementation are identical to the previous version. Thanks to co-contributors @cjw85, @ftostevin-ont, and @EpiSlim.
- Use longphase for phasing. longphase by Lin et al. is an ultra-fast chromosome-scale phasing algorithm for small and large variants. In our experiments, longphase took ~3 minutes to phase 69x Q20 ONT WGS with 24 CPU cores and no I/O bound, faster than
whatshap
that took 52 minutes. To enable using longphase for phasing, please use the--longphase_for_phasing
option. Our suggestions on when to enable longphase are shown in the section below. - Haplotagging on the fly. Whatshap
haplotag
was used to add anHP
tag to each read after phasing. This process writes out a new BAM, which is I/O intensive and in fact, unnecessary. In r11, we implemented haplotagging to feed tagged read directly to full-alignment calling. We used the exact logic that was implemented in whatshap's haplotag module. This technique, no matter whatshap or longphase was used, saves more than 10-20 minutes on compressing, writing and reading a new BAM.
We benchmarked r11 against r10 with 69x Q20 ONT HG002 data. 24 CPU cores with minimal I/O speed limit were used. The results are as follows. With C implementation and longphase enabled, the total runtime reduced from 234 to 101 minutes.
Implementation | Sample | CPU cores | Inference hardware | Total runtime | Pileup runtime | Phasing runtime | Full-alignment runtime |
---|---|---|---|---|---|---|---|
c_impl, longphase | HG002 WGS Q20 69x | 24 | CPU | 101m | 38m | 3m | 56m |
v0.1-r10, whatshap | HG002 WGS Q20 69x | 24 | CPU | 234m | 57m | 52m | 118m |
longphase
is not enabled by default. We suggest enabling longphase
through the --longphase_for_phasing
option when calling variants in human with ≥20x of data. Use whatshap
with non-human samples or insufficient depth.
Benchmarks between using longphase and whatshap on HG003 WGS ONT Guppy5 with five depths from 10x to 50x are as follows.
Phasing algorithm | Depth | SNP-Precision | SNP-Recall | SNP-F1 | Indel-Precision | Indel-Recall | Indel-F1 |
---|---|---|---|---|---|---|---|
longphase | 10x | 96.75% | 93.94% | 95.32% | 82.86% | 47.30% | 60.22% |
whatshap | 10x | 95.87% | 96.64% | 96.26% | 83.37% | 47.50% | 60.52% |
longphase | 20x | 99.22% | 99.27% | 99.25% | 88.49% | 62.22% | 73.07% |
whatshap | 20x | 99.21% | 99.36% | 99.28% | 88.75% | 60.47% | 71.93% |
longphase | 30x | 99.50% | 99.60% | 99.55% | 90.63% | 68.39% | 77.96% |
whatshap | 30x | 99.50% | 99.61% | 99.56% | 90.61% | 66.52% | 76.72% |
longphase | 40x | 99.59% | 99.67% | 99.63% | 91.69% | 72.34% | 80.87% |
whatshap | 40x | 99.60% | 99.70% | 99.65% | 91.71% | 72.39% | 80.91% |
longphase | 50x | 99.63% | 99.70% | 99.66% | 92.17% | 75.29% | 82.88% |
whatshap | 50x | 99.62% | 99.70% | 99.66% | 91.59% | 73.66% | 81.65% |
The new C implementation generates results identical to the previous version. However, we retained the old python-based feature generation code for benchmarking or back-compatibility purposes. Users can use it through the --disable_c_impl
option.