Releases: HKU-BAL/ClairS
v0.4.0
This version is a major update. The new features and benchmarks are explained in a technical note titled “Improving the performance of ClairS and ClairS-TO with new real cancer cell-line datasets and PoN”. A summary of changes:
- Starting from this version, ClairS provides two model types. ssrs is a model trained initially with synthetic samples and then augmented with real samples (e.g., ont_r10_dorado_sup_5khz_ssrs); ss is a model trained from synthetic samples (e.g., ont_r10_dorado_sup_5khz_ss). The ssrs model provides better performance and fits most usage scenarios. The ss model can be used when the absence of a cancer type from model training is a concern. In v0.4.0, four real cancer cell-line datasets (HCC1937/BL, HCC1954/BL, H1437/BL, and H2009/BL) covering two cancer types (breast cancer and lung cancer), published by Park et al., were used for ssrs model training.
- Added base quality (BQ) jittering in model training to address the difference in BQ distribution between the training and calling datasets, which leads to a performance drop.
- Added the --indel_min_af option and adjusted the default minimum allelic fraction requirement to 0.1 for Indels on the ONT platform.
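To illustrate what a minimum allelic fraction requirement does, here is a minimal Python sketch. It is not ClairS's implementation: the function names `allelic_fraction` and `passes_indel_min_af` and the read counts are hypothetical, and treating the threshold as inclusive is an assumption. Only the 0.1 value comes from the note above.

```python
def allelic_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the alternative allele."""
    if total_reads <= 0:
        return 0.0
    return alt_reads / total_reads


def passes_indel_min_af(alt_reads: int, total_reads: int,
                        indel_min_af: float = 0.1) -> bool:
    """Keep an Indel candidate only if its allelic fraction reaches the threshold.

    The inclusive comparison (>=) is an assumption made for this sketch.
    """
    return allelic_fraction(alt_reads, total_reads) >= indel_min_af


# Hypothetical candidates at 50x coverage.
print(passes_indel_min_af(4, 50))  # allelic fraction 0.08
print(passes_indel_min_af(6, 50))  # allelic fraction 0.12
```

With a threshold of 0.1, a candidate supported by 4 of 50 reads would be dropped, while one supported by 6 of 50 reads would be kept.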
v0.3.1
- Added four options: (i) --use_heterozygous_snp_in_tumor_sample_and_normal_bam_for_intermediate_phasing, (ii) --use_heterozygous_snp_in_normal_sample_and_normal_bam_for_intermediate_phasing, (iii) --use_heterozygous_snp_in_tumor_sample_and_tumor_bam_for_intermediate_phasing, and (iv) --use_heterozygous_snp_in_normal_sample_and_tumor_bam_for_intermediate_phasing. Option iii is equivalent to --use_heterozygous_snp_in_tumor_sample_for_intermediate_phasing, added in v0.2.0. Option iv is equivalent to --use_heterozygous_snp_in_normal_sample_for_intermediate_phasing, added in v0.2.0. Using the normal BAM for intermediate phasing was requested by @Sergey Aganezov. When the coverage of normal and tumor is similar, using the normal BAM for intermediate phasing made a negligible difference compared with using the tumor BAM in our experiments with HCC1395/BL.
- Added --haplotagged_tumor_bam_provided_so_skip_intermediate_phasing_and_haplotagging to use the haplotype information provided in the tumor BAM directly and skip intermediate phasing and haplotagging. This option is useful when ClairS is used in a pipeline in which the tumor BAM is phased before running ClairS. BAMs haplotagged by WhatsHap and LongPhase are accepted.
- Bumped up Clair3 dependency to version 1.0.10, LongPhase to version 1.7.3.
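For pipelines that still carry the v0.2.0 option names, the two equivalences stated in these notes can be written down as a small lookup. The Python sketch below is an illustration only: `LEGACY_TO_V031` and `modernize_args` are hypothetical helper names, not part of ClairS, and whether v0.3.1 still accepts the older names is not stated in these notes.

```python
# v0.2.0 option name -> equivalent v0.3.1 option name, as stated in the notes.
LEGACY_TO_V031 = {
    "--use_heterozygous_snp_in_tumor_sample_for_intermediate_phasing":
        "--use_heterozygous_snp_in_tumor_sample_and_tumor_bam_for_intermediate_phasing",
    "--use_heterozygous_snp_in_normal_sample_for_intermediate_phasing":
        "--use_heterozygous_snp_in_normal_sample_and_tumor_bam_for_intermediate_phasing",
}


def modernize_args(args):
    """Return a copy of args with v0.2.0 option names replaced by their
    v0.3.1 equivalents. Handles both "--name" and "--name=value" tokens;
    every other token is passed through unchanged."""
    out = []
    for token in args:
        name, sep, value = token.partition("=")
        out.append(LEGACY_TO_V031.get(name, name) + sep + value)
    return out


old = ["--use_heterozygous_snp_in_normal_sample_for_intermediate_phasing", "True"]
print(modernize_args(old))
```

The helper only renames tokens. It makes no assumption about whether an option takes a value.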
v0.3.0
- Added a module called “verdict” (option --enable_verdict) to statistically classify each called variant as a germline, somatic, or subclonal somatic variant based on the CNV profile and tumor purity estimation.
- Improved model training speed, making model training about three times faster.
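The notes do not describe how verdict works internally. As a rough intuition for why the CNV profile and tumor purity help separate germline from somatic variants, the Python sketch below uses a textbook expected-allele-fraction model and a binomial likelihood. Everything in it is an assumption made for illustration: `expected_vaf`, `binom_loglik`, and `classify` are hypothetical names, the diploid normal genome is assumed, and the subclonal class is left out.

```python
from math import lgamma, log


def expected_vaf(purity, tumor_cn, mult, germline, normal_cn=2):
    """Expected variant allele fraction in a tumor sample.

    purity   : fraction of tumor cells in the sample (0..1)
    tumor_cn : total copy number of the locus in tumor cells
    mult     : number of tumor copies carrying the variant
    germline : if True, normal cells also carry one variant copy
    """
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    variant_copies = purity * mult + ((1 - purity) if germline else 0.0)
    return variant_copies / total_copies


def binom_loglik(alt, depth, p):
    """Binomial log-likelihood of seeing `alt` variant reads out of `depth`."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite
    return (lgamma(depth + 1) - lgamma(alt + 1) - lgamma(depth - alt + 1)
            + alt * log(p) + (depth - alt) * log(1 - p))


def classify(alt, depth, purity, tumor_cn):
    """Return the label whose best-fitting multiplicity explains the reads best."""
    best_label, best_ll = None, float("-inf")
    for label, germline in (("somatic", False), ("germline", True)):
        for mult in range(1, tumor_cn + 1):
            vaf = expected_vaf(purity, tumor_cn, mult, germline)
            ll = binom_loglik(alt, depth, vaf)
            if ll > best_ll:
                best_label, best_ll = label, ll
    return best_label


# Hypothetical site: 50% purity, tumor copy number 2, 100x depth.
print(classify(25, 100, purity=0.5, tumor_cn=2))
print(classify(75, 100, purity=0.5, tumor_cn=2))
```

In this toy model some states give the same expected fraction. For example, a somatic variant on both tumor copies and a germline heterozygous variant both give 0.5 at 50% purity, so allele fraction alone cannot always separate the classes.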
v0.2.0
- Added the --use_heterozygous_snp_in_normal_sample_for_intermediate_phasing and --use_heterozygous_snp_in_tumor_sample_for_intermediate_phasing options to support using heterozygous SNPs in either the normal sample or the tumor sample for intermediate phasing. Previous versions used in_tumor_sample for phasing. In this new version, when testing with ONT 4khz HCC1395/BL and using in_normal_sample for intermediate phasing, the SNV precision improved by ~2%, while recall remained unchanged. in_normal_sample becomes the default from this version. However, if the coverage of the normal sample is low, please consider switching back to in_tumor_sample (#22, idea contributed by the longphase team @sloth-eat-pudding).
- Added --use_heterozygous_indel_for_intermediate_phasing to include high-quality heterozygous Indels for intermediate phasing. With this new option, the haplotagged tumor reads increased by ~3% in ONT 4khz HCC1395/BL. The option becomes the default from this version.
- Added a model that might provide slightly better performance for liquid tumors. In this release, only ONT Dorado 5khz HAC for liquid tumors (-p ont_r10_dorado_hac_5khz_liquid) is provided. The model was trained with slightly higher normal contamination. We are testing the new model with a collaborator.
- Added the --use_longphase_for_intermediate_haplotagging option to replace WhatsHap haplotagging with LongPhase haplotagging, which speeds up read haplotagging. The option becomes the default from this version.
- Bumped up Clair3 dependency to version 1.0.7, LongPhase to version 1.7.
v0.1.7
v0.1.6
- Fixed an output bug that caused no VCF output if no Indel candidate was found (contributor @Khi Pin).
- Fixed showing incorrect reference allele depth at a deletion region.
- Added PacBio HiFi quick demo.
v0.1.5
- Updated SNV calling using ONT Dorado 4kHz data with a new model trained using multiple sample pairs (HG003/HG004).
- Updated SNV calling using ONT Dorado 5kHz data with a new model trained using multiple sample pairs (HG001/HG002, HG003/HG004).
- Added support for somatic Indel calling using ONT Dorado 4kHz data.
- Added support for somatic Indel calling using ONT Dorado 5kHz data.
v0.1.4
- Added reference depth in the AD tag.
- Added HiFi Sequel II Indel model.
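As an illustration of reading reference and alternative depths from an AD value, the Python sketch below follows the usual VCF convention in which AD lists the reference depth first, then one depth per alternative allele. The record shown is made up, and the FORMAT layout is an assumption rather than a copy of ClairS output. `allelic_depths` is a hypothetical helper name.

```python
def allelic_depths(format_field: str, sample_field: str):
    """Return the AD values of one VCF sample column as a list of ints.

    format_field : the FORMAT column, e.g. "GT:DP:AD"
    sample_field : the matching sample column, e.g. "0/1:40:30,10"
    Assumes AD is present and contains no missing values (".").
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    ad = dict(zip(keys, values))["AD"]
    return [int(x) for x in ad.split(",")]


# Made-up sample column: 30 reference reads and 10 alternative reads.
ref_depth, alt_depth = allelic_depths("GT:DP:AD", "0/1:40:30,10")
print(ref_depth, alt_depth)
```

Having the reference depth in AD makes it possible to recompute an allelic fraction from the two counts without relying on a separate field.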
v0.1.3
v0.1.2
- Added HiFi Revio model.
- Renamed HiFi Sequel II model from hifi to hifi_sequel2.