About the Verdict module

ClairS can distinguish somatic from germline variants not only by using a matched normal sample, but also by modeling a variant’s allele frequency while accounting for tumor purity, tumor ploidy, and the copy number at the variant site. The Verdict module in ClairS is applied to the variants called by the neural network and tags each one as germline, somatic, or subclonal somatic. Verdict’s idea and algorithm are based on, and improve upon, SGZ [A computational approach to distinguish somatic vs. germline origin of genomic alterations from deep sequencing of cancer specimens without a matched normal, Sun et al., 2018]. Verdict has three steps: 1) find copy number segments, 2) estimate tumor purity and ploidy, and 3) run binomial tests.

SGZ suggested using ASCAT [Allele-specific copy number analysis of tumors, Van Loo et al., 2010] to estimate tumor purity, ploidy, and the copy number profile of each variant. We ported ASCAT from R to Python so that it could be integrated into Verdict and run reasonably fast. ASCAT takes as input the LogR (log ratios, representing log-transformed copy numbers derived from sequencing depth) and BAF (B allele frequencies, describing the allelic imbalance of variants) of germline heterozygous variants. LogR is calculated by comparing the normalized read coverage of the tumor and normal samples. BAF is calculated by dividing the signal intensity of the minor allele by the combined signal intensity of the major and minor alleles.

ASCAT uses the LogR and BAF distributions to segment the genome into regions with constant copy number states, placing breakpoints where the LogR and BAF values change. Next, ASCAT estimates tumor purity and ploidy by evaluating the goodness of fit over a grid of candidate purity and ploidy values. Because true copy numbers are nonnegative whole numbers, ASCAT seeks the purity and ploidy values for which the copy number estimates at germline heterozygous variants are as close as possible to nonnegative whole numbers.

Finally, the allele frequency of each variant is used as input to two binomial tests that decide whether the variant is more likely to be germline, somatic, or subclonal somatic. The two binomial tests take the tumor purity, tumor ploidy, and copy number as inputs and are calculated as follows. The p-value of the somatic hypothesis: p-somatic = Binomial(n * f, n, AFsomatic), where n is the read depth, f is the observed allele frequency, and AFsomatic is the expected allele frequency if the variant is somatic, calculated as p * V / (p * C + 2 * (1 - p)), where p is the tumor purity, C is the copy number, and V is the variant allele count in the tumor (the number of copies that carry the variant).
The p-value of the germline hypothesis: p-germline = Binomial(n * f, n, AFgermline), where AFgermline is the expected allele frequency if the variant is germline, calculated as (p * V + (1 - p)) / (p * C + 2 * (1 - p)). Verdict tags a variant as somatic if p-somatic is greater than 0.001 and p-germline is lower than 0.001. A variant is tagged as subclonal somatic if both p-somatic and p-germline are lower than 0.001, the tumor purity is greater than 0.2, and f < AFsomatic / gamma, where gamma is a tunable parameter with a default of 1.5.
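
The two binomial tests and the tagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Verdict source code: the function names (`binom_test_two_sided`, `expected_afs`, `tag_variant`) and the `"unresolved"` label are hypothetical. It assumes that `Binomial(k, n, p)` denotes a two-sided exact binomial test p-value; the text does not state the sidedness. It also assumes that n * f is rounded to the nearest whole read count. The text does not give the rule for variants that match neither the somatic nor the subclonal condition, so the sketch leaves them unresolved.

```python
import math

P_THRESHOLD = 0.001  # p-value cutoff quoted in the text
MIN_PURITY = 0.2     # purity floor for the subclonal tag
GAMMA = 1.5          # default gamma quoted in the text


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Assumed test: sum the mass of every outcome no more likely than k."""
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    total = sum(m for m in (binom_pmf(i, n, p) for i in range(n + 1))
                if m <= cutoff)
    return min(1.0, total)


def expected_afs(purity, copy_number, variant_copies):
    """Expected allele frequencies under the somatic and germline hypotheses."""
    denom = purity * copy_number + 2 * (1 - purity)
    af_somatic = purity * variant_copies / denom
    af_germline = (purity * variant_copies + (1 - purity)) / denom
    return af_somatic, af_germline


def tag_variant(depth, af, purity, copy_number, variant_copies, gamma=GAMMA):
    """Apply the somatic and subclonal rules quoted in the text."""
    alt_reads = round(depth * af)  # n * f, rounded to whole reads (assumption)
    af_somatic, af_germline = expected_afs(purity, copy_number, variant_copies)
    p_somatic = binom_test_two_sided(alt_reads, depth, af_somatic)
    p_germline = binom_test_two_sided(alt_reads, depth, af_germline)
    if p_somatic > P_THRESHOLD and p_germline < P_THRESHOLD:
        return "somatic"
    if (p_somatic < P_THRESHOLD and p_germline < P_THRESHOLD
            and purity > MIN_PURITY and af < af_somatic / gamma):
        return "subclonal_somatic"
    # The text does not spell out the rule for the remaining cases.
    return "unresolved"


if __name__ == "__main__":
    # Purity 0.6, copy number 2, one variant copy: AFsomatic 0.3, AFgermline 0.5
    for observed_af in (0.3, 0.1, 0.5):
        print(observed_af, tag_variant(200, observed_af, 0.6, 2, 1))
```

With a purity of 0.6, a copy number of 2, and one variant copy, the expected allele frequencies are 0.3 (somatic) and 0.5 (germline). At a depth of 200, an observed frequency near 0.3 is consistent only with the somatic hypothesis. An observed frequency of 0.1 rejects both hypotheses and falls below 0.3 / 1.5 = 0.2, which matches the subclonal condition. If SciPy is available, `scipy.stats.binomtest(k, n, p).pvalue` should give a comparable two-sided exact p-value.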