Remove MAX AF >= 1 filter from bcftools SNV and InDel filters in all workflows #1180
Comments
On raw VCFs generated by VarDict and TNscope with the most recent version of BALSAMIC (12.0.2), I ran bcftools the same way we do in the TGA and WGS workflows. For TGA: For WGS:
The conclusion from this testing is that the main purpose of this filter seems to have been to remove germline variants from tumor-only analyses with low to moderately low coverage (WGS and WES). For tumor-only TGA the coverage is probably so high that it is unlikely, purely by chance, for exactly 100% of, say, 1000X reads at a position to carry the same base. It would be nice to see what happens downstream to these AF = 1 variants in the tumor-only analyses, to check whether they are filtered out by the population and LoqusDB databases. I see two scenarios here:
In the original issue (#1166) I wrote that if the original idea with the filter was to remove variants with an illegal AF above 1, then the filter could be changed to I also tested changing the original filter to this and confirmed that no variants with an AF of 1 were filtered out. I then manually modified a VCF to have an AF above 1, at which point the variant was filtered out by this filter. So if we want to keep the filter for the purpose of removing variants with an illegal AF, changing it in this way would work. However, I did not see any real variant from either VarDict or TNscope with an AF larger than 1, so I think the question should focus on whether to save or keep removing the variants with AF = 1.
I'd also like to note that in the last NIQAS3 the clinically relevant variant was an SNV with roughly 0.996 AF, which means it was very close to being filtered out by this filter.
Thanks @mathiasbio for the great summary. Do you know anything about the NIQAS3 sample? For example, what was the tumour cell fraction in the sample? The reason I am asking is that I have a really hard time understanding in what biological context a somatic variant could have VAF = 1. It could of course be a measuring artefact (random sampling bias), but this should be very rare. A high VAF could also be explained by the presence of a CNA in the T sample in the region of the variant, but in that case the VAF should approach 1 (but never reach it).
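To put rough numbers on the two explanations above (a CNA pushing the VAF towards 1, and random sampling making the observed VAF exactly 1), here is a minimal Python sketch. It assumes a simplified textbook model that is not taken from BALSAMIC: the expected VAF is the fraction of alleles carrying the variant in a tumour/normal mixture, reads are sampled independently, and sequencing error and mapping bias are ignored. The function names `expected_vaf` and `prob_all_reads_alt` are hypothetical.

```python
def expected_vaf(purity, mutated_copies, tumor_cn, normal_cn=2):
    """Expected VAF of a somatic variant in a tumour/normal cell mixture.

    Assumed model: normal cells contribute only reference alleles,
    tumour cells carry `mutated_copies` variant alleles out of `tumor_cn`.
    """
    tumor_alleles = purity * tumor_cn
    normal_alleles = (1.0 - purity) * normal_cn
    return purity * mutated_copies / (tumor_alleles + normal_alleles)


def prob_all_reads_alt(vaf, depth):
    """Probability that every one of `depth` independent reads shows the variant."""
    return vaf ** depth


if __name__ == "__main__":
    # Copy-neutral LOH (2 of 2 tumour copies mutated) at two hypothetical purities.
    for purity in (0.90, 0.99):
        vaf = expected_vaf(purity, mutated_copies=2, tumor_cn=2)
        for depth in (30, 1000):
            p = prob_all_reads_alt(vaf, depth)
            print(f"purity={purity:.2f} expected VAF={vaf:.3f} "
                  f"depth={depth} P(observed AF = 1)={p:.3g}")
```

Under these assumptions the expected VAF stays below 1 whenever any normal cells are present, but at WGS-like depths a very pure sample can still show an observed AF of exactly 1 with non-negligible probability, while at TGA-like depths this becomes very unlikely.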
@vwirta I don't remember exactly! But I remember something about overlapping CNVs in the region; I could ask Fulya. I also think it's very rare for a somatic variant to have an AF of 1, but Kalle brought this problem to my attention to begin with, based on some ILC (https://github.com/Clinical-Genomics/External-comparison/issues/22 I think) where we missed some 100% VAF variants. It probably happens very rarely, so we don't need to get too anxious, but it would be nice to be sure : )
I've asked Fulya regarding the NIQAS3 case.
No, I don't. I was in contact earlier, and at that point the results were about to be submitted. I'll contact them again. This slipped my mind.
Also wrote this in the PR: #1338
Note that many variants would likely have been added in the T-only WGS cases if at the same time the T-only WGS specific filter I think we can conclude from these stats that it is safe to remove the MAX AF filter, and we can postpone extending this fix to the WGS tumor-only analysis until later.
Thanks @mathiasbio for a great summary again!
merged into #1320 🥳
Need
Background to this feature can be found here: #1166
In short, the need is to remove this bcftools filter:
--include FORMAT/AF[0] < 1 --soft-filter balsamic_af_one --mode +
which occurs in all workflows for filtering of SNVs and InDels, used here: in sentieon_quality_filter.rule
This should be removed because there is a risk, especially in very pure and somewhat lower-coverage tumor samples, that a true somatic variant reaches an AF of 1 and is filtered out.
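For readers less familiar with soft filters, the effect of the expression above can be mimicked on toy records with a small Python sketch. This is only an illustration of the intended semantics, not how bcftools is implemented: records that fail `AF < 1` are kept but tagged with `balsamic_af_one`, and existing FILTER values are appended to rather than replaced (the `--mode +` behaviour). The function name `soft_filter_af`, the record layout, and the `low_depth` filter are hypothetical.

```python
def soft_filter_af(records, label="balsamic_af_one"):
    """Tag records whose AF is not strictly below 1; never drop a record."""
    tagged = []
    for rec in records:
        filters = list(rec["filter"])
        if not rec["af"] < 1:  # record fails the include expression
            # Adding a failing filter replaces PASS, other filters are kept.
            filters = [f for f in filters if f != "PASS"]
            if label not in filters:
                filters.append(label)
        tagged.append({**rec, "filter": filters})
    return tagged


# Toy records; IDs and the pre-existing "low_depth" filter are made up.
records = [
    {"id": "var_a", "af": 0.996, "filter": ["PASS"]},
    {"id": "var_b", "af": 1.0, "filter": ["PASS"]},
    {"id": "var_c", "af": 1.0, "filter": ["low_depth"]},
]

if __name__ == "__main__":
    for rec in soft_filter_af(records):
        print(rec["id"], rec["af"], ";".join(rec["filter"]))
```

In this sketch a variant at 0.996 AF keeps its PASS status, while a variant at exactly 1.0 loses it, which is the behaviour this issue proposes to remove.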
Suggested approach
The suggested approach is simply to remove it from the rule. However, we first want to make sure that the filter is not important for removing false positives, so that we can build a more sophisticated solution if necessary.
Considered alternatives
Were there alternative approaches which have been rejected?
Requests/suggestions/bugs solved by the feature
Can be closed when
Link the issues needed to be closed for this to be implemented
Blockers
Anything preventing this from happening?