is_indel treats "<NON_REF>" ALT as an indel #217
Comments
ouch. it's been a long time since I looked at the |
I pushed a fix for this exact case and made the is_indel handling more general. But yeah, other problems likely persist. Will fix them as I see them.
I'm rather a beginner with variant calling, so there may be things that I don't understand well, but I noticed while using cyvcf2 to parse some freebayes + snpEff generated .vcf that there were variants that had at the same time The vcf line (excluding individual sample's info) looks as follows:
This record, explored from within an ipython session:
I also notice a So I'm lost: Should this be considered an indel, a snp or an mnp? The following line of code in property https://github.com/brentp/cyvcf2/blob/main/cyvcf2/cyvcf2.pyx#L1928
This would also make any mnp an indel. Intuitively, this doesn't sound correct to me.
The GVCF format uses <NON_REF> as a placeholder for a possible non-reference allele in the ALT column. Because cyvcf2 compares the REF length to the ALT length to determine whether a variant is an indel, these ALT entries are incorrectly classified as indels.
Given that this labeling is not defined in the VCF standard (AFAIK) and is very much part of the GATK HaplotypeCaller way of doing things, this may not necessarily be a bug, but I wanted to document it for others stumped by this behavior.
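For anyone who needs a workaround in their own code, here is a minimal sketch of a length-based classification that skips symbolic ALT alleles before comparing lengths. This is not cyvcf2's actual implementation; the function names (is_symbolic, classify_allele, is_indel) are hypothetical, and it works on plain REF/ALT strings, so it assumes you pass in the REF and ALT values yourself. Breakend notation is not handled.

```python
from typing import List


def is_symbolic(alt: str) -> bool:
    """True for symbolic ALT alleles written in angle brackets, e.g. <NON_REF>."""
    return alt.startswith("<") and alt.endswith(">")


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair (hypothetical helper, not part of cyvcf2)."""
    # Symbolic alleles carry no sequence, so their string length is meaningless.
    if is_symbolic(alt):
        return "symbolic"
    # "*" marks an allele missing because of an upstream (spanning) deletion.
    if alt == "*":
        return "spanning_deletion"
    # Only a real length difference between sequences makes an indel.
    if len(ref) != len(alt):
        return "indel"
    # Same length: one base is a SNP, more than one base is an MNP.
    return "snp" if len(ref) == 1 else "mnp"


def is_indel(ref: str, alts: List[str]) -> bool:
    """True if at least one sequence-based ALT allele differs in length from REF."""
    return any(classify_allele(ref, alt) == "indel" for alt in alts)


if __name__ == "__main__":
    print(is_indel("A", ["<NON_REF>"]))        # GVCF reference block
    print(is_indel("A", ["AT", "<NON_REF>"]))  # insertion plus placeholder
    print(classify_allele("AT", "GC"))         # equal-length multi-base change
```

With this approach a record whose only ALT is <NON_REF> is not reported as an indel, while a record that has a real insertion or deletion alongside <NON_REF> still is. Equal-length multi-base substitutions come out as mnp rather than indel.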