Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_indel treats "<NON_REF>" ALT as an indel #217

Open
vsbuffalo opened this issue Jul 27, 2021 · 3 comments
Open

is_indel treats "<NON_REF>" ALT as an indel #217

vsbuffalo opened this issue Jul 27, 2021 · 3 comments

Comments

@vsbuffalo
Copy link

The GVCF format uses <NON_REF> as a placeholder for a possible non-reference allele in the ALT column. Because cyvcf2 compares REF to ALT length to determine if a variant is an indel these ALT entries are incorrectly classified as indels.

Given that this labeling is not defined in the VCF standard (AFAIK), and very much a part of the GATK HaplotypeCaller way of doing things, this may not be a bug necessarily, but I wanted to document this for others stumped by this behavior.

@brentp
Copy link
Owner

brentp commented Jul 27, 2021

ouch. it's been a long time since I looked at the is_* functions/properties. I'll see if I can et a fix for this case. Likely there are similar oversights in the other is_ functions.

brentp added a commit that referenced this issue Jul 27, 2021
@brentp
Copy link
Owner

brentp commented Jul 27, 2021

I pushed a fix for this exact case and made the is_indel handling more general. But yeah, likely other problems persist. Will fix as I see them.
thanks for reporting.

@blaiseli
Copy link

blaiseli commented Mar 6, 2024

I'm rather a beginner with variant calling, so there may be things that I don't understand well, but I noticed while using cyvcf2 to parse some freebayes + snpEff generated .vcf that there were variants that had at the same time is_mnp and is_indel set to True.

The vcf line (excluding individual sample's info) looks as follows:

KV38_Basile    94882   .       AC      AA      0.0     .       AB=0;ABP=0;AC=0;AF=0;AN=156;AO=1824;CIGAR=1M1X;DP=35043;D
PB=35196.5;DPRA=0;EPP=38.2301;EPPR=226.923;GTI=0;LEN=1;MEANALT=1.65385;MQM=44.6255;MQMR=46.4073;NS=78;NUMALT=1;ODDS=270.0
26;PAIRED=1;PAIREDR=1;PAO=155;PQA=5691;PQR=5580;PRO=152;QA=27536;QR=1112793;RO=33155;RPL=1004;RPP=43.3159;RPPR=19.8456;RP
R=820;RUN=1;SAF=12;SAP=3860.23;SAR=1812;SRF=17251;SRP=121.844;SRR=15904;TYPE=snp;technology.Illumina=1;ANN=AA|intergenic_
region|MODIFIER|FNLLGLLA_00047-FNLLGLLA_00051|FNLLGLLA_00047-FNLLGLLA_00051|intergenic_region|FNLLGLLA_00047-FNLLGLLA_000
51|||n.94883C>A||||||      GT:DP:AD:RO:QR:AO:QA:GL 0/0:524:474,46:474:15578:46:771:0,-87.3916,-1302.1

This record, explored from within an ipython session:

In [43]: rec.is_snp
Out[43]: False
In [44]: rec.is_mnp
Out[44]: True
In [45]: rec.REF
Out[45]: 'AC'
In [46]: rec.ALT
Out[46]: ['AA']
In [47]: rec.is_indel
Out[47]: True
In [48]: rec.POS
Out[48]: 94882

I also notice a TYPE=snp item in the vcf line above.

So I'm lost: Should this be considered an indel, a snp or an mnp?

The following line of code in property is_indel suggests me that any variant that is not is_sv and has a REF longer than 1 will be is_indel == True:

https://github.com/brentp/cyvcf2/blob/main/cyvcf2/cyvcf2.pyx#L1928

if len(self.REF) > 1: return True

This would make any mnp an indel also. Intuitively, this doesn't sound correct to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants