HGVS notation for dup in 109 becomes ins in 110 #1633
Comments
Hi @barbarian1803, Kind regards,
I have encountered the same problem and hope it can be fixed as soon as possible.
Hi, is this still an issue in v112? Thank you!
This issue is resolved, but I have found a new problem: the CDS coordinates for some genes are incorrect. For example, the variant SRGAP2:NM_015326.5:c.85A>T (p.T29S) is annotated as c.994A>T (p.T332S). I suspect it is a database issue.
@GSYongWu, I think it's a RefSeq problem, not a VEP one. Which build are you using? If GRCh37, the RefSeq GFF entry for NM_015326.5 looks a bit strange: the cDNA match starts at 910, not 1. Maybe try GRCh38 instead of GRCh37; if the problem goes away, that shows it's a RefSeq problem, and they can close this issue as fixed.
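As a quick sanity check on the numbers in this exchange (a sketch; `codon_number` is a hypothetical helper, not part of VEP): both c. positions map to the codons VEP reported, and the shift between them is 909 bases. That is exactly the number of transcript bases preceding position 910, and a whole number of codons (303), which is consistent with, though not proof of, the cDNA match offset described above.

```python
def codon_number(c_pos: int) -> int:
    """Return the codon (protein residue) number containing coding position c.<c_pos>."""
    # Coding positions are 1-based and codons are consecutive triplets starting at c.1.
    return (c_pos - 1) // 3 + 1


expected, reported = 85, 994  # c. positions from the SRGAP2 example above
print(codon_number(expected), codon_number(reported))  # 29 332
shift = reported - expected
print(shift, shift // 3, shift % 3)  # 909 303 0
```

Both positions are also the first base of their codon, so an A>T change turns a Thr codon (ACN) into a Ser codon (TCN) in either frame position, which is why the amino-acid change looks plausible even though the coordinates are wrong.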
@davmlaw Is it possible that the GFF file used by VEP in this release is incorrect, causing the coordinates of certain gene annotations to be inaccurate? This part was always correct in older versions of VEP.
Yes. Of course it can be wrong! The GFF is produced by taking sequences reported by labs around the world over many, many years, then aligning them with automated tools (algorithms built on our understanding of biology) against a fairly arbitrary reference sequence. Something can go wrong at every step of that process, or arbitrary decisions get made where you can't know which is right, and it is all done at massive scale. A quick glance at the differences between RefSeq and Ensembl transcripts (which are trying to do pretty much the same thing) shows how imperfect it is. It's super useful and valuable, though! Not to knock either team.

Transcript sequences differ between versions, and the alignment of a given sequence can differ between builds. When working this out, it helps to list the genome build explicitly and, in your examples, the transcript versions for your expected results (e.g. NM_015326.4 is 6781 bases long, NM_015326.5 is 6884 bases long).

It is also better to raise a new issue for a new problem than to add it to an existing, unrelated issue raised by someone else that is now fixed, as this makes it hard for the hardworking VEP team to manage their project and keep track of issues.
Hi @GSYongWu,
Describe the issue
For the variant below:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr21 5233678 . A AATTT . . .
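In VEP 109.3, this variant has the HGVS notation: ENST00000623753.1:n.132-758_132-755dup
In VEP 110.1/111.0, this variant has the notation: ENST00000623753.1:n.132-755_132-754insAAAT
I have noticed many similar variants that used to be described as dup are now described as ins in VEP 110 and 111.
The correct notation is the dup: HGVS describes an insertion as a duplication when the inserted bases are identical to the sequence immediately 5' of the insertion site, and here the 110/111 insertion site (n.132-755_132-754) sits directly after the 109 duplicated range (n.132-758_132-755).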
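The dup-versus-ins decision can be sketched as follows. This is a minimal illustration of the HGVS rule, not VEP's implementation: it assumes a plain linear reference string with no strand handling or intron-offset mapping, and `describe_insertion` and the toy sequences are hypothetical. The insertion is first shifted as far 3' as possible, then reported as a dup if the inserted bases match the reference immediately 5' of the final position.

```python
def describe_insertion(ref: str, pos: int, ins: str) -> str:
    """Hypothetical helper: describe `ins` inserted after 1-based position `pos` of `ref`."""
    if not ins:
        raise ValueError("inserted sequence must be non-empty")
    # 0-based view: the insertion point p sits between ref[p - 1] and ref[p].
    p, s = pos, ins
    # 3' rule: while the next reference base equals the first inserted base,
    # the same allele can be written one base further right with `s` rotated.
    while p < len(ref) and ref[p] == s[0]:
        s = s[1:] + s[0]
        p += 1
    k = len(s)
    # Duplication: the inserted bases repeat the k reference bases just 5' of p.
    if p >= k and ref[p - k:p] == s:
        start, end = p - k + 1, p  # 1-based coordinates of the duplicated bases
        return f"{start}dup" if k == 1 else f"{start}_{end}dup"
    # Otherwise a plain insertion between 1-based positions p and p + 1.
    return f"{p}_{p + 1}ins{s}"


print(describe_insertion("GGAAATCC", 2, "AAAT"))  # 3_6dup
print(describe_insertion("GGCCTT", 2, "A"))       # 2_3insA
print(describe_insertion("GGATT", 3, "T"))        # 5dup
```

In the first toy case, inserting AAAT next to an existing AAAT is reported as a dup of the 3'-most copy rather than as an ins, which is the behaviour the 109 output shows.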
Another example
#CHROM POS ID REF ALT QUAL FILTER INFO
chr21 13933439 . C CT . . .
It used to be: ENST00000451663.5:n.2429+398dup
It now becomes: ENST00000451663.5:n.2429+398_2429+399insA
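In both examples, the inserted bases in the transcript-level notation are the reverse complement of the bases added by the VCF ALT allele (ATTT becomes AAAT, T becomes A), which suggests that both transcripts lie on the minus strand. A minimal sketch of that check, assuming simple VCF insertions with a single anchor base; the helper names are hypothetical:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def vcf_inserted_bases(ref: str, alt: str) -> str:
    # Hypothetical helper: handles only simple insertions where REF is one anchor base.
    if len(ref) != 1 or len(alt) <= 1 or not alt.startswith(ref):
        raise ValueError("expected a simple insertion with a single anchor base")
    return alt[1:]


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse the order to read the opposite strand 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


for ref, alt in [("A", "AATTT"), ("C", "CT")]:
    inserted = vcf_inserted_bases(ref, alt)
    print(inserted, reverse_complement(inserted))
# ATTT AAAT
# T A
```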
Additional information
Run via Docker for version 110.1, and via the VEP web interface for the latest version, v111.