A "GENEINFO" entry can contain multiple gene SYMBOL:ID pairs, as defined in the VCF header:
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
Example:
1 1474871 1295591 G C . . ALLELEID=1285386;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.1474871G>C;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=TMEM240:339453|LOC121967044:121967044;MC=SO:0001627|intron_variant;ORIGIN=1
In this case, only the first pair (TMEM240) is kept and "LOC121967044" is discarded, which can lead to mapping problems in process-vep.
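To illustrate that both symbols are recoverable from the INFO field, here is a minimal sketch that splits GENEINFO according to the header definition (colon between symbol and id, vertical bar between pairs). The function name `parse_geneinfo` is hypothetical and not part of the project's code; it assumes the INFO column is available as a plain `;`-separated string.

```python
from typing import List, Tuple


def parse_geneinfo(info_field: str) -> List[Tuple[str, str]]:
    """Return every (symbol, gene_id) pair in the GENEINFO value of an INFO string.

    Hypothetical helper: follows the header definition, where symbol and id are
    separated by ':' and pairs are separated by '|'. Returns [] if GENEINFO is absent.
    """
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "GENEINFO":
            pairs = []
            for pair in value.split("|"):
                symbol, _, gene_id = pair.partition(":")
                pairs.append((symbol, gene_id))
            return pairs
    return []


# INFO column of the example record above
info = (
    "ALLELEID=1285386;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;"
    "CLNHGVS=NC_000001.10:g.1474871G>C;CLNREVSTAT=criteria_provided,_single_submitter;"
    "CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;"
    "GENEINFO=TMEM240:339453|LOC121967044:121967044;MC=SO:0001627|intron_variant;ORIGIN=1"
)
print(parse_geneinfo(info))
# [('TMEM240', '339453'), ('LOC121967044', '121967044')]
```

Keeping every pair, rather than only the first, is what allows both TMEM240 and LOC121967044 to be carried through to process-vep.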
System information
OS: Not applicable
Version: 5.0.0.dev0
Python version: Not applicable
Shell: Not applicable
How to Reproduce
Steps to reproduce the behavior:
1. Run train-data-creator with a VCF containing a variant whose GENEINFO field has multiple gene SYMBOL:ID pairs.
2. Run VEP.
3. Convert the VEP output VCF to TSV.
4. Run process-vep and note that only one of the GENEINFO entries has been mapped.
Expected behavior
Currently, process-vep maps one-to-one from the initially supplied SYMBOL to the VEP output SYMBOL. This needs to be changed to a one-to-many mapping, where "one" is the VEP output SYMBOL and "many" are the SYMBOLs in the "ID" column.
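The difference between the current and the expected behavior can be sketched as follows. This is an illustration only, not process-vep's actual implementation: the `supplied` dictionary (variant key to all originally supplied SYMBOLs), the variant key format `1_1474871_G_C`, and the VEP rows are hypothetical stand-ins for what the pipeline reads from the "ID" column and the VEP output TSV.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (variant key, VEP output SYMBOL)


def map_one_to_one(supplied: Dict[str, List[str]], vep_rows: Iterable[Row]) -> List[Row]:
    """Described current behavior: only a single supplied SYMBOL per variant is compared."""
    mapped = []
    for variant, vep_symbol in vep_rows:
        symbols = supplied.get(variant, [])
        # Only the first (retained) symbol is considered; the rest were discarded.
        if symbols and symbols[0] == vep_symbol:
            mapped.append((variant, vep_symbol))
    return mapped


def map_one_to_many(supplied: Dict[str, List[str]], vep_rows: Iterable[Row]) -> List[Row]:
    """Expected behavior: a VEP SYMBOL maps back if it matches ANY supplied SYMBOL."""
    mapped = []
    for variant, vep_symbol in vep_rows:
        if vep_symbol in supplied.get(variant, []):
            mapped.append((variant, vep_symbol))
    return mapped


# Hypothetical data based on the example record
supplied = {"1_1474871_G_C": ["TMEM240", "LOC121967044"]}
vep_rows = [("1_1474871_G_C", "TMEM240"), ("1_1474871_G_C", "LOC121967044")]

print(map_one_to_one(supplied, vep_rows))
# [('1_1474871_G_C', 'TMEM240')]
print(map_one_to_many(supplied, vep_rows))
# [('1_1474871_G_C', 'TMEM240'), ('1_1474871_G_C', 'LOC121967044')]
```

With the one-to-many lookup, the VEP row for LOC121967044 is mapped back instead of being dropped.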
Logs
If available, the generated logging information and/or error message (can also be attached as a file if very large).
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
#51 (comment)