Using one of the two annotation sources VCF is used when same annotation source name is used #1651

Open
Aisha-D opened this issue Apr 5, 2024 · 5 comments

Aisha-D commented Apr 5, 2024

Describe the issue

If two custom annotations are given the same short name, VEP annotates the VCF using the first custom annotation and silently ignores the second. I expected an error to be raised instead, so that the two annotation sources have to be given distinct names.

Example of command and VCF header output:

docker run -v /home/dnanexus:/opt/vep/.vep ensemblorg/ensembl-vep:release_103.1 ./vep \
-i /opt/vep/.vep/128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split.vcf \
-o /opt/vep/.vep/128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split_filevep.vcf \
--vcf --cache --refseq --exclude_predicted --symbol --hgvs --af_gnomad --check_existing --variant_class --numbers --offline \
--custom /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN \
--custom /opt/vep/.vep/haemonc_1706_samples.vcf.gz,Prev,vcf,exact,0,AC,NS \
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID \
--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID \
--plugin CADD,/opt/vep/.vep/./whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./gnomad.genomes.r3.0.indel.tsv.gz \
--fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature
##VEP="v103" time="2024-04-03 17:49:22" cache="/opt/vep/.vep/homo_sapiens_refseq/103_GRCh38" ensembl=103.4c8d44a ensembl-variation=103.06320c4 ensembl-io=103.353f93a ensembl-funcgen=103.b53bef4 1000genomes="phase3" COSMIC="92" ClinVar="202008" ESP="V2-SSA137" HGMD-PUBLIC="20194" assembly="GRCh38.p13" dbSNP="154" gencode="GENCODE 37" genebuild="2014-07" gnomAD="r2.1" polyphen="2.2.2" refseq="2020-09-29 12:45:25 - GCF_000001405.39_GRCh38.p13_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##CADD_PHRED=PHRED-like scaled CADD score
##CADD_RAW=Raw CADD score
##INFO=<ID=ClinVar,Number=.,Type=String,Description="/opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz (exact)">
##INFO=<ID=ClinVar_CLNSIG,Number=.,Type=String,Description="CLNSIG field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=ClinVar_CLNREVSTAT,Number=.,Type=String,Description="CLNREVSTAT field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=ClinVar_CLNDN,Number=.,Type=String,Description="CLNDN field from /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz">
##INFO=<ID=Prev,Number=.,Type=String,Description="/opt/vep/.vep/haemonc_1706_samples.vcf.gz (exact)">
##INFO=<ID=Prev_AC,Number=.,Type=String,Description="AC field from /opt/vep/.vep/haemonc_1706_samples.vcf.gz">
##INFO=<ID=Prev_NS,Number=.,Type=String,Description="NS field from /opt/vep/.vep/haemonc_1706_samples.vcf.gz">
##INFO=<ID=COSMIC,Number=.,Type=String,Description="/opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz (exact)">
##INFO=<ID=COSMIC_ID,Number=.,Type=String,Description="ID field from /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz">
##INFO=<ID=SYMBOL,Number=.,Type=String,Description="The SYMBOL field from INFO/CSQ">
##INFO=<ID=VARIANT_CLASS,Number=.,Type=String,Description="The VARIANT_CLASS field from INFO/CSQ">
##INFO=<ID=Consequence,Number=.,Type=String,Description="The Consequence field from INFO/CSQ">
##INFO=<ID=EXON,Number=.,Type=String,Description="The EXON field from INFO/CSQ">
##INFO=<ID=HGVSc,Number=.,Type=String,Description="The HGVSc field from INFO/CSQ">
##INFO=<ID=HGVSp,Number=.,Type=String,Description="The HGVSp field from INFO/CSQ">
##INFO=<ID=gnomAD_AF,Number=.,Type=Float,Description="The gnomAD_AF field from INFO/CSQ">
##INFO=<ID=CADD_PHRED,Number=.,Type=String,Description="The CADD_PHRED field from INFO/CSQ">
##INFO=<ID=Existing_variation,Number=.,Type=String,Description="The Existing_variation field from INFO/CSQ">
##INFO=<ID=Feature,Number=.,Type=String,Description="The Feature field from INFO/CSQ">
##bcftools_split-vepVersion=1.12+htslib-1.12
##bcftools_split-vepCommand=split-vep -d -c - -a CSQ 128970875-24085Q0017-24NGSHO13-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_allgenesvep.vcf; Date=Wed Apr  3 17:54:57 2024
##bcftools_annotateVersion=1.12+htslib-1.12
##bcftools_annotateCommand=annotate -x INFO/CSQ -o tmp.vcf; Date=Wed Apr  3 17:54:57 2024

System

  • VEP version: [103]
  • VEP Cache version: [103]
  • OS: [Ubuntu]
  • tabix installed [Yes]
Aisha-D changed the title from "Using one of the two" to "Using one of the two annotation sources VCF is used when same annotation source name is used" on Apr 5, 2024
Author

Aisha-D commented Apr 5, 2024

Example of custom annotation order swapped:

docker run -v /home/dnanexus:/opt/vep/.vep ensemblorg/ensembl-vep:release_103.1 ./vep \
-i /opt/vep/.vep/128858722-24079Q0066-24NGSHO12-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split.vcf \
-o /opt/vep/.vep/128858722-24079Q0066-24NGSHO12-8128-U-96527893_markdup_recalibrated_tnhaplotyper2_split_filevep.vcf \
--vcf --cache --refseq --exclude_predicted --symbol --hgvs --af_gnomad --check_existing --variant_class --numbers --offline \
--custom /opt/vep/.vep/clinvar_20240317_hg38_withchr.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN \
--custom /opt/vep/.vep/novaseq_205samples_211007.vcf.gz,Prev,vcf,exact,0,AC,NS \
--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID \
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID \
--plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz \
--fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature
##INFO=<ID=COSMIC,Number=.,Type=String,Description="/opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz (exact)">
##INFO=<ID=COSMIC_ID,Number=.,Type=String,Description="ID field from /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz">
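
One way to confirm which file was actually used for a given short name is to read the matching INFO line back out of the output header. The following is only a minimal sketch: `info_source` is a hypothetical helper name, and it assumes an uncompressed VCF whose header lines have the usual `##INFO=<ID=...,Description="...">` shape.

```shell
# Hypothetical helper (not part of VEP or bcftools): print the Description
# recorded in a VCF header for one INFO ID.
#   $1 = INFO ID, e.g. COSMIC
#   $2 = path to an uncompressed VCF file
info_source() {
  # The trailing comma in the pattern keeps COSMIC from also matching COSMIC_ID.
  grep "^##INFO=<ID=$1," "$2" | sed -E 's/.*Description="([^"]*)".*/\1/'
}
```

Calling `info_source COSMIC output.vcf` (with `output.vcf` standing in for the annotated file) prints the source file path that the header records for that name, which makes it easy to spot that only one of the two COSMIC files was picked up.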

dglemos self-assigned this on Apr 5, 2024
Contributor

dglemos commented Apr 5, 2024

Hi @Aisha-D,
I can reproduce the issue - thanks for reporting it.
We are looking for a solution.

Best wishes,
Diana

Contributor

dglemos commented Apr 8, 2024

The issue is in your command: both COSMIC custom annotations use the same short name (COSMIC):

--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID --plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature

This name has to be unique. If you change the names to COSMICNonCoding and COSMICCoding, the output correctly includes the information from both files.

--custom /opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMICNonCoding,vcf,exact,0,ID 
--custom /opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMICCoding,vcf,exact,0,ID --plugin CADD,/opt/vep/.vep/./in/vep_refs/whole_genome_SNVs.tsv.gz,/opt/vep/.vep/./in/vep_refs/gnomad.genomes.r3.0.indel.tsv.gz --fields SYMBOL,VARIANT_CLASS,Consequence,EXON,HGVSc,HGVSp,gnomAD_AF,CADD_PHRED,Existing_variation,ClinVar,ClinVar_CLNDN,ClinVar_CLNSIG,COSMIC,Prev_AC,Prev_NS,Feature
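
A short-name collision like this can also be caught before running VEP. The following is a minimal sketch and not part of VEP: `duplicate_custom_names` is a hypothetical helper that takes each `--custom` value as one argument and assumes the short name is the second comma-separated field, as in the commands above (so file paths must not contain commas).

```shell
# Hypothetical pre-flight helper: print every short name that is used by
# more than one --custom value. Each argument is one --custom value of the
# form "file,short_name,format,type,coords,fields...".
duplicate_custom_names() {
  printf '%s\n' "$@" | cut -d, -f2 | sort | uniq -d
}

# Usage with the two COSMIC sources from the original command.
dups=$(duplicate_custom_names \
  "/opt/vep/.vep/CosmicNonCodingVariants_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID" \
  "/opt/vep/.vep/CosmicCodingMuts_GRCh38_v99.normal.vcf.gz,COSMIC,vcf,exact,0,ID")
if [ -n "$dups" ]; then
  # Report the clash on stderr so a wrapper script can stop before calling VEP.
  echo "Duplicate --custom short names: $dups" >&2
fi
```

With the renamed sources (COSMICNonCoding and COSMICCoding) the function prints nothing, so the check passes.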

Let me know if you have more questions.

Best wishes,
Diana

Author

Aisha-D commented Apr 8, 2024

Hi Diana,
Thanks for looking into this. We have resolved the issue on our side, but we were hoping that VEP would raise an error when the same name is used, rather than silently overriding the data.

Contributor

dglemos commented Apr 10, 2024

That makes sense. We will update VEP in the future to check if there are any duplicated names.

Best wishes,
Diana
