Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSG: The coordinate interval for display is of different length than for the reference allele #1775

Open
suzyhh opened this issue Oct 22, 2024 · 3 comments
Assignees

Comments

@suzyhh
Copy link

suzyhh commented Oct 22, 2024

Describe the issue

Hello, I'm testing v113 using the v113 docker obtained from docker hub and the v113 cache downloaded using the following command:
docker run --network host -t -i -v /mnt/data1/software/vep/:/opt/vep/.vep ensemblorg/ensembl-vep:release_113.0 INSTALL.pl -a c -s homo_sapiens_merged -y GRCh38

I have successfully tested v113 with a VCF containing structural variants, but I encounter an error when running with a VCF containing SNVs/indels. After the error the process just hangs doing nothing until I kill it. The output files contains the VCF header but no variants.

System

  • VEP version: v113.0 (Docker)
  • VEP Cache version: v113.0

Full VEP command line

vep --cache --offline \
    -i ~{vcf} \
    --dir_cache /opt/vep/.vep \
    --dir_plugins /opt/vep/.vep/Plugins/ \
    --vcf --compress_output bgzip \
    --merged \
    --fasta $refFasta \
    --assembly GRCh38 \
    --no_stats \
    --fork ~{fork} \
    --buffer_size ~{buffer} \
    --no_escape \
    --check_existing \
    --hgvs --hgvsg \
    --af --af_gnomadg \
    --protein --uniprot \
    --symbol \
    --numbers \
    --allele_number \
    --sift b --polyphen b \
    --pubmed \
    --show_ref_allele \
    --variant_class \
    --mane_select \
    --transcript_version \
    --flag_pick_allele_gene \
    --plugin MaxEntScan,/opt/vep/.vep/fordownload,NCSS,SWA \
    --plugin SpliceAI,snv=~{spliceaiSnv},indel=~{spliceaiIndel} \
    --plugin SpliceRegion \
    --plugin NearestExonJB,max_range=100 \
    --plugin REVEL,/opt/vep/.vep/new_tabbed_revel_grch38.tsv.gz \
    --plugin SpliceDistance \
    --plugin UTRAnnotator,file=/opt/vep/.vep/fordownload/uORF_5UTR_GRCh38_PUBLIC.txt \
    --plugin GnomadPli,file=~{gnomadv4Pli} \
    --plugin CADD,snv=~{caddSnv},indels=~{caddIndel} \
    --plugin AlphaMissense,file=~{alphaMissense} \
    --custom file=${hgmdZip},short_name=HGMD,format=vcf,type=exact,coords=0,fields=CLASS%PHEN \
    --custom file=~{clinVarVcf},short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN \
    --custom file=~{gnomadv4Vcf},short_name=gnomad4,format=vcf,type=exact,coords=0,fields=homCount%hetCount%hemiCount%gnomadFilter \
    --force_overwrite \
    -o ~{outName}_snvs.vep.vcf.gz

Full error message

STDERR 
STDERR -------------------- EXCEPTION --------------------
STDERR MSG: The coordinate interval for display is of different length than for the reference allele
STDERR STACK Bio::EnsEMBL::Variation::Utils::Sequence::hgvs_variant_notation /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/Sequence.pm:511
STDERR STACK Bio::EnsEMBL::Variation::VariationFeature::hgvs_genomic /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/VariationFeature.pm:2013
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VariationFeatureOverlapAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1314
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::TranscriptVariationAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1606
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_VariationFeatureOverlapAllele_output_hashes /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:393
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_output_hashes_by_VariationFeature /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:347
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VCF::get_all_lines_by_InputBuffer /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory/VCF.pm:316
STDERR STACK Bio::EnsEMBL::VEP::Runner::_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:422
STDERR STACK (eval) /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:671
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_process /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:665
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:496
STDERR STACK Bio::EnsEMBL::VEP::Runner::next_output_line /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:367
STDERR STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:208
STDERR STACK toplevel /opt/vep/src/ensembl-vep/vep:46
STDERR Date (localtime)    = Mon Oct 21 13:54:46 2024
STDERR Ensembl API version = 113
STDERR ---------------------------------------------------
STDERR Died in forked process 53

I have tested my command using v112 and it completes successfully with no errors/warnings so it does seem to be a v113 problem. Running locally I have found that the problem occurs when using the --hgvsg flag. If I run using only --hgvs I do not get the above error, but I do get several warnings, here is a small snippet:

WARNING: 12 : WARNING: Transcript-assembly mismatch in rs6650119
WARNING: Transcript-assembly mismatch in rs6650119
WARNING: 13 : WARNING: Transcript-assembly mismatch in rs2275166
WARNING: Transcript-assembly mismatch in rs2275166
WARNING: 14 : WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs586178
WARNING: Transcript-assembly mismatch in rs586178
WARNING: 39 : WARNING: Transcript-assembly mismatch in rs193922695
WARNING: Transcript-assembly mismatch in rs193922695
WARNING: 52 : WARNING: Transcript-assembly mismatch in rs7601549
WARNING: Transcript-assembly mismatch in rs7601549

I do not receive any errors or warning when I run v113 without both --hgvs and --hgvsg. This unfortunately renders v113 unusable for us as we rely on the hgvs annotations.

Many thanks!

@likhitha-surapaneni likhitha-surapaneni self-assigned this Oct 22, 2024
@likhitha-surapaneni
Copy link
Contributor

Hi @suzyhh , thank you for reporting the issue. Can you kindly provide us the input file/test input used?

@suzyhh
Copy link
Author

suzyhh commented Oct 22, 2024

@likhitha-surapaneni

I have attached the input file (bgzipped for compatibility with github) that generates the error - ta!
TwEx_ProbandF-TwEx_MotherF-TwEx_FatherF_gnomad_filtered_mt.vcf.gz

@likhitha-surapaneni
Copy link
Contributor

likhitha-surapaneni commented Oct 22, 2024

Hi @suzyhh , thank you for providing us the file. We know why --hgvsg might be erroring out and are trying to push in a post-release fix for it (related to Ensembl/ensembl-variation#1124)

Regarding the warnings associated with --hgvs, we are trying to investigate this and will update you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants