Inconsistent HGNC:ID results between single-thread and multi-thread in vep #1759

Open
karlestira opened this issue Sep 26, 2024 · 3 comments
@karlestira

Describe the issue

VEP gives different results when run with multiple forks (--fork).

Problem:
Some genes (such as ENSG00000169047 or ENSG00000168769) lose the HGNC ID on their RefSeq transcripts (the field next to EntrezGene) when --fork is used, although the ID is present in the single-thread result.

Additional information

The inconsistency depends on the thread setting: the same --fork value gives the same results across runs, but different --fork values give different results.

I believe this is a multi-threading inconsistency bug, and I think it occurs widely. Any WES VCF annotated with the VEP merged cache can reproduce the problem; no specific input is needed.
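One way to check the claim above is to annotate the same input at several --fork values and keep each output for comparison. The following is a minimal sketch, not part of VEP: the helper names `vep_command` and `run_fork_matrix` and the output naming scheme are hypothetical, and the flags are copied from the command line reported in this issue. It assumes `vep` is on the PATH and that a fork value of 1 means the --fork option is left out.

```python
import subprocess
from typing import Dict, Iterable, List


def vep_command(input_vcf: str, output_vcf: str, fork: int,
                fasta: str = "ucsc.hg19.fa",
                dir_cache: str = "/opt/vep/database") -> List[str]:
    """Build the VEP argument list; only --fork and the output file vary."""
    cmd = [
        "vep", "--input_file", input_vcf, "--output_file", output_vcf,
        "--format", "vcf", "--vcf", "--symbol", "--biotype", "--hgvs",
        "--fasta", fasta, "--offline", "--cache", "--dir_cache", dir_cache,
        "--no_stats", "--merged", "--buffer_size", "10000",
    ]
    if fork > 1:
        # Assumption: a single-thread run is expressed by omitting --fork.
        cmd += ["--fork", str(fork)]
    return cmd


def run_fork_matrix(input_vcf: str, forks: Iterable[int],
                    prefix: str = "test.vep") -> Dict[int, str]:
    """Run VEP once per fork value and return {fork: output path}."""
    outputs = {}
    for fork in forks:
        out = f"{prefix}.{fork}.vcf"  # hypothetical naming scheme
        subprocess.run(vep_command(input_vcf, out, fork), check=True)
        outputs[fork] = out
    return outputs
```

For example, `run_fork_matrix("test.vcf", [1, 5, 10])` would produce three outputs that can then be compared pairwise.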

System

  • VEP version: 112.0 (conda build: pl5321h2a3209d_0, conda channel: anaconda/cloud/bioconda)
  • VEP Cache version: homo_sapiens_merged/112_GRCh37
  • Perl version: 5.32.1
  • OS: Debian GNU/Linux 10 (buster)
  • tabix installed? tabix 1.20 from conda

Full VEP command line

vep --input_file test.vcf --output_file test.vep.10.vcf --format vcf --vcf --symbol --biotype --hgvs --fasta ucsc.hg19.fa --offline --cache --dir_cache /opt/vep/database --no_stats --merged --fork 10 --buffer_size 10000

Info line in the output VCF:
##VEP="v112.0" API="v112" time="2024-09-26 17:00:31" cache="/opt/vep/database/homo_sapiens_merged/112_GRCh37" ensembl=112.7104005 ensembl-funcgen=112.be19ffa ensembl-io=112.2851b6f ensembl-variation=112.4113356 1000genomes="phase3" COSMIC="98" ClinVar="202306" HGMD-PUBLIC="20204" assembly="GRCh37.p13" dbSNP="156" gencode="GENCODE 19" genebuild="2011-04" gnomADe="r2.1" polyphen="2.2.2" refseq="105.20220307 - GCF_000001405.25_GRCh37.p13_genomic.gff" regbuild="1.0" sift="sift5.2.2"

Full error message

No error message.

@likhitha-surapaneni likhitha-surapaneni self-assigned this Sep 27, 2024
@likhitha-surapaneni
Contributor

Hi @karlestira,
Unfortunately I am not able to reproduce this issue on my end. Is there a specific test input file that can be shared to help us debug this?

Kind regards,
Likhitha

@karlestira
Author

karlestira commented Oct 8, 2024

The VCF is from VarDict, with some pre-processing applied.

Commands:
5 threads:
vep --input_file NA12878L1.vardict.head10000.vcf --output_file NA12878L1.vep.5.head10000.vcf --format vcf --vcf --symbol --biotype --hgvs --fasta ucsc.hg19.fa --offline --cache --dir_cache vep_db --no_stats --merged --fork 5 --buffer_size 10000
10 threads:
vep --input_file NA12878L1.vardict.head10000.vcf --output_file NA12878L1.vep.10.head10000.vcf --format vcf --vcf --symbol --biotype --hgvs --fasta ucsc.hg19.fa --offline --cache --dir_cache vep_db --no_stats --merged --fork 10 --buffer_size 10000

I used the VEP cache downloaded from the FTP site (sorry, I forgot the URL); the file name is homo_sapiens_merged_vep_112_GRCh37.tar.gz.

Then compare the two outputs:
diff NA12878L1.vep.5.head10000.vcf NA12878L1.vep.10.head10000.vcf

On my system, the diff reports line 62 (the recorded command line, which is expected to differ) and lines 4892, 4893, 4987, and 4988. These four lines differ in the HGNC ID where the transcript comes from RefSeq.

NA12878L1.vardict.head10000.vcf.gz
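
A plain `diff` shows which lines differ but not which annotation field changed. The sketch below is one way to narrow that down: it reads the CSQ field order from the `##INFO=<ID=CSQ` header line that VEP writes in --vcf mode and reports, per variant, every CSQ field whose value differs between two outputs. The function names are hypothetical, and the sketch assumes both runs list transcripts in the same order within each CSQ entry and that each CHROM/POS/REF/ALT combination appears only once (a later duplicate would overwrite an earlier one).

```python
import gzip
from typing import Dict, List, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_vep_vcf(lines) -> Tuple[List[str], Dict[tuple, List[str]]]:
    """Return (CSQ field names, {variant key: list of CSQ entries})."""
    fields: List[str] = []
    records: Dict[tuple, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # The field order follows "Format: " in the CSQ header line.
            fields = line.split("Format: ")[1].rstrip('">').split("|")
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            key = (cols[0], cols[1], cols[3], cols[4])  # CHROM, POS, REF, ALT
            csq = [kv[4:] for kv in cols[7].split(";") if kv.startswith("CSQ=")]
            records[key] = csq[0].split(",") if csq else []
    return fields, records


def diff_csq(fields, rec_a, rec_b):
    """List (variant key, entry index, field name, value A, value B) per difference."""
    out = []
    for key, entries_a in rec_a.items():
        entries_b = rec_b.get(key)
        if entries_b is None:
            continue  # variant missing from the second file; not reported here
        # Assumption: entry i in file A corresponds to entry i in file B.
        for i, (a, b) in enumerate(zip(entries_a, entries_b)):
            if a == b:
                continue
            for name, va, vb in zip(fields, a.split("|"), b.split("|")):
                if va != vb:
                    out.append((key, i, name, va, vb))
    return out
```

To use it, open both outputs with `open_text`, pass each handle to `parse_vep_vcf`, and give the field list and the two record dictionaries to `diff_csq`. If the report above holds, the differing rows should name the HGNC ID field on RefSeq transcripts.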

@TimD1

TimD1 commented Oct 23, 2024

I have encountered the same issue, using VEP version 111.0.
