Most SV types missing from genotyped vcf #139

Open
afurches opened this issue Dec 6, 2024 · 0 comments

afurches commented Dec 6, 2024

Hi,

I'm using MetaSV to merge and genotype BreakDancer calls (CTX,DEL,INS,INV,ITX) across multiple contigs and subpopulations. To replicate what I'm doing with other software, I want to use MetaSV to:

  1. merge SV calls across groups and contigs,
  2. use the merged vcf file to genotype all samples individually,
  3. merge all genotyped vcf files into a multisample vcf.

Step (1) seems to work well and quickly: I get breakdancer.vcf.gz and variants.vcf.gz output files, and all SV types are present in both.

$ zcat breakdancer.vcf.gz | grep -v '#' | cut -f5 | sort | uniq
<CTX>
<DEL>
<INS>
<INV>
<ITX>

$ zcat variants.vcf.gz | grep -v '#' | cut -f5 | sort | uniq
<CTX>
<DEL>
<INS>
<INV>
<ITX>
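To compare the input and genotyped files by record counts rather than just distinct ALT values, the pipeline above can be reproduced in Python. This is a minimal sketch assuming a standard tab-separated VCF with ALT in column 5; unlike grep -v '#', it only treats lines that start with '#' as headers. The function names are illustrative and not part of MetaSV.

```python
import gzip
import sys
from collections import Counter


def count_alts(lines):
    """Count data records per ALT value (VCF column 5), skipping '#' header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4:
            counts[fields[4]] += 1
    return counts


def alt_counts(path):
    """Open a plain or gzipped VCF and return its ALT counts."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_alts(handle)


if __name__ == "__main__":
    # Print one block of counts per VCF given on the command line.
    for path in sys.argv[1:]:
        print(path)
        for alt, n in sorted(alt_counts(path).items()):
            print(f"  {alt}\t{n}")
```

Run it as, for example, python count_alts.py breakdancer.vcf.gz variants.vcf.gz (the script name is arbitrary). Counts make it easier to see whether records are dropped selectively rather than whole SV types disappearing.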

But when I genotype samples using either merged VCF (--breakdancer_vcf), only DEL calls are genotyped/reported, even if I specify --svs_to_report {INV,CTX,INS,DEL,ITX}.
For the other SV types, the error files report the records as dropped, e.g. Skipping Record(...) due to small size for ITX and Ignoring record due to missing SVTYPE or INFO field for INS.

EXAMPLE:

...
ERROR 2024-12-06 13:02:29,571 metasv.sv_interval   Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=genome_chloro|Contig01+38342752_1, POS=82170, REF=G, ALT=[<INS>])
WARNING 2024-12-06 13:02:29,571 metasv.sv_interval   Skipping Record(CHROM=genome_chloro|Contig01+38342752_1, POS=83010, REF=T, ALT=[<ITX>]) due to small size
WARNING 2024-12-06 13:02:29,571 metasv.sv_interval   Skipping Record(CHROM=genome_chloro|Contig01+38342752_1, POS=84336, REF=T, ALT=[<ITX>]) due to small size
ERROR 2024-12-06 13:02:29,571 metasv.sv_interval   Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=genome_chloro|Contig01+38342752_1, POS=84903, REF=A, ALT=[<INS>])
ERROR 2024-12-06 13:02:29,572 metasv.sv_interval   Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=genome_chloro|Contig01+38342752_1, POS=104631, REF=A, ALT=[<INS>])
INFO 2024-12-06 13:02:29,572 metasv.main          SV types are set(['DEL'])
INFO 2024-12-06 13:02:29,572 metasv.main          Do merging
INFO 2024-12-06 13:02:29,572 metasv.main          Processing SVs of type DEL
INFO 2024-12-06 13:02:29,572 metasv.main          Intra-tool Merging SVs of type DEL
INFO 2024-12-06 13:02:29,572 metasv.main          First level merging for DEL for tool BreakDancer
INFO 2024-12-06 13:02:29,902 metasv.main          Inter-tool Merging SVs of type DEL
INFO 2024-12-06 13:02:30,061 metasv.main          Output merged VCF without assembly
INFO 2024-12-06 13:02:30,313 metasv.main          ('DEL', 'LowQual', 'IMPRECISE', ('BreakDancer',)):572
INFO 2024-12-06 13:02:30,342 metasv.main          Clean up pybedtools
INFO 2024-12-06 13:02:30,342 metasv.main          All Done!
# genotyped output
$ zcat variants.vcf.gz | grep -v '#' | cut -f5 | sort | uniq
<DEL>
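With a long error file it helps to tally which ALT types are dropped and for what reason. The sketch below assumes the two message shapes shown in the metasv.sv_interval excerpt above ("Ignoring record due to ... in Record(...)" and "Skipping Record(...) due to ..."); any other MetaSV messages are simply ignored, and the regexes are illustrative rather than taken from MetaSV's source.

```python
import re
import sys
from collections import Counter

# Patterns modeled on the two message shapes in the log excerpt above.
IGNORED = re.compile(r"Ignoring record due to (?P<reason>.+?) in Record\(.*ALT=\[(?P<alt>[^\]]*)\]")
SKIPPED = re.compile(r"Skipping Record\(.*ALT=\[(?P<alt>[^\]]*)\]\) due to (?P<reason>.+)$")


def tally_dropped(lines):
    """Return a Counter keyed by (ALT, reason) for records the log says were dropped."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = IGNORED.search(line) or SKIPPED.search(line)
        if match:
            counts[(match.group("alt"), match.group("reason"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_dropped.py metasv_error_file.log
    with open(sys.argv[1]) as handle:
        for (alt, reason), n in sorted(tally_dropped(handle).items()):
            print(f"{alt}\t{reason}\t{n}")
```

If every non-DEL ALT type shows up in this tally, the records are being discarded while MetaSV parses the input, which would be consistent with the log reporting "SV types are set(['DEL'])" before merging starts.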

Why are all SV types present in the merged VCFs, but only DEL calls are genotyped?
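Because the log blames "missing SVTYPE or INFO field" and "small size", one way to narrow this down is to check which INFO keys each ALT type actually carries in the merged VCF passed to --breakdancer_vcf. This is a minimal sketch assuming standard VCF columns (ALT in column 5, INFO in column 8). Which of SVTYPE, SVLEN and END MetaSV reads to compute size is an assumption to verify against its source; the script only reports presence.

```python
import gzip
import sys
from collections import Counter, defaultdict

# INFO keys to look for; whether MetaSV relies on each one is an assumption.
KEYS = ("SVTYPE", "SVLEN", "END")


def info_key_presence(lines):
    """Map each ALT value to (record count, Counter of KEYS present in INFO)."""
    totals = Counter()
    present = defaultdict(Counter)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        alt, info = fields[4], fields[7]
        totals[alt] += 1
        # INFO entries are KEY=VALUE or bare flags, separated by ';'; '.' means empty.
        keys = {item.split("=", 1)[0] for item in info.split(";")} if info != "." else set()
        for key in KEYS:
            if key in keys:
                present[alt][key] += 1
    return {alt: (totals[alt], present[alt]) for alt in totals}


if __name__ == "__main__" and len(sys.argv) > 1:
    opener = gzip.open if sys.argv[1].endswith(".gz") else open
    with opener(sys.argv[1], "rt") as handle:
        for alt, (n, seen) in sorted(info_key_presence(handle).items()):
            missing = ", ".join(f"{k} missing in {n - seen[k]}" for k in KEYS)
            print(f"{alt}\t{n} records\t{missing}")
```

If, for example, the <INS> records lack SVTYPE while <DEL> records have it, that would point to a difference in how the merged VCF encodes those types rather than to the genotyping step itself.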

For now I plan to use different software to genotype and merge the calls, but I wanted to ask whether this is a bug.

Thanks,
A
