-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
##INFO fields being dropped #302
Comments
hi @FlexLuthORF can you attach a small VCF (can be just the header and a single variant) that shows the issue? |
2201410002_head70.vcf.txt The first vcf is the one I am editing with cyvcf2 and the second one is the output that has strangely missing and cleaved ## fields. |
If I run this:
I get no error. Here are the contents of issue302.py. If you can show me how to recreate, then we can debug, but I suspect something else is going on: from cyvcf2 import VCF, Writer
def change_genotypes(input_vcf, bedfile, output_path):
vcf_reader = VCF(input_vcf, strict_gt=True)
# Create a new VCF Writer using the input VCF as a template
vcf_writer = Writer(output_path, vcf_reader)
for record in vcf_reader:
for index, sample in enumerate(vcf_reader.samples):
if True:
current_genotype = record.genotypes[index]
if current_genotype is None or -1 in current_genotype:
record.genotypes[index] = [-1, -1, False] # Set genotype to unknown
elif 1 in current_genotype:
record.genotypes[index] = [-1, 1, False] # Set genotype to ./1
else:
record.genotypes[index] = [-1, 0, False] # Set genotype to ./0
# Reassign the genotypes field
record.genotypes = record.genotypes
vcf_writer.write_record(record)
vcf_writer.close()
vcf_reader.close()
if __name__ == "__main__":
import sys
change_genotypes(sys.argv[1], None, "-") |
Can you show the output vcf file? I get no traceback error either. It's that it messes up the ## fields and if ##fileformat is missing then I cant bcftools index the .vcf.gz It does change the genotypes and all that. Its just dropping ## fields and appending the new file location at the top with no ## before it. `cat 2201410002_hemi.bed chr2 90225183 90249908 IGKV1-NL1` thats the bed file it is using in case that helps. Only one line |
This happens for you using my code above? The o.vcf is well formed with the correct headers. |
Interesting. You are correct, your code has all the ## fields. I cant figure out why the bit of me checking if the entry is within coords on a bed file is causing the issue. Thank you for your help. If I figure out the issue I will post it here... Though for now, since it was only the top few ## I am just using some bash commands to strip and re-add the info fields so I can continue my analysis. Perhaps how I am writing the vcf afterwards?
I am calling the python script within a batch script and redircting the returned vcf into a output file. Yeah I think this is the problem and its an issue on my side. |
I am editing the below vcf file with the following script:
I expect it to copy all ##info fields from the template to the new vcf, but it seems to truncate it? I am missing:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.19+htslib-1.19.1
##bcftoolsCommand
from the newly created file and it is causing downstream issues. It also seems to be placing the path to the new file right at the top with now ## before it.
The text was updated successfully, but these errors were encountered: