Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate VCF header when writing, if no header is explicitly supplied #1021

Merged
merged 3 commits into from
Apr 24, 2023

Conversation

tomwhite
Copy link
Collaborator

Not ready to merge (but happy to discuss) since it doesn't have full coverage yet.

Copy link
Collaborator

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think we need to update any existing headers rather than autogenerate unfortunately. All sorts of stuff will be stored in these headers and autogenerating a minimal header for the data will lose that information.

sgkit/io/vcf/vcf_writer.py Show resolved Hide resolved
sgkit/io/vcf/vcf_writer.py Show resolved Hide resolved
@tomwhite tomwhite marked this pull request as ready for review February 20, 2023 17:26
@tomwhite tomwhite force-pushed the vcf-generate-header branch 2 times, most recently from 0458a3b to 1b8e57f Compare February 28, 2023 14:04
@tomwhite
Copy link
Collaborator Author

OK, I've updated to follow the approach suggested by @jeromekelleher, which is to use existing VCF header lines if they are available - and only generate header lines if they are not available. See new API doc for details.

Copy link
Collaborator

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not gone through the gory details, but LGTM. Perhaps @benjeffery could take a closer look?

@mergify
Copy link
Contributor

mergify bot commented Apr 5, 2023

This PR has conflicts, @tomwhite please rebase and push updated version 🙏

@mergify mergify bot added the conflict PR conflict label Apr 5, 2023
@mergify mergify bot removed the conflict PR conflict label Apr 5, 2023
@benjeffery
Copy link
Collaborator

@tomwhite I see you're still pushing here, let me know when I should review.

@tomwhite
Copy link
Collaborator Author

tomwhite commented Apr 5, 2023

@benjeffery It's ready for a review now (I was just fixing conflicts). Thanks!

@tomwhite tomwhite added this to the 0.7.0 milestone Apr 18, 2023
Copy link
Collaborator

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a couple of comments.
Should there also be tests for what happens in edge cases? Empty string header arguments, that kind of thing.

sgkit/io/vcf/vcf_writer.py Outdated Show resolved Hide resolved
for variant in v:
assert "NS" not in dict(variant.INFO).keys()
assert "HQ" not in variant.FORMAT
assert variant.genotypes is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also assert that H3 and GL are retained?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tomwhite
Copy link
Collaborator Author

Should there also be tests for what happens in edge cases? Empty string header arguments, that kind of thing.

If a header is passed in by the user then it will be used as-is, so an empty string header will result in a headerless VCF. I'm not sure that's something we want to support or encourage though - perhaps we leave it as undefined at this point?

@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2023

Codecov Report

Merging #1021 (40b95dc) into main (2262854) will not change coverage.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##              main     #1021    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           49        49            
  Lines         4748      4871   +123     
==========================================
+ Hits          4748      4871   +123     
Impacted Files Coverage Δ
sgkit/io/vcf/vcf_writer.py 100.00% <100.00%> (ø)
sgkit/model.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@benjeffery
Copy link
Collaborator

Great, this looks ready to merge now.

@jeromekelleher jeromekelleher added the auto-merge Auto merge label for mergify test flight label Apr 24, 2023
@mergify mergify bot merged commit 4124a40 into sgkit-dev:main Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Auto merge label for mergify test flight
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate VCF header when writing VCF
4 participants