Add contig_lengths dataset attribute if found in the VCF file #946

Merged: 2 commits into sgkit-dev:main on Nov 3, 2022

@tomwhite tomwhite commented Nov 2, 2022

Part of #464, and needed for tskit-dev/tsinfer#748

@benjeffery benjeffery left a comment

LGTM!
Are there docs where the mapping of VCF fields to dataset keys is listed?

@@ -525,6 +525,10 @@ def vcf_to_zarr_sequential(
ds.attrs["filters"] = filters
ds.attrs["vcf_zarr_version"] = "0.1"
ds.attrs["vcf_header"] = vcf.raw_header
try:
    ds.attrs["contig_lengths"] = vcf.seqlens
except AttributeError:

Checking the cyvcf2 source, I see that this raises if the lengths are absent from the header, so it seems OK to omit them from our output in that case.
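
For anyone reading this thread later, the optional-attribute pattern discussed here can be sketched roughly as follows. This is an illustration only, not the sgkit code: `FakeVCF` and `build_attrs` are hypothetical names, `FakeVCF` merely imitates the behaviour described above (accessing `seqlens` raises `AttributeError` when the header has no lengths), and the `pass` in the `except` branch is an assumption, since the diff excerpt is cut off at that line.

```python
class FakeVCF:
    """Hypothetical stand-in for a VCF reader; not the real cyvcf2 class."""

    def __init__(self, raw_header, seqlens=None):
        self.raw_header = raw_header
        self._seqlens = seqlens

    @property
    def seqlens(self):
        # Imitates the behaviour described in the review comment:
        # raise AttributeError when the header carries no contig lengths.
        if self._seqlens is None:
            raise AttributeError("no sequence lengths in header")
        return self._seqlens


def build_attrs(vcf):
    """Collect dataset attributes, adding contig_lengths only if available."""
    attrs = {"vcf_zarr_version": "0.1", "vcf_header": vcf.raw_header}
    try:
        attrs["contig_lengths"] = vcf.seqlens
    except AttributeError:
        pass  # assumed handling: leave the optional attribute out
    return attrs


with_lengths = build_attrs(FakeVCF("##contig=<ID=1,length=1000>", [1000]))
without_lengths = build_attrs(FakeVCF("##contig=<ID=1>"))
```

Under these assumptions, `with_lengths` contains a `contig_lengths` key and `without_lengths` does not, so downstream code should treat the attribute as optional (for example, by checking `"contig_lengths" in attrs` before using it).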

tomwhite commented Nov 2, 2022

> Are there docs where the mapping of VCF fields to dataset keys is listed?

The mandatory ones are listed on https://github.com/pystatgen/vcf-zarr-spec/blob/main/vcf_zarr_spec.md#vcf-zarr-group-attributes. Perhaps we should add this one as an optional attribute there?

@jeromekelleher jeromekelleher left a comment

LGTM

@tomwhite tomwhite added the auto-merge label Nov 3, 2022
@mergify mergify bot merged commit 6fc8d53 into sgkit-dev:main Nov 3, 2022
@tomwhite tomwhite deleted the contig_lengths branch November 7, 2022 14:26