Read with xtensor-zarr, support v3 #34
Thanks for working on this! I see the same behavior locally as on the CI here. I could be wrong (I haven't worked with the BLOSC library previously), but it might be related to the addition of Apparently BLOSC_MAX_OVERHEAD is 16
This looks good to me. We can either wait for a fix of the failing z5py case or mark it as a known failure for now and enable the test again later.
test/test_read_all.py
Outdated
def read_with_xtensor_zarr(fpath, ds_name):
    if ds_name == "blosc":
        ds_name = "blosc/lz4"
    fname = "a.npz"
    if os.path.exists(fname):
        os.remove(fname)
    subprocess.check_call(["generate_data/xtensor_zarr/build/run_xtensor_zarr", fpath, ds_name])
    return np.load(fname)["a"]
This is a nice approach. I see that you built this .npz writer into the main.cpp
program whenever two command line arguments are provided.
Yes, we use the same executable for writing and reading.
Thanks for looking into it @grlee77. zarr-python seems to be more tolerant, as it can read z5py-zarr-blosc successfully, but that doesn't mean these trailing 16 bytes are valid. I'll investigate before we merge this PR.
Let me know what you find, should hopefully be a simple fix in z5. Maybe I am accidentally padding something when writing blosc.
python-blosc doesn't seem to be able to read the chunks either:
>>> import blosc
>>> with open("data/z5py.zr/blosc/lz4/0.0.0", "rb") as f:
...     data = f.read()
...
>>> d = blosc.decompress(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/david/mambaforge/envs/zarr_implementations-dev/lib/python3.8/site-packages/blosc/toplevel.py", line 594, in decompress
    return _ext.decompress(bytes_like, as_bytearray)
blosc_extension.error: Error 10032 : not a Blosc buffer or header info is corrupted
@constantinpape I think you should not store the maximum possible size of the compressed data. Blosc gives you the actual size after it has compressed, you can see how we do it in xtensor-io.
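One way to check a chunk for the trailing bytes discussed above, without depending on python-blosc, is to compare the file length with the compressed size recorded in the Blosc header. The following is a minimal sketch, assuming the Blosc 1 header layout (16 bytes, with `nbytes`, `blocksize` and `cbytes` stored as little-endian 32-bit integers at offsets 4, 8 and 12, and `cbytes` counting the header itself). The helper name `trailing_bytes` and the synthetic buffers are hypothetical and only for illustration.

```python
import struct

BLOSC_HEADER_SIZE = 16  # assumed Blosc 1 header length in bytes


def trailing_bytes(chunk: bytes) -> int:
    """Return how many bytes follow the compressed size declared in the header."""
    if len(chunk) < BLOSC_HEADER_SIZE:
        raise ValueError("buffer is too short to contain a Blosc header")
    # Offsets 4..16 hold three little-endian uint32 fields:
    # nbytes (uncompressed size), blocksize, cbytes (compressed size incl. header).
    nbytes, blocksize, cbytes = struct.unpack_from("<III", chunk, 4)
    return len(chunk) - cbytes


# Synthetic buffers (not real Blosc output): a header declaring cbytes = 40,
# followed by 24 payload bytes, so the total length is exactly 40.
header = bytes([2, 1, 1, 1]) + struct.pack("<III", 100, 100, 40)
good = header + bytes(24)
padded = good + bytes(16)  # same chunk with 16 extra bytes appended

print(trailing_bytes(good))
print(trailing_bytes(padded))
```

On a real chunk file, a non-zero result would suggest that the writer stored the size of the allocated destination buffer rather than the size returned by the compressor.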
Thanks, I will have a look.
Hopefully it is as simple as removing BLOSC_MAX_OVERHEAD from this line
I had an initial look and I can reproduce this locally. Unfortunately it looks like just removing the overhead here does not fix the issue.
I had another look, and removing the
Ok, I drafted a new release.
I updated the env so that the correct z5py version is installed. That seems to work, but now a bunch of other tests fail.
It looks like there is a new
I see, that's probably due to the zarr release that happened in the meantime. This might be solved by #33 already and it would be enough to rebase onto master.
366b0b9 to 676f7d1
I just rebased but that is not enough. I guess that this
@davidbrochart, I can help take a look at this. I think the issue is that the nested vs. flat here was implemented prior to Basically there are v2 files where 'dimension_separator' is '/', but I don't think the
zarr_implementations/test/test_read_all.py, lines 73 to 84 in 7349e03
I think Josh said that key_separator argument may be going away, so I should update the zarr python generators/tests to rely on the
This PR will require zarr-python >= 2.8 so that the
zarr 2.8 introduces the dimension_separator metadata key
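To illustrate why readers need to consult this key, here is a minimal sketch of deriving a chunk key from v2 array metadata. It assumes the zarr v2 convention that `dimension_separator` is optional in `.zarray` and defaults to `"."`; the function name `chunk_key` and the sample metadata are hypothetical.

```python
import json


def chunk_key(zarray_json: str, chunk_index) -> str:
    """Build the storage key of a chunk from .zarray metadata and a chunk index."""
    meta = json.loads(zarray_json)
    # Older v2 arrays carry no separator key; fall back to the flat "." layout.
    sep = meta.get("dimension_separator", ".")
    return sep.join(str(i) for i in chunk_index)


# Hypothetical metadata: one flat array, one nested array.
flat = json.dumps({"zarr_format": 2, "chunks": [10, 10, 10]})
nested = json.dumps(
    {"zarr_format": 2, "chunks": [10, 10, 10], "dimension_separator": "/"}
)

print(chunk_key(flat, (0, 0, 0)))
print(chunk_key(nested, (0, 1, 2)))
```

A test that decides between nested and flat layouts from the file name alone would miss v2 arrays whose metadata declares `"/"`, which is the situation described in the comments above.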
Thanks @grlee77, all green now!
🚀 Merging so I can rebase the jzarr PR.
It looks like xtensor-zarr cannot read zarr v2 written by z5py using blosc compression. What I can see: "fill_value": 0.0, which is not a valid unsigned 8-bit integer value, although it can be cast to it.
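The mismatch described here can be detected by a strict check of `fill_value` against the declared dtype. The sketch below is only an illustration of such a check, assuming the array's dtype string is `"|u1"` (the source only says "unsigned 8-bit integer") and that a strict reader wants a JSON integer for integer dtypes. The function name `fill_value_matches_dtype` is hypothetical, and a tolerant reader could instead cast the value.

```python
import json
import re


def fill_value_matches_dtype(zarray_json: str) -> bool:
    """Return True if fill_value is a JSON integer within range for an integer dtype."""
    meta = json.loads(zarray_json)
    fill = meta["fill_value"]
    if fill is None:
        return True  # null means "no fill value"
    match = re.fullmatch(r"[<>|]([a-zA-Z])(\d+)", meta["dtype"])
    if match is None or match.group(1) not in ("u", "i"):
        return True  # this sketch only checks integer dtypes
    kind, size = match.group(1), int(match.group(2))
    # json.loads turns 0.0 into a float and 0 into an int, so the type check
    # distinguishes the two spellings.
    if isinstance(fill, bool) or not isinstance(fill, int):
        return False
    bits = 8 * size
    if kind == "u":
        low, high = 0, 2**bits - 1
    else:
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return low <= fill <= high


print(fill_value_matches_dtype('{"dtype": "|u1", "fill_value": 0.0}'))
print(fill_value_matches_dtype('{"dtype": "|u1", "fill_value": 0}'))
```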