Reading single chunk takes 10x longer than remfile #74
I think what's going on here... h5py can read partial chunks, and in this case there is no compression, so this is possible, whereas lindi/zarr is set up to always read entire chunks. According to the lindi.json file, the chunk size is [13653, 384]. Maybe this is a zarr limitation/constraint/feature?
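For a rough sense of the cost difference, here is a back-of-envelope sketch; the 1000-row slice and the dtype size are assumptions for illustration, not taken from the file:

```python
# Back-of-envelope: a small slice vs. a whole-chunk read for an
# uncompressed dataset with chunk shape (13653, 384).
chunk_shape = (13653, 384)
itemsize = 4  # assumed dtype size (e.g. float32); not confirmed from the file
rows_requested = 1000  # hypothetical slice smaller than one chunk

partial_bytes = rows_requested * chunk_shape[1] * itemsize
full_chunk_bytes = chunk_shape[0] * chunk_shape[1] * itemsize
print(f"partial read (h5py): {partial_bytes / 1e6:.1f} MB")    # ~1.5 MB
print(f"full chunk (zarr):   {full_chunk_bytes / 1e6:.1f} MB") # ~21.0 MB
```

Under those assumptions, the whole-chunk read transfers roughly 14x more bytes, which is broadly in line with the ~10x slowdown in the title.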
Ah, that makes sense. After changing the slice size to equal the chunk size, lindi now takes only ~2x as long as remfile. In inspecting the execution, it looks like zarr makes the request for the same chunk key twice. But also, in digging through the Zarr code, I found that Zarr might be able to support partial reads; right now, execution is not going through that partial-read path.
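One way to confirm the duplicate request is to wrap the store and count reads per key. This is a sketch assuming zarr 2.x's MutableMapping-style store interface; `base_store` and the dataset name are placeholders for whatever lindi actually hands to zarr:

```python
from collections.abc import MutableMapping

import zarr


class CountingStore(MutableMapping):
    """Delegate to an inner zarr store and count __getitem__ calls per key."""

    def __init__(self, inner):
        self.inner = inner
        self.counts = {}

    def __getitem__(self, key):
        # Tally every read so duplicate fetches of the same chunk show up.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.inner[key]

    def __setitem__(self, key, value):
        self.inner[key] = value

    def __delitem__(self, key):
        del self.inner[key]

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


store = CountingStore(base_store)  # base_store is a placeholder
z = zarr.open(store, mode="r")
_ = z["some_dataset"][:1000, :]    # hypothetical dataset name
print(store.counts)                # keys fetched more than once show up here
```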
Ah. It will be good to figure out whether the duplicate request can be avoided... and/or whether we should implement some caching for this type of situation. Do you think we should set the get_partial_values attribute somehow?
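For the caching side, zarr 2.x ships an LRU store wrapper that would serve a repeated request for the same key from memory; a sketch, again with `base_store` as a placeholder:

```python
import zarr
from zarr.storage import LRUStoreCache

# Serve repeated requests for the same chunk key from an in-memory LRU
# cache instead of re-fetching from the underlying (remote) store.
cached = LRUStoreCache(base_store, max_size=256 * 2**20)  # 256 MiB budget
root = zarr.open(cached, mode="r")
```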
Yeah, I think that would be nice, but not urgent. For most large reads, I think it would not make a big difference, because the read will be mostly full chunks plus part of a chunk on each axis. And most big datasets are compressed. If you have time, it would be great if you could take a look, but no pressure. Otherwise, I'll try to take a look at it next week.
Makes sense. I'm not going to work on it right now.
Using remfile as below takes 0.2 seconds on my laptop.
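The original snippet was not preserved here, but the timing was for a read of roughly this shape; the URL and dataset path below are placeholders:

```python
import time

import h5py
import remfile

url = "https://example.org/path/to/file.nwb"  # placeholder URL
f = remfile.File(url)
with h5py.File(f, "r") as h5f:
    ds = h5f["/acquisition/ElectricalSeries/data"]  # placeholder dataset path
    t0 = time.time()
    x = ds[:1000, :]  # slice smaller than one chunk
    print(f"remfile read: {time.time() - t0:.2f} s")
```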
Using lindi as below takes 2.4 seconds on my laptop.
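And the lindi counterpart, with the same caveat that the URL and dataset path are placeholders:

```python
import time

import lindi

# LindiH5pyFile.from_lindi_file opens a .lindi.json reference file.
f = lindi.LindiH5pyFile.from_lindi_file("https://example.org/file.lindi.json")  # placeholder URL
ds = f["/acquisition/ElectricalSeries/data"]  # placeholder dataset path
t0 = time.time()
x = ds[:1000, :]  # same slice as the remfile version
print(f"lindi read: {time.time() - t0:.2f} s")
```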
The data chunk size is (13653, 384) with no compression. Nothing stands out in the LINDI JSON. I'm not sure if I am doing something wrong or if there is an inefficiency somewhere in the system.
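For reference, the chunking and compression can be checked through the h5py-style dataset attributes (assuming `ds` from the sketches above, and assuming lindi's h5py-compatible interface exposes these attributes):

```python
print(ds.chunks)       # expected: (13653, 384)
print(ds.compression)  # expected: None, i.e. uncompressed
```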
I'll start looking into it. @magland, do you have any ideas about what might be going on?