Support exporting dataset #83
I don't see a missed omero-cli-zarr/src/omero_zarr/raw_pixels.py Line 111 in 5d66d65
@dominikl That's great. I wonder if you could add
I think that represents the summary of the discussion at ome/ngff#31. We probably want to consider different naming for Datasets and Images (not just ID.zarr)? I guess we could do:
"collection": {
Yes, I guess in the end the dataset has to be a "zarr" itself with the appropriate metadata instead of just a directory. The PR is more of a draft that tries to export images in parallel, which unfortunately doesn't work very well and needs some more debugging. I agree too that names are nicer than meaningless IDs, but they lead to conflicts.
Even without the addition of Incidentally this raises the question of whether |
Tested locally, doesn't work very well either. After a while the export dies with (even with only 3 threads):
Does that mean the RawPixelsStore simply can't be called in parallel? Or is the parallel |
A single RawPixelsStore should block multiple access, i.e. it should be thread-safe but not concurrent. Can you find the underlying exception that was thrown?
It was actually an
Is there a memory leak somewhere on the server side when repeatedly calling RawPixelsStore.getPlane()?
The only route I know of is if the pixel store doesn't get closed and more are created.
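To illustrate the leak route described above (stores that are created but never closed), here is a minimal sketch of guaranteeing `close()` even when an export fails partway. `FakePixelsStore`, `pixels_store` and `export_image` are hypothetical stand-ins written for this example, not the OMERO API; the only assumption carried over from the discussion is that a store has a `close()` method that must be called.

```python
import threading
from contextlib import contextmanager


class FakePixelsStore:
    """Hypothetical stand-in for a server-side pixel store (not the OMERO API)."""

    open_count = 0  # number of stores currently open, across all threads
    _lock = threading.Lock()

    def __init__(self):
        with FakePixelsStore._lock:
            FakePixelsStore.open_count += 1
        self.closed = False

    def get_plane(self, z, c, t):
        if self.closed:
            raise RuntimeError("store is closed")
        return (z, c, t)  # placeholder for real pixel data

    def close(self):
        if not self.closed:
            self.closed = True
            with FakePixelsStore._lock:
                FakePixelsStore.open_count -= 1


@contextmanager
def pixels_store(factory):
    """Create a store and always close it, even if the body raises."""
    store = factory()
    try:
        yield store
    finally:
        store.close()


def export_image(n_planes, fail_at=None):
    """Read n_planes planes; optionally simulate a failure at plane fail_at."""
    planes = []
    with pixels_store(FakePixelsStore) as store:
        for z in range(n_planes):
            if z == fail_at:
                raise ValueError("simulated failure")
            planes.append(store.get_plane(z, 0, 0))
    return planes
```

With this pattern each task owns one store for its lifetime, so the number of open stores is bounded by the number of concurrent tasks rather than by the number of images exported.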
I think the problem is that I create a |
the You could also try |
Thanks. I tried |
I wonder if some images aren't getting all their planes downloaded - so they don't get closed. NB: We discussed an alternative export strategy (when you don't have access to the binary repo):
This means that the server is doing much less work and maybe more parallelisation is possible.
👍 Would be good to have different strategies.
img_group named image.getName() added to collection
So, with a bit more testing, I realise that my approach of blindly using the Image name for the zarr image group causes issues with various characters, such as |
Ah true, you'd have to escape all POSIX and Windows special characters, which can be tricky. How about using UUIDs as directory names (instead of the arbitrary OMERO IDs), and putting the names into the metadata? That means one always has to look up the metadata to make sense of the directory structure, but it saves the hassle of handling all sorts of special characters.
Superseded by #88
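A rough sketch of the UUID idea, assuming a hypothetical layout: each image gets a `<uuid>.zarr` directory, and a `collection.json` file maps directory names back to image names. The file name, the `"name"` field and both function names are illustrative assumptions; only the `"collection"` key is borrowed from the snippet earlier in this thread.

```python
import json
import uuid
from pathlib import Path


def build_collection(image_names):
    """Map a fresh UUID-based directory name to each image name.

    The image name is stored only as metadata, so special characters
    and duplicate names never reach the file system.
    """
    collection = {}
    for name in image_names:
        dir_name = f"{uuid.uuid4()}.zarr"
        collection[dir_name] = {"name": name}
    return collection


def write_collection(root, image_names):
    """Create one directory per image and write the name lookup as JSON."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    collection = build_collection(image_names)
    for dir_name in collection:
        (root / dir_name).mkdir()
    # Hypothetical metadata file; a real export would likely use zarr attributes.
    (root / "collection.json").write_text(
        json.dumps({"collection": collection}, indent=2)
    )
    return collection
```

Two images with the same name, or a name such as `a/b:c*`, each get their own directory; the cost is that a reader has to open the metadata to find out which directory holds which image.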
Attempt to add 'export dataset' functionality. Exports all images of a dataset into a directory with the dataset id as name. Export of the images happens in parallel using dask.
But something's not quite right. After running for a while you'll get a Java heap space (out of memory) error on the server side:
Is that an error on the client side (session not closed, etc.) or a server side issue? Any ideas @sbesson @joshmoore ?