
Lazy loading of wells and multi-site plates #66

Closed
jluethi opened this issue Jun 7, 2022 · 29 comments
Labels
High Priority (Current Priorities & Blocking Issues)

Comments

@jluethi
Collaborator

jluethi commented Jun 7, 2022

[not a fractal workflow issue, but we should have a high-level issue to discuss our progress on this]

We want to be able to lazily load single wells, not just whole plates. And we also want to support plates where each well consists of more than a single site (currently all fused into one, see #36).

This is work required on the ome-zarr-py side to allow loading images accordingly. Also, to support this loading, we'll need to find a place in the OME-Zarr file to store well dimensions (e.g. whether the sites are arranged as 9 by 8 or 8 by 9) and read it from there. See this issue for guidance on how we can implement this: ome/ome-zarr-py#200

@jluethi jluethi added the High Level Overview and discussion labels Jun 7, 2022
@jluethi jluethi added this to the Improve OME-Zarr viewing milestone Jun 8, 2022
@jluethi jluethi added the External Library label and removed the High Level Overview and discussion labels Jun 13, 2022
@jluethi jluethi added the High Priority label Jun 15, 2022
@jluethi
Collaborator Author

jluethi commented Jun 16, 2022

To summarize some of today's development:
@mfranzon and I started developing the lazy loading of multi-site OME-Zarrs. Marco's work on the plugin side is here:
https://github.com/fractal-napari-plugins-collection/ome-zarr-py/tree/fov_reader

I quickly looked through it now. @mfranzon can we rebase that to the current main of ome-zarr-py? They now have their own implementation that allows reading multiple resolution levels and we don't need to suggest our quick fix there anymore.
Also, is there a reason to go directly for Plates? I thought we had agreed to build the lazy loading for wells and then generalize to Plates using the Wells.
Finally, your current implementation still looks like it's row & column focused. I think we should aim for a coordinate-based approach if possible. I'll elaborate in the next post on my work on that.
Let's have a quick call tomorrow morning to touch base about these questions.

@jluethi
Collaborator Author

jluethi commented Jun 16, 2022

I've been working on how we could save positional information and where in the metadata this would go. It was a bit confusing to figure out, but I think I found a good answer that fits the spec and the recommendations we got in the ome-zarr issue here. I will try to explain it visually, so that it will be easier for others to understand than it was for me :)

The metadata is saved in the .zattrs file on the field of view level, thus each field can get metadata about its positioning (which should give us good flexibility with regard to positioning of fovs). Here is where this goes in the file hierarchy:

├── 20200812-CardiomyocyteDifferentiation14-Cycle1.zarr
│   └── B
│       └── 03
│           ├── 0
│           │   ├── .zattrs    <= multiscales metadata for site 0
│           │   ├── 0
│           │   ├── 1
│           │   ├── 2
│           │   ├── 3
│           │   └── 4
│           ├── 1
│           │   ├── .zattrs    <= multiscales metadata for site 1
│           │   ├── 0
│           │   ├── 1
│           │   ├── 2
│           │   ├── 3
│           │   └── 4
│           ├── 2
│           │   ├── .zattrs    <= multiscales metadata for site 2
│           │   ├── 0
│           │   ├── 1
│           │   ├── 2
│           │   ├── 3
│           │   └── 4
│           └── 3
│               ├── .zattrs    <= multiscales metadata for site 3
│               ├── 0
│               ├── 1
│               ├── 2
│               ├── 3
│               └── 4

The multiscales metadata can contain coordinateTransformations entries. These contain fields of type scale (=> pixel sizes; we can also implement those soon, maybe we'll have to already) and translation. Translation information is what we want to use to specify the position of sites.

We add the coordinateTransformations translation metadata at the dataset level, i.e. per pyramid level. An example .zattrs file can look like this:

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space"
                },
                {
                    "name": "x",
                    "type": "space"
                }
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [{
                        "type": "translation",
                        "translation": [0.0, 2160.0, 0.0]
                    }]
                },
                {
                    "path": "1"
                },
                {
                    "path": "2"
                },
                {
                    "path": "3"
                },
                {
                    "path": "4"
                }
            ],
            "version": "0.4"
        }
    ]
}

A few notes:

  • Translation has 3 dimensions here for us, [z, y, x]. Channels aren't translated and I assume neither is time (but we should test how this deals with time data eventually, just not yet) => see below
  • We may need to define translation metadata for each pyramid level, not fully sure. The current napari ome-zarr reader only seems to read the highest-resolution coordinateTransformation though, if I understand it correctly, see here. Maybe when correct scale metadata is provided for each pyramid level (at the same place as the translation metadata), the framework already handles the scaling for lower resolutions? => see below
  • I'm not fully sure whether this translation could be in physical units or whether it's always in pixels. For now, I'm treating it as pixels and that seems to work (e.g. shift by 2560 or 2160 pixels). => see below
  • We probably need to be on ome-ngff version 0.4, as translation was only defined in that version of the standard (see version in the example file)
  • The cool thing about those translation values is that they are arbitrary pixel shifts, so we could define any position we want (we'll just have to handle it somewhere as well)
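To illustrate where this lands on disk, here's a minimal sketch (not our actual task code; path and values taken from the example above) of writing such a translation into an existing field-of-view group with plain zarr:

import zarr

# Open one field-of-view group of the example plate (path from the hierarchy above)
fov = zarr.open_group(
    "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0", mode="r+"
)
multiscales = fov.attrs["multiscales"]
# Attach a [z, y, x] translation to the full-resolution dataset
# (here still in pixels; see the spec discussion below for physical units)
multiscales[0]["datasets"][0]["coordinateTransformations"] = [
    {"type": "translation", "translation": [0.0, 2160.0, 0.0]}
]
fov.attrs["multiscales"] = multiscales  # persisted into the .zattrs shown above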

@jluethi
Collaborator Author

jluethi commented Jun 16, 2022

The napari-ome-zarr plugin (through the ome-zarr-py library) can already interpret this translation information correctly for single OME-Zarr images (not for wells or plates, just for single FOVs). That means we are using the values as we should, and as recommended by Will Moore here.

I created two example datasets with this metadata hardcoded, both variants of the single-site, 2x2 UZH dataset. I loaded them site by site, so each site gets added as a separate channel (much longer channel list, would not scale for full plates at all, but good for testing).

In the first dataset, /data/active/jluethi/Fractal/UZH_1_well_2x2_sites_multifov_translationMetadata, the metadata is correct for the right 2x2 grid.
[screenshot: the four sites displayed as a correct 2x2 grid in napari]

In the second dataset, /data/active/jluethi/Fractal/UZH_1_well_2x2_sites_multifov_translationMetadata1x4, the translation parameters create a 1x4 grid. This data wasn't imaged that way, but a user could image data in such a fashion and we should support any of these settings :)
[screenshot: the same sites displayed as a 1x4 grid in napari]

These two test sets should give us a good opportunity to test our lazy-loading of Wells (& eventually Plates) with multiple field of views saved as separate images. Also, if we can follow this logic of placing images into a dask array based on such coordinates, we have an easy road to also process arbitrary patterns of sites, see e.g. https://github.com/fractal-analytics-platform/mwe_fractal/issues/23

Given that this seems to be the right spot for positional metadata (if we also manage to use it), there are 2 parts remaining:

  1. Getting this metadata during Zarr file creation: either calculating it based on image shapes (2160x2560), site numbers (1-4 in this case) and well dimensions (e.g. 2x2 in this case), or parsing it directly from a Yokogawa metadata file (see here: https://github.com/fractal-analytics-platform/mwe_fractal/issues/46)
  2. Actually using this metadata during lazy loading to correctly place sites in a well and eventually load multiple wells to see full plates

I will be tackling steps towards making 1 more general, getting https://github.com/fractal-analytics-platform/mwe_fractal/issues/46 ready for implementation. We can work with the two example OME-Zarr files /data/active/jluethi/Fractal/UZH_1_well_2x2_sites_multifov_translationMetadata and /data/active/jluethi/Fractal/UZH_1_well_2x2_sites_multifov_translationMetadata1x4 in the meantime.
And let's discuss tomorrow how we best push towards 2 given this metadata @mfranzon
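As a sketch of the coordinate-based placement logic in 2 (numbers from the 2x2 example above, with a hypothetical load_fov helper standing in for the real zarr reads; the real version would assemble a lazy dask array rather than a numpy one):

import numpy as np

pixel_size = 0.1625  # um/px, from the scale metadata
fov_shape = (2160, 2560)
# physical [y, x] translations of the 4 sites in the 2x2 example, in um
translations_um = {0: (0.0, 0.0), 1: (0.0, 416.0), 2: (351.0, 0.0), 3: (351.0, 416.0)}

def load_fov(fov_index: int) -> np.ndarray:
    # hypothetical loader; in practice this reads <fov_index>/<level> from the zarr
    return np.zeros(fov_shape, dtype="uint16")

# convert physical offsets to pixel offsets and place each FOV
offsets_px = {
    fov: (round(y / pixel_size), round(x / pixel_size))
    for fov, (y, x) in translations_um.items()
}
height = max(y for y, _ in offsets_px.values()) + fov_shape[0]
width = max(x for _, x in offsets_px.values()) + fov_shape[1]
well = np.zeros((height, width), dtype="uint16")
for fov, (y, x) in offsets_px.items():
    well[y : y + fov_shape[0], x : x + fov_shape[1]] = load_fov(fov)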

@jluethi
Collaborator Author

jluethi commented Jun 16, 2022

I'm not fully sure whether this translation could be in physical units or whether it's always in pixels. For now, I'm treating it as pixels and that seems to work (e.g. shift by 2560 or 2160 pixels).

Reading through the spec once more has answered this question:

They MUST contain exactly one scale transformation that specifies the pixel size in physical units or time duration. [...] It MAY contain exactly one translation that specifies the offset from the origin in physical units. If translation is given it MUST be listed after scale to ensure that it is given in physical coordinates.

I will update my two examples to correspond to this specification and check whether things still work that way :) Also quite useful, given that the coordinates we get from the Yokogawa are in physical units :)

Also:

Translation has 3 dimensions here for us, [z, y, x]. Channels aren't translated and I assume neither is time (but we should test how this deals with time data eventually, just not yet)

Answer:

The length of the scale and translation array MUST be the same as the length of "axes"

=> Must be length four for our case, with identity values for the channel axis (1.0 in scale, 0.0 in translation). Will also test that.

@jluethi
Collaborator Author

jluethi commented Jun 16, 2022

I adapted the two example OME-Zarrs to the spec with correct translations in physical scales and with scale parameters for every pyramid level. It works as expected. The .zattrs file now looks like this:

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space"
                },
                {
                    "name": "x",
                    "type": "space"
                }
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.1625, 0.1625]
                    },
                    {
                        "type": "translation",
                        "translation": [0.0, 0.0, 0.0, 416.0]
                    }]
                },
                {
                    "path": "1",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.4875, 0.4875]
                    }]
                },
                {
                    "path": "2",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 1.4625, 1.4625]
                    }]
                },
                {
                    "path": "3",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 4.3875, 4.3875]
                    }]
                },
                {
                    "path": "4",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 13.1625, 13.1625]
                    }]
                }
            ],
            "version": "0.4"
        }
    ]
}
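For checking files like this, here's a small validation sketch (just a helper I'd use, not part of any library) for the rule quoted above, i.e. that every scale/translation array has the same length as "axes":

import json

def check_transformations(zattrs_path: str) -> None:
    """Assert that every scale/translation matches the number of axes."""
    with open(zattrs_path) as f:
        multiscales = json.load(f)["multiscales"]
    for ms in multiscales:
        ndim = len(ms["axes"])
        for dataset in ms["datasets"]:
            for t in dataset.get("coordinateTransformations", []):
                values = t[t["type"]]  # the "scale" or "translation" array
                assert len(values) == ndim, (dataset["path"], t)

check_transformations("20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/.zattrs")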

@mfranzon
Collaborator

Quick notes, @jluethi

I quickly looked through it now. @mfranzon can we rebase that to the current main of ome-zarr-py? They now have their own implementation that allows reading multiple resolution levels and we don't need to suggest our quick fix there anymore.

Yes, no problem on this. I will test it a little to be sure that the new ome-zarr-py version works.

Also, is there a reason to go directly for Plates? I thought we had agreed to build the lazy loading for wells and then generalize to Plates using the Wells.

I get the point, my bad; during our discussion yesterday I probably misunderstood what you meant when you talked about 'Well'. I thought we wanted to modify the Plate class at the Well level, introducing the site handling well by well without touching the Plate level. That would have ended up in a situation in which each Well in a Plate had the correct number and position of sites, but not the position of the Wells in the Plate.
Instead, and now I understand, you mean the Well class: your idea is to modify the Well class to handle the sites correctly, but this means we have to load well by well rather than the whole plate at once. If you agree, as a first step I'll quickly move on to the Well class.

Finally, your current implementation still looks like it's row & column focused. I think we should aim for a coordinate-based approach if possible. I'll elaborate in the next post on my work on that. Let's have a quick call tomorrow morning to touch base about these questions.

True, I tried a grid approach as a quick implementation; I hoped to have it working in a couple of hours, but it is taking a bit more effort. This time no misunderstanding, we agreed on the coordinate-based approach :)! I wanted to implement a grid-based version, quick and dirty, just to get a feeling for the visualization and check whether we get some improvements in napari or whether we also have to change something on the plugin side.

@jluethi
Collaborator Author

jluethi commented Jun 17, 2022

Instead, and now I understand, you mean the Well class: your idea is to modify the Well class to handle the sites correctly, but this means we have to load well by well rather than the whole plate at once. If you agree, as a first step I'll quickly move on to the Well class.

Exactly. This approach seems reasonable to me and is also what Will Moore suggested in the ome-zarr issue here:

  • Create some sample data where the Fields of a Well have correct translations, so that opening the images individually in napari shows the correct layout.
  • Have a go at updating the ome-zarr-py Well class, to respect the translations when stitching Fields into a Well.
  • If this seems viable, then open a PR (or a more specific issue) to propose defining that behaviour in the ngff spec.
  • Bonus: investigate whether you can also stitch those multi-fov Wells into a Plate

The item that is a bonus here is clearly part of our scope. But having the Plate class use the Well class may be a good generalization and separation of concerns, instead of implementing Well handling (which can get a bit more complicated over time) in both the Plate and the Well class.

I wanted to implement a grid-based version, quick and dirty, just to get a feeling for the visualization and check whether we get some improvements in napari or whether we also have to change something on the plugin side.

Sounds good :) If it's easy-ish to implement, that's a good default. If you notice that you need the physical coordinates again, we can also work with them directly instead of having to compute them from grid parameters :) e.g. hard-code the shift parameters for the 2x2 case first; we already know where we'll get this information in the metadata afterwards.

@tcompa
Collaborator

tcompa commented Jun 20, 2022

We are testing a custom version of reader.py, where each FOV is assigned to a single Node() instance (in the Well class).

Pros: it automatically reads metadata, nice! If we use the ones by Joel (replicated for the four FOVs and adjusted to create a grid), we obtain a figure where the 2x2 grid exists even though we completely removed its definition from reader.py -- see figure below. Note that we just chose the displacements to produce a 2x2 grid, without checking the correct order... also note that the "scale" transformation is implemented, and a single FOV is 351x416 in physical units (as we see e.g. by moving the mouse pointer over the image).

Cons: channels from different FOVs are not considered independently, which cannot be the way to go.

Still working on it. It's possible that the independent-channel issue cannot be solved when working with independent FOVs, and then we'll switch back to other previous ideas.

[screenshot: 2x2 grid of FOVs rendered in napari, one node per FOV]

@jluethi
Collaborator Author

jluethi commented Jun 20, 2022

To briefly summarize our discussion:
This is promising. The open question is whether one can use Nodes without having them go to different layers in napari, because napari layers don't scale well and we'd quickly surpass 100 layers for a single well if every FOV gets its own layer.
Thus, if this can be done while mapping them all into the same layer, that would be very promising.

@jluethi
Collaborator Author

jluethi commented Jun 20, 2022

@tcompa Thinking about this a bit more now and reading a bit of documentation, I think it's unlikely to work to put multiple FOVs with different translation parameters in the same layer. In napari's layer logic, each layer has one translate property, so moving images via translation metadata is something that happens at the layer level.
That means, as I understand it, that if we want to use translate, it needs to be per layer. And that means we'd need to create one big dask array for the well and fill it (or an even bigger dask array for the plate).
Starting to understand why ome-zarr just went with concatenating images (much easier) and only using a single site per plate (predictable sizes)...
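For reference, a toy sketch (dummy data) of the napari behavior in question; translate and scale are properties of the layer created by add_image, so every independently translated FOV necessarily becomes its own layer:

import napari
import numpy as np

viewer = napari.Viewer()
fov = np.zeros((2160, 2560), dtype=np.uint16)  # dummy single-FOV plane
viewer.add_image(fov, name="fov-0", scale=(0.1625, 0.1625), translate=(0.0, 416.0))
viewer.add_image(fov, name="fov-1", scale=(0.1625, 0.1625), translate=(351.0, 0.0))
# => with ~100 FOVs per well this one-layer-per-FOV pattern does not scale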

@tcompa
Collaborator

tcompa commented Jun 21, 2022

Let's keep two things separate: pyramid loading at the single-well level (be it for one or many FOVs) and lazy loading of multiple FOVs.

The pyramid loading within the Well class seems quite simple, and here is a version of ome_zarr's reader.py that does it:

# Excerpt adapted from ome-zarr-py's reader.py; Spec, Node, ZarrLocation and
# LOGGER come from that module. Additional imports used below:
import math
from typing import List

import dask.array as da
import numpy as np
from dask import delayed


class Well(Spec):
    @staticmethod
    def matches(zarr: ZarrLocation) -> bool:
        return bool("well" in zarr.root_attrs)

    def __init__(self, node: Node) -> None:
        super().__init__(node)
        self.well_data = self.lookup("well", {})
        LOGGER.info("well_data: %s", self.well_data)

        image_paths = [image["path"] for image in self.well_data.get("images")]
        field_count = len(image_paths)
        self.column_count = math.ceil(math.sqrt(field_count))
        self.row_count = math.ceil(field_count / self.column_count)

        # Use first Field for rendering settings, shape etc.
        image_zarr = self.zarr.create(image_paths[0])
        image_node = Node(image_zarr, node)
        self.x_index = len(image_node.metadata["axes"]) - 1
        self.y_index = len(image_node.metadata["axes"]) - 2
        level = 0  # load full resolution image
        self.numpy_type = image_node.data[level].dtype
        self.img_metadata = image_node.metadata
        self.img_pyramid_shapes = [d.shape for d in image_node.data]

        # Create a dask pyramid for the well
        pyramid = []
        for level, tile_shape in enumerate(self.img_pyramid_shapes):
            lazy_well = self.get_lazy_well(level, tile_shape)
            pyramid.append(lazy_well)

        # Set node.data to be the pyramid view of the well
        node.data = pyramid
        node.metadata = image_node.metadata

    def get_lazy_well(self, level: int, tile_shape: tuple) -> da.Array:

        # stitch images into a grid
        def get_field(tile_name: str, level: int) -> np.ndarray:
            """tile_name is 'row,col'"""
            row, col = (int(n) for n in tile_name.split(","))
            field_index = (self.column_count * row) + col
            path = f"{field_index}/{level}"
            LOGGER.debug(f"LOADING tile... {path}")
            try:
                data = self.zarr.load(path)
            except ValueError:
                LOGGER.error(f"Failed to load {path}")
                data = np.zeros(self.img_pyramid_shapes[level], dtype=self.numpy_type)
            return data

        lazy_reader = delayed(get_field)

        lazy_rows = []
        for row in range(self.row_count):
            lazy_row: List[da.Array] = []
            for col in range(self.column_count):
                tile_name = f"{row},{col}"
                LOGGER.debug(f"creating lazy_reader. row:{row} col:{col}")
                lazy_tile = da.from_delayed(
                    lazy_reader(tile_name, level),
                    shape=tile_shape,
                    dtype=self.numpy_type,
                )
                lazy_row.append(lazy_tile)
            lazy_rows.append(da.concatenate(lazy_row, axis=self.x_index))
        return da.concatenate(lazy_rows, axis=self.y_index)

It is nothing fancy, but it works.

When loading a well made of 5x5 sites (all merged into a single FOV), the improved responsiveness is quite clear.
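For context, the class above is exercised through ome-zarr-py's usual reader entry point, roughly like this (path hypothetical):

from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

location = parse_url("20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03")
nodes = list(Reader(location)())  # Well.matches() claims the well group
pyramid = nodes[0].data           # one lazy dask array per pyramid level
print([level.shape for level in pyramid])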

@tcompa
Collaborator

tcompa commented Jun 21, 2022

Some more info about this pyramid-loading well example:

Part of the logs read

14:43:33 INFO datasets [{'path': '0'}, {'path': '1'}, {'path': '2'}, {'path': '3'}, {'path': '4'}]
14:43:33 INFO resolution: 0
14:43:33 INFO  - shape ('c', 'z', 'y', 'x') = (3, 1, 10800, 12800)
14:43:33 INFO  - chunks =  ['1', '1', '2160', '2560']
14:43:33 INFO  - dtype = uint16
14:43:33 INFO resolution: 1
14:43:33 INFO  - shape ('c', 'z', 'y', 'x') = (3, 1, 5400, 6400)
14:43:33 INFO  - chunks =  ['1', '1', '2700', '3200']
14:43:33 INFO  - dtype = uint16
14:43:33 INFO resolution: 2
14:43:33 INFO  - shape ('c', 'z', 'y', 'x') = (3, 1, 2700, 3200)
14:43:33 INFO  - chunks =  ['1', '1', '2160 (+ 540)', '2560 (+ 640)']
14:43:33 INFO  - dtype = uint16
14:43:33 INFO resolution: 3
14:43:33 INFO  - shape ('c', 'z', 'y', 'x') = (3, 1, 1350, 1600)
14:43:33 INFO  - chunks =  ['1', '1', '1350', '1600']
14:43:33 INFO  - dtype = uint16
14:43:33 INFO resolution: 4
14:43:33 INFO  - shape ('c', 'z', 'y', 'x') = (3, 1, 675, 800)
14:43:33 INFO  - chunks =  ['1', '1', '675', '800']
14:43:33 INFO  - dtype = uint16

and indeed when repeatedly zooming in/out I see that napari loads levels 0, 1 and 2.

@jluethi
Collaborator Author

jluethi commented Jun 23, 2022

I've been testing the well lazy-loading implementation from @tcompa on the 9x8 test case. Here are a few observations:

  1. The two approaches, map_blocks & stack loading, aren't much different in performance.
  2. Loading the multi-FOV 9x8 well as a well with map_blocks via VPN (with NAPARI_ASYNC=1) takes 3.6-4.2s until the well is loaded (according to the DEBUG message at the end of loading; this roughly matches manual measurements).
  3. When loading the single-FOV 9x8 well as a plate, the same data is loaded in about 1.5s.
  4. When loading the single-FOV 9x8 well as a well using the map_blocks implementation, it loads in 0.5s.

=> The increased number of sites and the low-res pyramid levels spread over many more small files in the multi-FOV case carry a strong overhead penalty for initial loading. Even for a single well, it's almost 10x slower.

Browsing the image, the difference is less pronounced, but still there. Zooming into the top-left site is a bit laggier in the multi-FOV case: it takes about 11s on my setup to load it at full resolution, while in the single-FOV setup (when also loaded as a single well) it feels a bit smoother and loads the top-left site at full resolution in about 4s.

=> The performance penalty is still there for browsing, but not quite as pronounced anymore.

Overall, those performance hits aren't great, but we could probably live with them. The question is how they scale with the size of the experiment. I'm currently processing the 23-well test case; let's check whether there's an easy way to generalize the multi-FOV framework to the plate to see performance there.
Let's also think about whether there are obvious performance hits we're currently taking that we could avoid.

@jluethi
Collaborator Author

jluethi commented Jun 23, 2022

Reading through the dask documentation, this section has some very interesting information:
Loading Chunked Data

Modern NDArray storage formats like HDF5, NetCDF, TIFF, and Zarr, allow arrays to be stored in chunks or tiles so that blocks of data can be pulled out efficiently without having to seek through a linear data stream. It is best to align the chunks of your Dask array with the chunks of your underlying data store.

However, data stores often chunk more finely than is ideal for Dask array, so it is common to choose a chunking that is a multiple of your storage chunk size, otherwise you might incur high overhead.

For example, if you are loading a data store that is chunked in blocks of (100, 100), then you might choose a chunking more like (1000, 2000) that is larger, but still evenly divisible by (100, 100). Data storage technologies will be able to tell you how their data is chunked.

I wonder whether we could use this as well: e.g. depending on the pyramid level, we load multiple small on-disk chunks into one larger dask array chunk. Given that they recommend doing this here, maybe this would deal with some of the performance overhead?
When I was profiling the ome-zarr-py library, the initial creation of the dask array was very fast; the part where it slows down is actually performing the reads from disk. It could be that this will remain slow because of IO overhead, but if it's more of a dask overhead at the moment, this would be promising.

There is additional interesting discussion on choosing good chunk sizes here (and on how dask array chunks should be larger, in the multiples of 100 MB range, not the kBs we have for tiny pyramid chunks): https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes
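A toy illustration of that recommendation, using the chunk numbers of our level-0 well (dummy array, not our reader code):

import dask.array as da

# storage-like chunking: one dask chunk per on-disk FOV tile
well = da.zeros((10800, 12800), chunks=(2160, 2560), dtype="uint16")
# aggregate 2x2 storage chunks per dask chunk, still evenly divisible
well = well.rechunk((4320, 5120))
print(well.chunks)  # ((4320, 4320, 2160), (5120, 5120, 2560))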

@jluethi
Collaborator Author

jluethi commented Jun 23, 2022

The simplest approach to changing the dask chunking didn't help:
I assigned the dask array as before using map_blocks, but then called well.rechunk(new_chunks). The performance of that is basically the same as the map_blocks loading. Thus, either IO is the limiting factor (in which case there isn't much we can do) or the dask overhead is created during the initial call of get_field_for_map_blocks (in which case we'd need to refactor this part to directly load larger chunks made up of multiple on-disk chunks).

Maybe we can test this in a simpler dummy case and figure out whether the performance is limited by the size of dask chunks or by IO of many files.

@tcompa
Collaborator

tcompa commented Jun 24, 2022

Here's a quick test for loading a single dask array from either a single-FOV or multi-FOV zarr file. I tried to isolate it as much as possible from ome-zarr's reader.py, but I kept their ZarrLocation class to mimic what would happen there.

Notes on the test script:

  1. By setting Debug = True, we can load the numpy arrays directly (without any dask delayed machinery), which is useful to verify that we are providing the right shapes to from_delayed.
  2. I also added a final rechunking, so that the resulting arrays in the single-FOV and multi-FOV cases should be exactly the same.
  3. I added a da.mean calculation at the end of each case; this is just a sanity check that the two arrays are the same, and also makes sure that we are actually loading all their elements.

Test script (for a single-well and 9x8-sites case):

import time
import dask
import dask.array as da
from ome_zarr.io import ZarrLocation


def get_field(zarr, field_index: int, level: int):
  path = f"{field_index}/{level}"
  data = zarr.load(path)
  return data
lazy_get_field = dask.delayed(get_field)


# Pyramid shapes (coarsening factor = 2)
FOV_shapes = [
      (3, 1, 2160, 2560),
      (3, 1, 1080, 1280),
      (3, 1, 540, 640),
      (3, 1, 270, 320),
      (3, 1, 135, 160),
      ]

size = "9x8"
kinds = ["singlefov", "multifov"]

Debug = False

for level in range(4):
  for kind in kinds:
      print(f"{size}, level={level}, {kind}")
      if kind == "singlefov":
          rows, columns = [int(x) for x in size.split("x")]
          tile_shapes = [(x[0], x[1], x[2] * rows, x[3] * columns) for x in FOV_shapes]
          rows, columns = 1, 1
      elif kind == "multifov":
          rows, columns = [int(x) for x in size.split("x")]
          tile_shapes = FOV_shapes[:]
      well = f"/home/tommaso/Desktop/20200812-Cardio-1well-{size}-{kind}-mip.zarr/B/03"
      zarr = ZarrLocation(well)

      t_0 = time.perf_counter()
      lazy_rows = []
      for row in range(rows):
          lazy_row = []
          for col in range(columns):
              field_index = (rows * col) + row

              if Debug:
                  # Load np arrays, to verify shapes are correct
                  tile = get_field(zarr, field_index, level)
                  assert tile.shape == tile_shapes[level]
              else:
                  # Lazily load dask arrays
                  tile = da.from_delayed(lazy_get_field(zarr, field_index, level), shape=tile_shapes[level], dtype="uint16")
              lazy_row.append(tile)
          lazy_rows.append(da.concatenate(lazy_row, axis=-1))
      x = da.concatenate(lazy_rows, axis=-2)
      y = x.rechunk(chunks={2:2160, 3:2560})
      z = y.compute()
      mean_z = da.mean(z).compute()
      t_1 = time.perf_counter()
      print(f"  Final shape: {y.shape}")
      print(f"  Final chunks: {y.chunks}")
      print(f"  Average(array)={mean_z}")
      print(f"  Elapsed: {t_1 - t_0:.4f} s")
      print()
  print("-----")
  print()

The summary of the output is:

level   | single-FOV      | multi-FOV
0       |    0.46         |    0.42
1       |    0.11         |    0.21
2       |    0.04         |    0.17
3       |    0.02         |    0.16

This supports our guess that loading the low-res pyramid levels is what kills the multi-FOV visualization performance.

As a reference, here is the full output:

9x8, level=0, singlefov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 19440, 20480)
Final chunks: ((3,), (1,), (2160, 2160, 2160, 2160, 2160, 2160, 2160, 2160, 2160), (2560, 2560, 2560, 2560, 2560, 2560, 2560, 2560))
Average(array)=212.38725897392618
Elapsed: 0.4591 s

9x8, level=0, multifov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 19440, 20480)
Final chunks: ((3,), (1,), (2160, 2160, 2160, 2160, 2160, 2160, 2160, 2160, 2160), (2560, 2560, 2560, 2560, 2560, 2560, 2560, 2560))
Average(array)=212.38725897392618
Elapsed: 0.4241 s

-----

9x8, level=1, singlefov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 9720, 10240)
Final chunks: ((3,), (1,), (2160, 2160, 2160, 2160, 1080), (2560, 2560, 2560, 2560))
Average(array)=198.68136411313657
Elapsed: 0.1108 s

9x8, level=1, multifov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 9720, 10240)
Final chunks: ((3,), (1,), (2160, 2160, 2160, 2160, 1080), (2560, 2560, 2560, 2560))
Average(array)=198.68136411313657
Elapsed: 0.2115 s

-----

9x8, level=2, singlefov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 4860, 5120)
Final chunks: ((3,), (1,), (2160, 2160, 540), (2560, 2560))
Average(array)=184.74524079164524
Elapsed: 0.0364 s

9x8, level=2, multifov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 4860, 5120)
Final chunks: ((3,), (1,), (2160, 2160, 540), (2560, 2560))
Average(array)=184.74524079164524
Elapsed: 0.1706 s

-----

9x8, level=3, singlefov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 2430, 2560)
Final chunks: ((3,), (1,), (2160, 270), (2560,))
Average(array)=169.0402980324074
Elapsed: 0.0159 s

9x8, level=3, multifov
version mismatch: detected:FormatV03, requested:FormatV04
Final shape: (3, 1, 2430, 2560)
Final chunks: ((3,), (1,), (2160, 270), (2560,))
Average(array)=169.0402980324074
Elapsed: 0.1628 s

-----

@tcompa
Collaborator

tcompa commented Jun 24, 2022

A more high-level comment on those results: this doesn't look good for the multi-FOV route.

Let's keep in mind that we expect two lazy features:

  1. Only loading pyramid levels at the right resolution.
  2. Only loading portions of the image which are shown, or maybe their closest neighbors.

I think that in the case of low-resolution levels, the combination of these two features is not working (in the multi-FOV case). The reason should be that when working at low res (e.g. with the first napari view, before zooming in) we typically have several FOVs showing at once, since each of them is quite small (e.g. a few hundred pixels in each of the X and Y directions, or even less if we use a more aggressive coarsening) and we are looking at the entire well (or a large fraction of it). This means that any small movement in this context hits the bottleneck of loading many small files (in the multi-FOV case).

The single-FOV case doesn't have this issue, because in that case we only have one or very few files for low-res levels, which can be loaded fast (see results in previous comment).

Thus it seems that loading all those low-res or mid-res levels (which we have to do quite often, unless we want to stick with the highest-res level only... which makes no sense) could be the bottleneck of napari performance for multi-FOV data.
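A quick back-of-the-envelope check with the level-4 shapes from above:

# at level 4 each FOV is only 135 x 160 px, so a full-well view of a 9x8 well
# touches 72 tiny files per channel instead of one or a few larger ones
fovs = 9 * 8
px_per_fov = 135 * 160
print(f"{fovs} files for only {fovs * px_per_fov / 1e6:.1f} Mpx per channel")
# => 72 files for only 1.6 Mpx per channel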

@tcompa
Collaborator

tcompa commented Jun 24, 2022

Concerning a slightly independent feature we need (that is, the pyramid loading within the Well class), I've put my simple solution into a separate fork of ome-zarr-py (tcompa/ome-zarr-py@89237e9) and we can later decide what to do: opening a PR for them or updating fractal-napari-plugins-collection (which I think is currently obsolete).

Probably a PR with this isolated change (and no discussion of multi-FOV) is the right way to go?

@jluethi
Collaborator Author

jluethi commented Jun 24, 2022

Great, very useful to have @tcompa!
A PR of this seems like a good thing to me, but let's quickly evaluate where we stand with multi-FOV on Monday. That would also be a good point to discuss the scaling issues of multi-FOV, if they really are the hard wall we're hitting. Because then this approach would give us lazy loading of wells, but it wouldn't work well for plates, and I'm curious to hear the OME devs' opinion on how useful FOV saving in the HCS format really is if we can't load it with good performance.
I want to run one more set of tests on that first though :)

@jluethi
Collaborator Author

jluethi commented Jun 24, 2022

A few thoughts:

  1. The loading clearly is slower when loading many FOVs than a single FOV for the low-res pyramid levels (and about equal for the high-resolution levels, as expected). That is problematic.
  2. I tried to reproduce that locally by loading the top pyramid level of the 9x8 well either from a single FOV or from multiple FOVs. Loading multi-FOV is much slower than single-FOV (in my tests going over wifi & VPN: single: 0.3788 s elapsed, multi: 16.5783 s elapsed). That is without any dask combining, just directly saving into a numpy array at known positions.
  3. This magnitude of difference is also comparable with the speed differences seen when loading in napari; thus file loading is not just significantly slower, it would explain the whole performance hit.
  4. This slowness would make zooming around the plate at low-res pyramid levels almost impossible, so this is very critical.
  5. One thing that was slightly confusing: are we saving the chunks nicely? When coarsening 2x, a chunk at pyramid level 4 is 26 KB for the single FOV (according to the Finder), but this chunk contains 2.97 MiB of data according to zarr loading in python, while a single chunk at pyramid level 4 is already 22 KB (containing 42.19 KB according to zarr loading in python). Some level 4 pyramid chunks go to e.g. 623 kB, thus maybe that's a compression thing? The thing that confuses me is that chunks can be much smaller (e.g. 848 bytes on disk for level 4 of 3x coarsening).
     => Are we sure chunks are saved correctly for the multi-FOV case?

But if those chunk sizes remain like this, I don't think multi-FOV approaches are feasible for something we want to be able to browse at the plate level...

Code:
import time
from pathlib import Path

import numpy as np
from ome_zarr.io import ZarrLocation

# (get_field as defined in the test script above)

# Load the pyramid level from a single file for 1 channel
t_0 = time.perf_counter()
shape = (1215, 1280)
data = np.zeros(shape, dtype='uint16')
C_index = 0
Z_index = 0
base_path = Path('/Users/joel/shares/dataShareJoel/jluethi/Fractal/Temporary_data_UZH_1_well_9x8_sites_2x/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03')
data3 = get_field(ZarrLocation(base_path), field_index = 0, level=4)[C_index, Z_index, :, :]
data[:, :] = data3
t_1 = time.perf_counter()
print(f"  Elapsed: {t_1 - t_0:.4f} s")

# Load the pyramid level from 72 files
t_0 = time.perf_counter()
shape = (1215, 1280)
step_size = (135, 160)
data = np.zeros(shape, dtype='uint16')
base_path = Path('/Users/joel/shares/dataShareJoel/jluethi/Fractal/Temporary_data_UZH_1_well_9x8_sites_multifov_2x/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03')
for site in range(72):
    #curr_data = get_field(ZarrLocation(base_path), field_index = 0, level=4)[C_index, Z_index, :, :]
    row = int(site / 8)
    col = site % 8
    data[row*step_size[0]:(row+1)*step_size[0], col * step_size[1]: (col+1) * step_size[1]] = get_field(ZarrLocation(base_path), field_index = site, level=4)[C_index, Z_index, :, :]

t_1 = time.perf_counter()
print(f"  Elapsed: {t_1 - t_0:.4f} s")

@tcompa
Collaborator

tcompa commented Jun 27, 2022

One thing that was slightly confusing: are we saving the chunks nicely? When coarsening 2x, a chunk at pyramid level 4 is 26 KB for the single FOV (according to the Finder), but this chunk contains 2.97 MiB of data according to zarr loading in python, while a single chunk at pyramid level 4 is already 22 KB (containing 42.19 KB according to zarr loading in python). Some level 4 pyramid chunks go to e.g. 623 kB, thus maybe that's a compression thing? The thing that confuses me is that chunks can be much smaller (e.g. 848 bytes on disk for level 4 of 3x coarsening).
=> Are we sure chunks are saved correctly for the multi-FOV case?

I am a bit confused by this remark, could you please clarify? (or let's discuss it live, if it's easier)

You say that a chunk at level 4 takes 26 kB on disk and has 3 MB of data, while the same chunk at level 4 also takes 22 kB and contains 42 kB of data. What changed between the two cases?

And which level-4 chunks reach 623 kB?

Concerning the "848 B vs 22 kB", this seems a reasonable ratio when changing the coarsening factor from 2 to 3 (at level 4 the per-dimension downscaling is 3^4 vs 2^4, so the pixel-count ratio is (3^4/2^4)^2 = 9^4/4^4):

22000/850 ~ 26
9**4 / 4**4 ~ 26

In any case: it's possible that we have something wrong in how we save things, let's check it.
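One way to check it, as a sketch (zarr tracks both the logical and the stored size of an array; path hypothetical):

import zarr

# level-4 array of one FOV in the multi-FOV layout
z = zarr.open_array(
    "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/4", mode="r"
)
print("uncompressed size:", z.nbytes)   # logical in-memory bytes
print("stored size:", z.nbytes_stored)  # compressed bytes on disk
print("ratio: %.1f" % (z.nbytes / z.nbytes_stored))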

@tcompa
Collaborator

tcompa commented Jun 27, 2022

Quick check on my side.

I look at a single well with 9x8 sites, and I build 4 pyramid levels (0, 1, 2, 3) with XY coarsening factor 2.

These are the file sizes on disk for each chunk:

Level | File size (single-FOV) | File size (multi-FOV, looking at a single FOV)
0     | 6 M                    | 6 M
1     | 7 M                    | 1.7 M
2     | 7 M                    | 0.4 M
3     | 6 M                    | 0.1 M

And here is the total number of files per level, in the two cases:

Level | Number of files (single-FOV) | Number of files (multi-FOV, across all FOVs)
0     | 216                          | 216
1     | 48                           | 216
2     | 12                           | 216
3     | 6                            | 216

At the moment I don't see anything unexpected in these numbers (the fact that the multi-FOV scheme produces a lot of small files is suboptimal, but consistent).
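For reproducibility, the counts above can be collected with a small sketch like this (well path hypothetical, assuming the multi-FOV layout with chunk files directly under each level folder):

from pathlib import Path

well = Path("20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03")
for level in range(4):
    # multi-FOV layout: <well>/<fov>/<level>/<chunk files>
    chunks = [
        f
        for f in well.glob(f"*/{level}/*")
        if f.is_file() and not f.name.startswith(".")
    ]
    total_mb = sum(f.stat().st_size for f in chunks) / 1e6
    print(f"level {level}: {len(chunks)} files, {total_mb:.1f} MB")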

@jluethi
Collaborator Author

jluethi commented Jun 27, 2022

@tcompa Great overview. What is the total size per layer for multi-FOV? I think single FOV sizes can vary quite a bit, would be interesting if the total is ~10% higher, 2x higher or 10x higher...

@tcompa
Collaborator

tcompa commented Jun 27, 2022

Total on-disk size per level:

Level | Total size (single-FOV) | Total size (multi-FOV)
0     | 1.27 GB                 | 1.3 GB
1     | 322 MB                  | 332 MB
2     | 82 MB                   | 87 MB
3     | 21 MB                   | 25 MB

And for this example the FOVs are quite homogeneous, with folders ranging from 22 to 25 M (counting all levels together).

@tcompa
Collaborator

tcompa commented Jun 27, 2022

WARNING: All my tests in this issue are on MIP-projected zarr files, but the size ratios seem to be similar in the original files.
(tested live with Joel)

@tcompa
Collaborator

tcompa commented Jun 27, 2022

PR on ome-zarr-py, with pyramid loading for Well: ome/ome-zarr-py#209

@jluethi
Collaborator Author

jluethi commented Jun 27, 2022

With the PR above, we can close this issue. It's possible to lazy-load multi-FOV wells now, but this approach does not scale to plates (see here: https://github.com/fractal-analytics-platform/mwe_fractal/issues/74)

@tcompa
Collaborator

tcompa commented Jul 4, 2022

Here's a quick test for loading a single dask array from either a single-FOV or multi-FOV zarr file. I tried to isolate it as much as possible from ome-zarr's reader.py, but I kept their ZarrLocation class to mimic what would happen there.


The summary of the output is:

level   | single-FOV      | multi-FOV
0       |    0.46         |    0.42
1       |    0.11         |    0.21
2       |    0.04         |    0.17
3       |    0.02         |    0.16

This supports our guess that loading the low-res pyramid levels is what kills the multi-FOV visualization performance.


This should be retested, after #92.

@tcompa
Collaborator

tcompa commented Jul 5, 2022

I confirm that recent tests of timing on my side (#66 (comment)) were performed on MIP zarr files, which (somewhat by accident?) had the correct dtypes (uint16) at all pyramid levels. So #92 does not apply to those numbers.

I still have to verify whether file sizes were checked for data with the correct/wrong dtype.

jacopo-exact pushed a commit that referenced this issue Nov 16, 2022
First attempt towards pelkmanslab parsl/slurm config - ref #34