[Feature]: Make `get_data_in_units` not load entire array into memory
#1881
Labels
- `category: proposal` (proposed enhancements or new features)
- `priority: low` (alternative solution already working and/or relevant to only specific user(s))
What would you like to see added to PyNWB?
As mentioned in #1880, `get_data_in_units()` loads the entire dataset into memory. For large datasets, that is impractical and will silently blow up a user's RAM.

Is your feature request related to a problem?
No response
What solution would you like?
What do you think about supporting the syntax `timeseries.data_in_units[1000:2000, 5:10]`, i.e., adding a simple wrapper class `WrappedArray` that defines `__getitem__` and delegates the slice argument to the underlying list / numpy array / h5py.Dataset / zarr.Array object?

We could reuse this wrapper class elsewhere to help address slicing differences between array backends (#1702) and improve performance of h5py slicing (h5py/h5py#293). As mentioned in #1702, full unification of these libraries is outside the scope of this project, but I think providing this wrapper class with its few enhancements would only help.
If we do this, the wrapper class would probably live in HDMF.
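A minimal sketch of the idea, to make the proposal concrete. The class name `WrappedArray` comes from the issue text, but everything else here (the `conversion`/`offset` parameters and the exact API) is a hypothetical illustration, not existing PyNWB or HDMF code:

```python
import numpy as np


class WrappedArray:
    """Lazy wrapper: applies a unit conversion on slice access.

    ``__getitem__`` forwards the slice key to the underlying array-like
    object (list, numpy array, h5py.Dataset, zarr.Array). For on-disk
    backends only the requested region is read, so no full-array load.
    """

    def __init__(self, data, conversion=1.0, offset=0.0):
        self.data = data
        self.conversion = conversion
        self.offset = offset

    def __getitem__(self, key):
        # Delegate slicing to the backend, then convert just that chunk.
        chunk = np.asarray(self.data[key])
        return chunk * self.conversion + self.offset


# Usage sketch: only the requested 1000x5 region is materialized.
data = np.arange(10_000 * 20, dtype=np.float64).reshape(10_000, 20)
data_in_units = WrappedArray(data, conversion=1e-6, offset=0.0)
chunk = data_in_units[1000:2000, 5:10]
```

A `timeseries.data_in_units` property could then return such a wrapper instead of a converted in-memory array, keeping the `[...]` syntax above working across backends.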
Do you have any interest in helping implement the feature?
Yes.