[Feature]: Make `get_data_in_units` not load entire array into memory
#1881
Labels
- `category: proposal` (proposed enhancements or new features)
- `priority: low` (alternative solution already working and/or relevant to only specific user(s))
What would you like to see added to PyNWB?
As mentioned in #1880, `get_data_in_units()` loads the entire dataset into memory. For large datasets, that is impractical and will silently blow up a user's RAM.

Is your feature request related to a problem?
No response
What solution would you like?
What do you think about supporting the syntax `timeseries.data_in_units[1000:2000, 5:10]`, i.e., adding a simple wrapper class `WrappedArray` that defines `__getitem__` and delegates the slice argument to the underlying list / numpy array / h5py.Dataset / zarr.Array object?

We could reuse this wrapper class elsewhere to help address slicing differences between array backends (#1702) and improve performance of h5py slicing (h5py/h5py#293). As mentioned in #1702, full unification of these libraries is outside the scope of this project, but I think providing this wrapper class with its few enhancements would only help.
If we do this, the wrapper class would probably live in HDMF.
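A minimal sketch of the idea, to make the proposal concrete. The class name `WrappedArray` comes from the issue text, but everything else here (the `conversion`/`offset` parameters and the exact API) is a hypothetical illustration, not existing PyNWB or HDMF code:

```python
import numpy as np


class WrappedArray:
    """Lazy wrapper: applies a unit conversion on slice access.

    ``__getitem__`` forwards the slice key to the underlying array-like
    object (list, numpy array, h5py.Dataset, zarr.Array). For on-disk
    backends only the requested region is read, so no full-array load.
    """

    def __init__(self, data, conversion=1.0, offset=0.0):
        self.data = data
        self.conversion = conversion
        self.offset = offset

    def __getitem__(self, key):
        # Delegate slicing to the backend, then convert just that chunk.
        chunk = np.asarray(self.data[key])
        return chunk * self.conversion + self.offset


# Usage sketch: only the requested 1000x5 region is materialized.
data = np.arange(10_000 * 20, dtype=np.float64).reshape(10_000, 20)
data_in_units = WrappedArray(data, conversion=1e-6, offset=0.0)
chunk = data_in_units[1000:2000, 5:10]
```

A `timeseries.data_in_units` property could then return such a wrapper instead of a converted in-memory array, keeping the `[...]` syntax above working across backends.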
Do you have any interest in helping implement the feature?
Yes.