-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass .chunk/rechunk calls through for chunked arrays without ChunkManagers #9286
base: main
Are you sure you want to change the base?
Changes from 15 commits
f860ce5
77947c3
da6799a
34dadae
6e26a2d
78e8ff4
6feede3
621ea0c
c305b61
4f82d9d
a9bd35d
49e2b5f
a24489b
2cce6a0
3cadd53
556161d
0296f92
9c2ab5e
665727b
54adae7
5863859
b1024c9
e926748
def3131
8db961e
955c56e
58d1cc1
aa3afff
bd29f79
7ad5f07
d14b705
aac1566
a0d1e84
ce1df3a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,7 +41,7 @@ | |
_SupportsReal, | ||
) | ||
from xarray.namedarray.parallelcompat import guess_chunkmanager | ||
from xarray.namedarray.pycompat import to_numpy | ||
from xarray.namedarray.pycompat import is_chunked_array, to_numpy | ||
from xarray.namedarray.utils import ( | ||
either_dict_or_kwargs, | ||
infix_dims, | ||
|
@@ -819,11 +819,18 @@ def chunk( | |
if dim in chunks | ||
} | ||
|
||
chunkmanager = guess_chunkmanager(chunked_array_type) | ||
|
||
data_old = self._data | ||
if chunkmanager.is_chunked_array(data_old): | ||
data_chunked = chunkmanager.rechunk(data_old, chunks) # type: ignore[arg-type] | ||
if is_chunked_array(data_old): | ||
print(f"problematic chunks = {chunks}") | ||
# if is_dict_like(chunks) and chunks != {}: | ||
# chunks = tuple(chunks.get(n, s) for n, s in enumerate(data_old.shape)) # type: ignore[assignment] | ||
|
||
print(f"hopefully normalized chunks = {chunks}") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is really irritating - if I keep these lines commented out then my There are so many possible valid argument types for It would be much nicer for all possible The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This was actually mostly my fault for having a bug in that test, fixed by 0296f92.
But there is still some unnecessary complexity that would be nice to remove. The main reason why the weird (If we do just use that we should perhaps vendor it though - as cubed does already). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I managed to sort this all out, so now everything goes through Question is now do I:
I think we actually have to do (1), because we now have a codepath which will try to call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dcherian I would appreciate your input on this vendoring question before I move ahead with it ^ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Vendor it! Sorry for the delay. We can generalize if it's ever needed |
||
|
||
# Assume any chunked array supports .rechunk - if it doesn't then at least a clear AttributeError will be raised. | ||
# Deliberately don't go through the chunkmanager so as to support chunked array types that don't need all the special computation methods. | ||
# See GH issue #8733 | ||
data_chunked = data_old.rechunk(chunks) # type: ignore[union-attr] | ||
else: | ||
if not isinstance(data_old, ExplicitlyIndexed): | ||
ndata = data_old | ||
|
@@ -841,13 +848,13 @@ def chunk( | |
if is_dict_like(chunks): | ||
chunks = tuple(chunks.get(n, s) for n, s in enumerate(ndata.shape)) # type: ignore[assignment] | ||
|
||
chunkmanager = guess_chunkmanager(chunked_array_type) | ||
data_chunked = chunkmanager.from_array(ndata, chunks, **from_array_kwargs) # type: ignore[arg-type] | ||
|
||
return self._replace(data=data_chunked) | ||
|
||
def to_numpy(self) -> np.ndarray[Any, Any]: | ||
"""Coerces wrapped data to numpy and returns a numpy.ndarray""" | ||
# TODO an entrypoint so array libraries can choose coercion method? | ||
return to_numpy(self._data) | ||
|
||
def as_numpy(self) -> Self: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This
is_chunked_array(arr) and has_chunkmanager(arr)
pattern becomes necessary because we are now considering the possibility thatis_chunked_array(arr) == True
buthas_chunkmanager(arr) == False
, whereas previously these were assumed to always be consistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@headtr1ck I got a notification saying you commented saying
(But I can't find your comment.)
It's a good question though. I think there are some array types that don't define a
.chunks
where you might still want to use otherChunkManager
methods.In particular JAX is interesting - it has a top-level
pmap
function which applies a function over multiple axes of an array similar toapply_gufunc
. It distributes computation, but not over.chunks
(which JAX doesn't define), instead over a global variablejax.local_device_count
.This is why I think we should rename
ChunkManager
toComputeManager
.cc @alxmrs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came to the same conclusion, that's why I deleted the comment, sry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! I prefer to leave all my half-baked thoughts in the open and double or triple-post 😅 If you were wondering it then other people will definitely have the same question!
I could leave this to a second PR, to isolate the breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomNicholas FYI JAX does now support something a bit like chunking via sharding of
jax.Array
, there's a good summary here: https://jax.readthedocs.io/en/latest/notebooks/Distributed_arrays_and_automatic_parallelization.htmlIIUC this is now preferred over
pmap
.