Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] Accept Cubed arrays instead of dask #249

Closed
wants to merge 8 commits into from

Conversation

TomNicholas
Copy link
Member

Very very rough changes to see what happens if you try to give flox a cubed.Array. Not many changes are required to get the cubed array inputs all the way to the reduction step, but I have not yet been able to run it, so there might be additional incompatibilities that turn up.

flox/core.py Outdated Show resolved Hide resolved
Comment on lines +1298 to +1300
from xarray.core.parallelcompat import get_chunked_array_type

chunkmanager = get_chunked_array_type(array)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this approach would introduce a dependency on xarray, which presumably is not desirable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine just having dask_kwargs and cubed_kwargs instead of all this complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have just done that in xarray itself 😅

Comment on lines +1392 to +1394
# meta=array._meta,
align_arrays=False,
name=f"{name}-chunk-{token}",
# name=f"{name}-chunk-{token}",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_meta and name are dask-specific. Are they used for anything important here or just for labelling tasks in the graph?

Copy link
Collaborator

@dcherian dcherian Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you don't provide meta, dask will try to figure it out and then break?

@TomNicholas TomNicholas marked this pull request as draft June 24, 2023 17:53
else:
combine = partial(_grouped_combine, engine=engine, sort=sort)
combine_name = "grouped-combine"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test for these names that will need to be fixed.

flox/aggregations.py Outdated Show resolved Hide resolved
@@ -1889,7 +1896,7 @@ def groupby_reduce(
axis_ = np.core.numeric.normalize_axis_tuple(axis, array.ndim) # type: ignore
nax = len(axis_)

has_dask = is_duck_dask_array(array) or is_duck_dask_array(by_)
has_dask = is_chunked_array(array) or is_duck_dask_array(by_)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
has_dask = is_chunked_array(array) or is_duck_dask_array(by_)
is_chunked = is_chunked_array(array) or is_chunked_array(by_)

@dcherian dcherian closed this Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using Flox with cubed
2 participants