-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update pyarrow-related dispatch logic in dask_cudf #14069
Update pyarrow-related dispatch logic in dask_cudf #14069
Conversation
Merge pull request rapidsai#5690 from ajschmidt8/phase2 [skip ci] Update master references for main branch
[RELEASE] Re-release v0.15 cudf [skip-ci]
[RELEASE] cudf v0.17
[RELEASE] cudf v0.18
[RELEASE] Release v0.18.1 cudf
[RELEASE] v0.18.2 `cudf` release [skip-ci]
[RELEASE] v0.19.1 cudf
[RELEASE] v0.19.2 cudf [skip-ci]
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
"Ignoring the following arguments to " | ||
f"`pyarrow_schema_dispatch`: {list(kwargs)}" | ||
) | ||
return obj.to_arrow(preserve_index=preserve_index).schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): If dask asks for the schema separately from the table it would be good to figure out a way to provide a pyarrow schema for a cudf dataframe without necessarily copying the full frame to host. Ideally producing this metadata should be O(1) rather than (as it is now) O(size-of-frame).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good call. The specific logic in _get_pyarrow_schema_cudf
wasn't really part of this PR, but that's an excellent point. It definitely feels like we should be able to produce a schema without moving all the data!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using meta_nonempty(obj)
seems to do the trick here.
/ok to test |
…/cudf into preserve-index-dispatch
/ok to test |
/merge |
/ok to test |
Description
Updates
dask_cudf
dispatch logic to avoid breakage from dask/dask#10500.Also removes stale
try
/except
logic.Checklist