Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better communicate expectations of data order/contiguousness #5975

Open
wants to merge 3 commits into
base: branch-24.08
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion python/cuml/cuml/internals/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
import pickle

from cuml.internals.global_settings import GlobalSettings
from cuml.internals.logger import debug
from cuml.internals.logger import debug, info
from cuml.internals.mem_type import MemoryType, MemoryTypeError
from cuml.internals.memory_utils import class_with_cupy_rmm, with_cupy_rmm
from cuml.internals.safe_imports import (
Expand Down Expand Up @@ -1184,10 +1184,20 @@ def from_input(
not fail_on_order and order != arr.order and order != "K"
) or make_copy:
if make_copy:
info(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other places where a copy can happen, specifically lines 1072, potentially 1104 (unless the dataframe is fully contiguous) and potentialy 1160, not sure if we need all of them but it would be good to cover most of the cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding line 1072, the copy happens when the function is called with deepcopy=True, which means that a copy was required anyway. The user cannot do much about it. Do we want to log this as well?

Regarding line 1104, can the data in the dataframe be non-contiguous? Not very familiar into the ways to check for the internals of cuDF, but it might be something we want to check too. However again, the user cannot do much about this to avoid a copy.

Regarding line 1160, unless mistaken this section of code should be in charge of data type conversions. Do we want to log this too? But, then to be consistent we would have to double check that every data type conversions is being caught. Given the time left for release, it might be a bit short.

"Got an uncontiguous array."
" Converting data; this will result in additional memory"
" utilization."
)
data = arr.mem_type.xpy.array(
arr.to_output("array"), order=order
)
else:
info(
f"Expected {order} major order but got {arr.order}."
" Converting data; this will result in additional memory"
" utilization."
)
data = arr.mem_type.xpy.asarray(
arr.to_output("array"), order=order
)
Expand Down
Loading