Commit
Avoid extra memory copy when using cp.concatenate in cuml.dask kmeans (#5937)

Partial solution for #5936

The issue was that calling concatenate when a worker held a single array caused a memory copy (not sure if always, but often enough). This PR avoids the concatenation when a worker has a single partition of data.

This comes from CuPy's behavior: some testing reveals that it sometimes creates an extra allocation when concatenating a list that consists of a single array:

```python
>>> import cupy as cp
>>> a = cp.random.rand(2000000, 250).astype(cp.float32)
# Memory occupied: 5936 MB
>>> b = [a]
>>> c = cp.concatenate(b)
# Memory occupied: 5936 MB <- no memory copy
```

```python
>>> import cupy as cp
>>> a = cp.random.rand(1000000, 250)
# Memory occupied: 2120 MB
>>> b = [a]
>>> c = cp.concatenate(b)
# Memory occupied: 4028 MB <- memory copy was performed!
```

I'm not sure what exact rules CuPy follows here. We could check, but in general, avoiding the concatenate when we have a single partition is an easy fix that does not depend on behavior outside of cuML's code.

cc @tfeher @cjnolet

Authors:
- Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Tamas Bela Feher (https://github.com/tfeher)
- Divye Gala (https://github.com/divyegala)

URL: #5937
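
A minimal sketch of the single-partition shortcut described above. It is not the code from this PR: `combine_partitions` and its `xp` parameter are hypothetical names, and NumPy stands in for CuPy so the example runs without a GPU (passing `xp=cupy` would be the assumed GPU usage).

```python
import numpy as np  # stand-in for CuPy so the sketch runs without a GPU


def combine_partitions(parts, xp=np):
    """Return one array built from a worker's list of partitions.

    Hypothetical helper, not cuML's actual implementation: when the
    worker holds exactly one partition, return that array unchanged
    instead of calling concatenate, so no second buffer is allocated.
    """
    if len(parts) == 0:
        raise ValueError("expected at least one partition")
    if len(parts) == 1:
        # Single partition: skip concatenate entirely.
        return parts[0]
    # Several partitions: a real concatenation is needed.
    return xp.concatenate(parts, axis=0)


single = [np.ones((4, 3), dtype=np.float32)]
several = [
    np.ones((4, 3), dtype=np.float32),
    np.zeros((2, 3), dtype=np.float32),
]

same = combine_partitions(single)     # the very same object as single[0]
merged = combine_partitions(several)  # new array with 4 + 2 rows
```

Returning the original object means the caller shares memory with the input partition, so this only suits code paths that do not modify the result in place.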