
[RELEASE] cuml v24.08 #6007

Merged
merged 44 commits into from
Aug 8, 2024

Conversation

raydouglass
Member

❄️ Code freeze for branch-24.08 and v24.08 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.08 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-24.08 into main for the release

raydouglass and others added 30 commits May 20, 2024 17:42
Forward-merge branch-24.06 into branch-24.08
Forward-merge branch-24.06 into branch-24.08
Fix conflict of forward-merge #5905 of branch-24.06 into branch-24.08
Forward-merge branch-24.06 into branch-24.08
Forward-merge branch-24.06 into branch-24.08
This failure was just a testing failure: the test expected identical pointers for the actual dataframes, as opposed to the wrapped objects.

Contributes to fixing #5876 

cc @betatim

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5885
…5882)

The error came from the fact that pandas and cudf convert to NumPy with different default memory orders.

Towards fixing #5876
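
For context, here is a minimal sketch of the kind of order normalization involved (NumPy only; `as_c_ordered` is a hypothetical helper, not cuML code):

```python
import numpy as np

def as_c_ordered(arr):
    # Normalize to C (row-major) order before comparing contents, so the
    # comparison does not depend on which library produced the array.
    return np.ascontiguousarray(arr)

a = np.asfortranarray(np.arange(6, dtype="float32").reshape(2, 3))
b = as_c_ordered(a)
assert a.flags["F_CONTIGUOUS"] and b.flags["C_CONTIGUOUS"]
assert np.array_equal(a, b)  # same values, different memory layout
```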

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Tim Head (https://github.com/betatim)

URL: #5882
This PR removes text builds of the documentation, which we do not currently use for anything. Contributes to rapidsai/build-planning#71.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #5921
Forward-merge branch-24.06 into branch-24.08
…earn change (#5925)

Nightly jobs are showing a failure caused by a hypothesis strategy that calls sklearn's `make_regression` with `n_samples` equal to zero, which is no longer supported. This PR fixes that.
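
A minimal sketch of the kind of constraint involved (illustrative only, not the actual cuML test code):

```python
from hypothesis import given, strategies as st
from sklearn.datasets import make_regression

# n_samples must be at least 1; newer scikit-learn versions reject 0.
@given(n_samples=st.integers(min_value=1, max_value=100))
def test_make_regression_with_positive_sample_counts(n_samples):
    X, y = make_regression(n_samples=n_samples, n_features=5, random_state=0)
    assert X.shape == (n_samples, 5)
```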

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #5925
… followup (#5928)

Contributes to rapidsai/build-planning#31
Contributes to rapidsai/dependency-file-generator#89

#5804 was one of the earlier `rapids-build-backend` PRs merged across RAPIDS. Since it was merged, we've made some small adjustments to the approach for `rapids-build-backend`. This catches `cuml` up with those changes:

* removes unused constants in `ci/build*` scripts
* uses `--file-key` instead of `--file_key` in `rapids-dependency-file-generator` calls
* uses `--prepend-channel` instead of `--prepend-channels` in `rapids-dependency-file-generator` calls
* ensures `ci/update-version.sh` preserves alpha specs

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #5928
Fixed by passing `sample_weight` on to the `.fit()` method inside the `fit_proba()` method of SVC.
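
A minimal usage sketch of the fixed path (assuming `cuml.svm.SVC` with `probability=True`; the data here is synthetic and illustrative):

```python
import numpy as np
from cuml.svm import SVC

X = np.random.rand(100, 4).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")
w = np.random.rand(100).astype("float32")

# With probability=True, SVC fits an internal calibrated model; the fix
# ensures sample_weight is forwarded to that internal .fit() call as well.
clf = SVC(probability=True)
clf.fit(X, y, sample_weight=w)
proba = clf.predict_proba(X)
```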

Authors:
  - Pablo Tanner (https://github.com/pablotanner)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5912
Treelite 4.2.1 contains the following improvements:

* Compatibility patch for latest RapidJSON (dmlc/treelite#567)
* Support for NumPy 2.0 (dmlc/treelite#562). Thanks @jameslamb
* Handle certain class of XGBoost models (dmlc/treelite#564)

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Ray Douglass (https://github.com/raydouglass)
  - James Lamb (https://github.com/jameslamb)

URL: #5908
…types (#5938)

This PR fixes parameter sweeps in the benchmarks when the swept values have different types, for example:

```
--cuml-param-sweep init=random,scalable-k-means++
```

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5938
Contributes to rapidsai/build-planning#80

Adds constraints to avoid pulling in CMake 3.30.0, for the reasons described in that issue.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #5956
…#5937)

Partial solution for #5936 

The issue was that concatenating when a worker holds a single array was causing a memory copy (not always, but often enough). This PR avoids the concatenation when a worker has a single partition of data.

This comes from a CuPy behavior: some testing reveals that it sometimes creates an extra allocation when concatenating a list made up of a single array:

```python
>>> import cupy as cp
>>> a = cp.random.rand(2000000, 250).astype(cp.float32) # Memory occupied: 5936MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 5936 MB <- no memory copy
```

```python
>>> import cupy as cp
>>> a = cp.random.rand(1000000, 250) # Memory occupied: 2120 MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 4028 MB <- memory copy was performed!
```

I'm not sure what exact rules CuPy follows here (we could check), but in general avoiding the concatenate when we have a single partition is an easy fix that does not depend on behavior outside of cuML's code. A sketch of the approach is shown below.
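
A minimal sketch of the idea (`_concat_local_parts` is a hypothetical helper, not the actual cuML code):

```python
import cupy as cp

def _concat_local_parts(parts):
    # When a worker holds a single partition, return it as-is instead of
    # calling cp.concatenate, which can trigger an extra device allocation.
    if len(parts) == 1:
        return parts[0]
    return cp.concatenate(parts)
```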

cc @tfeher @cjnolet

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Divye Gala (https://github.com/divyegala)

URL: #5937
With the deployment of rapids-build-backend, we need to make sure our dependencies have alpha specs.

Contributes to rapidsai/build-planning#31

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #5948
Usage of the CUDA math libraries is independent of the CUDA runtime. Make their static/shared status separately controllable.

Contributes to rapidsai/build-planning#35

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)

URL: #5959
Closes #3458

Add PCA embedding initialization to C++ layer and expose it in Python API.
```python
from cuml.manifold import TSNE

tsne = TSNE(
    ...,
    init="pca",  # "random" or "pca"
)
```

Authors:
  - Anupam (https://github.com/aamijar)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Micka (https://github.com/lowener)

Approvers:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #5897
This is a step towards adding support for dynamic linking with wheels (splitting the shared libraries out into their own wheels). That's being tracked in rapidsai/build-planning#33

This PR performs a necessary step of moving the cuml folder one level deeper, so that the python folder becomes a parent of multiple full-fledged projects instead of being the top level of one python project. This is split out into its own PR because the change touches so many files; it is easier to review the actual changes for supporting the split wheel when you don't also have to consider these moves.

This change also affects devcontainers, and there will need to be a change similar to rapidsai/devcontainers#283 for cuml.

Authors:
  - Mike Sarahan (https://github.com/msarahan)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5944
This PR updates the latest CUDA build/test version from 12.2.2 to 12.5.1.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #5963
Follow up to PR: #5963
Partially addresses issue: rapidsai/build-planning#73

Renames the `.devcontainer`s for CUDA 12.5

cc @KyleFromNVIDIA @jameslamb @trxcllnt (for awareness)

Authors:
  - https://github.com/jakirkham

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Paul Taylor (https://github.com/trxcllnt)

URL: #5967
After updating everything to CUDA 12.5.1, use `[email protected]` again.

Contributes to rapidsai/build-planning#73

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham

URL: #5970
Treelite 4.3.0 contains the following improvements:

* Support XGBoost 2.1.0, including the UBJSON format (dmlc/treelite#572, dmlc/treelite#578)
* [GTIL] Allow inferencing with FP32 input + FP64 model (dmlc/treelite#574). Related: triton-inference-server/fil_backend#391
* Prevent integer overflow for deep LightGBM trees by using DFS order (dmlc/treelite#570).
* Support building with latest RapidJSON (dmlc/treelite#567)

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5968
dantegd and others added 12 commits July 24, 2024 21:52
Closes #5918 

Correctly gathers the labels of all workers together for the `labels_` attribute of the cuml.dask KMeans estimator.
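
A minimal usage sketch (assuming a dask-CUDA cluster is available; the exact container type of `labels_` may differ):

```python
import cupy as cp
import dask.array as da
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cuml.dask.cluster import KMeans

cluster = LocalCUDACluster()
client = Client(cluster)

# Two chunks so the data is spread across multiple partitions/workers.
X = da.random.random((10_000, 16), chunks=(5_000, 16)).map_blocks(cp.asarray)

km = KMeans(n_clusters=8)
km.fit(X)

# labels_ should now hold one label per input row, gathered from all workers.
labels = km.labels_
```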

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5931
Contributes to rapidsai/build-planning#31

In short, RAPIDS DLFW builds want to produce wheels with unsuffixed dependencies, e.g. `cudf` depending on `rmm`, not `rmm-cu12`.

This PR is part of a series across all of RAPIDS to try to support that type of build by setting up CUDA-suffixed and CUDA-unsuffixed dependency lists in `dependencies.yaml`.

For more details, see:
* rapidsai/build-planning#31 (comment)
* rapidsai/cudf#16183

## Notes for Reviewers

### Why target 24.08?

This is targeting 24.08 because:

1. it should be very low-risk
2. getting these changes into 24.08 prevents the need to carry around patches for every library in DLFW builds using RAPIDS 24.08

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #5974
This fixes a call to `raft::stats::mean()` in the PCA code by deactivating the sample parameter.

CC @dantegd

Authors:
  - Malte Förster (https://github.com/mfoerste4)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #5980
Noticed while reviewing #5154.

Plus an extra (probably benign) typo-bug while I'm here.

cc @wphicks

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: #5166
…5973)

Closes #5551

* Replace `np.float32` with `"float32"` so that we don't reference the `np` module. By the time the `__dealloc__` method is called, modules may have already been unloaded (a small sketch of the pattern follows below).
* Improve the user experience by raising a helpful error when the user attempts to predict with an empty forest.
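
A minimal sketch of the pattern (`make_buffer` is an illustrative helper, not cuML code):

```python
import numpy as np

def make_buffer(n):
    # Passing the dtype as a string avoids an attribute lookup on the np
    # module; when __dealloc__ runs during interpreter shutdown, that module
    # object may already have been torn down, so np.float32 can fail.
    return np.zeros(n, dtype="float32")
```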

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5973
Make `ci/run_cuml_dask_pytests.sh` environment-agnostic again. This script is run outside the RAPIDS CI environment, so it should not include calls to utilities only available in that environment. Follow-up to #5761.

Authors:
  - Paul Taylor (https://github.com/trxcllnt)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5950
The sparse PCA still densified `X` during the transform step, which defeats the purpose of a sparse PCA in a sense. However,
```python
precomputed_mean_impact = self.mean_ @ self.components_.T
mean_impact = cp.ones((X.shape[0], 1)) @ precomputed_mean_impact.reshape(1, -1)
X_transformed = X.dot(self.components_.T) - mean_impact
```
is the same as
```python
X = X - self.mean_
X_transformed = X.dot(self.components_.T)
```
The new implementation is faster (mainly because we no longer have to rely on cupy's `to_array()`) and uses a lot less memory.

Authors:
  - Severin Dicks (https://github.com/Intron7)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5964
This applies some smaller NumPy 2 related fixes. With the (in progress) CuPy 13.2 fix-ups, the single-GPU test suite seems to be doing mostly fine. There is a single test remaining:
```
test_simpl_set.py::test_simplicial_set_embedding
```
is failing with:
```
(Pdb) cp.asarray(cu_embedding)
array([[23067.518, 23067.518],
       [17334.559, 17334.559],
       [22713.598, 22713.598],
       ...,
       [23238.438, 23238.438],
       [25416.912, 25416.912],
       [19748.943, 19748.943]], dtype=float32)
```
being completely different from the reference:
```
array([[5.330462 , 4.3419437],
       [4.1822557, 5.6225405],
       [5.200859 , 4.530094 ],
       ...,
       [4.852359 , 5.0026293],
       [5.361374 , 4.1475334],
       [4.0259256, 5.7187223]], dtype=float32)
```
I am not sure why that might be; I will prod it a bit more, but it may need someone who knows the methods to have a look.

One wrinkle is that hdbscan has not yet been released for NumPy 2, but I guess it is still required even though sklearn has a version? (Probably not a big issue, but my fix-ups in scikit-learn-contrib/hdbscan#644 run into some issues, even though they don't seem NumPy 2 related.)

xref: rapidsai/build-planning#38

Authors:
  - Sebastian Berg (https://github.com/seberg)
  - https://github.com/jakirkham
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5954
This PR fixes the remaining tests and bugs of the encoders, and other utilities for cudf.pandas.

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5990
Closes #4477 

Adds the capability for all estimators to accept any dtype by converting the inputs when needed. Currently, for most estimators, this means converting to float32.
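
A minimal sketch of the conversion idea (`_coerce_dtype` is a hypothetical helper, not cuML's actual input-utils code):

```python
import numpy as np

def _coerce_dtype(X, supported=("float32", "float64")):
    # Cast inputs with unsupported dtypes (e.g. float16, int64) to float32
    # so every estimator can accept them; supported dtypes pass through.
    X = np.asarray(X)
    return X if X.dtype.name in supported else X.astype("float32")

X_int = np.arange(12, dtype="int64").reshape(4, 3)
assert _coerce_dtype(X_int).dtype == np.float32
```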

Todo:

- [x] Add conversion to all methods
- [x] Discuss if defaulting to float32 is the correct default
- [x] Discuss if an option to override that default is needed
- [x] Update docstring generator
- [x] Add tests

cc @beckernick @pentschev @isVoid @divyegala

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5888
1. Adds a `build_algo="nn_descent"` option to UMAP.

The user can now choose the knn graph build algorithm between `"brute_force_knn"` and `"nn_descent"`.
It defaults to `"auto"`, which decides whether to run with brute force knn or nn descent depending on the given dataset size.

`"auto"` runs with `brute_force_knn` if either 1) the data has <= 50K rows **OR** 2) the data is sparse; otherwise it runs with `nn_descent`.

The 50K-row threshold was roughly chosen based on the grid search below (runtime in ms); discussed with Corey.
<img width="1038" alt="Screenshot 2024-07-23 at 5 36 34 PM" src="https://github.com/user-attachments/assets/d2ffd7d6-8e94-4ddc-ba76-f301be9bea67">


```
X_embedded_nnd = cuUMAP(n_neighbors=16, build_algo="nn_descent").fit_transform(data)
score_nnd = cuml.metrics.trustworthiness(data, X_embedded_nnd)
```
2. Adds a `data_on_host` option (defaults to `False`) when calling `fit()` or `fit_transform()`.

Note that brute force knn cannot be used with data on host; a usage sketch follows below.
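
A minimal usage sketch combining the two options (illustrative only; the parameter names come from this PR's description):

```python
import numpy as np
from cuml.manifold import UMAP

data = np.random.rand(100_000, 64).astype("float32")

# data_on_host keeps the training data in host memory; per this PR it is
# only supported with the nn_descent build algorithm (not brute force knn).
umap = UMAP(n_neighbors=16, build_algo="nn_descent")
embedding = umap.fit_transform(data, data_on_host=True)
```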

### Running Benchmarks
<img width="962" alt="Screenshot 2024-07-23 at 5 41 19 PM" src="https://github.com/user-attachments/assets/7084b326-50bb-46a9-a012-6979278d871d">

Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #5910
@raydouglass raydouglass requested review from a team as code owners August 1, 2024 17:26

copy-pr-bot bot commented Aug 1, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@github-actions github-actions bot added the conda, Cython / Python, CMake, CUDA/C++, and ci labels Aug 1, 2024
Closes #6008

---------

Co-authored-by: Dante Gama Dessavre <[email protected]>

@raydouglass raydouglass merged commit 6777bc1 into main Aug 8, 2024
4 of 5 checks passed