[BUG] CPU Experimental FIL throws CUDA Error when run with no available CUDA devices #6134

Open
wphicks opened this issue Nov 13, 2024 · 0 comments
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

wphicks commented Nov 13, 2024

Describe the bug
When run with `CUDA_VISIBLE_DEVICES=''` inside a `using_device_type('cpu')` block, experimental FIL throws a CUDA error on any `predict` call. Because the cuML CPU package does not currently include CPU FIL, this makes CPU FIL effectively unusable on machines without a GPU. I believe this is a regression relative to the behavior when CPU FIL was first introduced, caused by an upstream change, but I am not certain. It can likely be fixed by calling `synchronize` at line 300 of `fil.pyx` only if the current device type is GPU.

Error output below:

```
  File "/raid/whicks/proj_xgboost/taxi_example/benchmark.py", line 282, in <module>
    fil_model.optimize(batch_size=features.shape[0])
  File "/raid/whicks/miniforge3/envs/triton_benchmark/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "fil.pyx", line 1470, in cuml.experimental.fil.fil.ForestInference.optimize
  File "/raid/whicks/miniforge3/envs/triton_benchmark/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/raid/whicks/miniforge3/envs/triton_benchmark/lib/python3.12/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "fil.pyx", line 1258, in cuml.experimental.fil.fil.ForestInference.predict
  File "fil.pyx", line 312, in cuml.experimental.fil.fil.ForestInference_impl.predict
  File "fil.pyx", line 300, in cuml.experimental.fil.fil.ForestInference_impl._predict
RuntimeError: CUDA error encountered at: file=/raid/whicks/miniforge3/envs/triton_benchmark/include/raft/core/interruptible.hpp line=303:
```
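
The guard suggested in the description (synchronize only when the current device type is GPU) might look roughly like the sketch below. It is plain Python rather than the actual Cython in `fil.pyx`, and `DeviceType`, `RecordingHandle`, and `finish_predict` are hypothetical stand-ins invented for illustration, not cuML or RAFT APIs. It only demonstrates the control flow, under the assumption that the synchronize call is what raises when no CUDA device is visible.

```python
from enum import Enum


class DeviceType(Enum):
    """Hypothetical stand-in for the device type chosen via using_device_type."""
    CPU = "cpu"
    GPU = "gpu"


class RecordingHandle:
    """Hypothetical stand-in for a handle; it only counts synchronize calls."""

    def __init__(self):
        self.sync_calls = 0

    def synchronize(self):
        # Assumption: in the real code this would be a stream synchronization,
        # which the issue suspects is the call that fails without a GPU.
        self.sync_calls += 1


def finish_predict(handle, device_type):
    """Synchronize only when inference ran on the GPU; skip it on CPU."""
    if device_type is DeviceType.GPU:
        handle.synchronize()


cpu_handle = RecordingHandle()
finish_predict(cpu_handle, DeviceType.CPU)
print(cpu_handle.sync_calls)  # 0: no synchronize on the CPU path

gpu_handle = RecordingHandle()
finish_predict(gpu_handle, DeviceType.GPU)
print(gpu_handle.sync_calls)  # 1: synchronize still happens on the GPU path
```

With a guard of this shape, the CPU path never touches the CUDA runtime at the end of `predict`, while GPU behavior is unchanged.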