Introduction of the raft::device_resources_snmg type #2487
base: branch-24.12
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@              Coverage Diff               @@
##           branch-24.12    #2487     +/-  ##
================================================
- Coverage       87.17%   81.01%   -6.17%
================================================
  Files              25       17       -8
  Lines             546      511      -35
================================================
- Hits              476      414      -62
- Misses             70       97      +27
  ~device_resources_snmg()
  {
#pragma omp parallel for  // necessary to avoid hangs
Please add proper documentation to each of the methods in this class (even private methods). It's important for users of this API to know that they'll need to have multiple threads available for this, otherwise it'll hang.
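To illustrate why the thread requirement deserves a prominent note, here is a minimal sketch. It is not RAFT or NCCL code, and it rests on the assumption that the teardown acts like a rendezvous, where each rank blocks until every other rank has arrived. The function name release_all_ranks is hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a collective teardown (assumption: it behaves
// like a rendezvous). Each "rank" blocks until every rank has arrived, so
// each rank needs its own thread. Returns how many ranks were released.
int release_all_ranks(int num_ranks)
{
  assert(num_ranks > 0);
  std::mutex m;
  std::condition_variable cv;
  int arrived  = 0;
  int released = 0;

  std::vector<std::thread> workers;
  workers.reserve(num_ranks);
  for (int rank = 0; rank < num_ranks; ++rank) {
    workers.emplace_back([&] {
      std::unique_lock<std::mutex> lock(m);
      ++arrived;
      cv.notify_all();
      // Blocks here until all ranks have arrived.
      cv.wait(lock, [&] { return arrived == num_ranks; });
      ++released;
    });
  }
  for (auto& w : workers) { w.join(); }
  return released;
}
```

If the lambda bodies ran one after another on a single thread, the first rank would wait forever for the others, which is the kind of hang the documentation should warn about.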
namespace raft {

class device_resources_snmg : public resources {
Since this is a device_resources instance, I think it'll be better to extend device_resources. That way you can override important methods like get_stream and get_worker_stream to return a main stream. In general, we should expect the device_resources_snmg to behave the same as device_resources when passed into an algorithm that only operates single gpu while being able to operate in snmg mode when passed into an algorithm that supports single-node multi-gpu. I think it should store off and use the id of the GPU that is selected when creating the snmg handle by default when being used in single-gpu mode.
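A rough sketch of the shape this suggestion describes, using mock types only. None of these are the actual RAFT classes, and every name below (mock_device_resources, mock_device_resources_snmg, get_device, device_for_rank, single_gpu_algorithm) is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single-GPU handle; not the RAFT class.
class mock_device_resources {
 public:
  explicit mock_device_resources(int device_id) : device_id_(device_id) {}
  virtual ~mock_device_resources() = default;

  // Device used by algorithms that only operate on a single GPU.
  int get_device() const { return device_id_; }

 private:
  int device_id_;
};

// Hypothetical multi-GPU handle that extends the single-GPU one. It stores
// the device selected at construction as the default for single-GPU use and
// keeps the full list of devices for multi-GPU use.
class mock_device_resources_snmg : public mock_device_resources {
 public:
  mock_device_resources_snmg(int main_device, std::vector<int> device_ids)
    : mock_device_resources(main_device), device_ids_(std::move(device_ids))
  {
  }

  int num_ranks() const { return static_cast<int>(device_ids_.size()); }
  int device_for_rank(int rank) const { return device_ids_.at(rank); }

 private:
  std::vector<int> device_ids_;
};

// An algorithm that only knows about the single-GPU handle.
int single_gpu_algorithm(const mock_device_resources& res) { return res.get_device(); }
```

With this layout, constructing mock_device_resources_snmg with main device 1 and devices {0, 1, 2} and passing it to single_gpu_algorithm runs on device 1, while multi-GPU code can still iterate over all three ranks.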
Introduces the raft::device_resources_snmg type to hold all resources required for the NCCL clique.
Answers #2459
Removed call to raft::comms::build_comms_nccl_only (#2465)