Enable resource naming in config #68
Welcome @MondayCha!
Hi, please add more description about this PR, and use
Thanks for your contribution. I opened an issue, #69, for this PR.
@MondayCha Would you like to add a doc explaining how to configure and use it?
68ff6f8 to 140f5e6
/ok-to-test
07d89d1 to c613eca
Once this repo is updated and the ConfigMap is prepared, you can begin installing packages from it to deploy the `nvidia-device-plugin` helm chart.

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
```
If a device plugin is already installed, would `helm upgrade` be successful?
This section is adapted from the installation instructions in the official documentation.
If the user previously installed a GPU device plugin by other means, `helm upgrade` may fail. In that case, the user should uninstall the existing plugin and then reinstall it.
de7ced4 to 2ddccac
/lgtm
/approve
Signed-off-by: MondayCha <[email protected]>
2ddccac to 86569cf
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing

/lgtm
Motivation
Volcano v1.9.0 introduces capacity scheduling, which makes it possible to give each queue separate quotas for different GPU types (important in production environments). For example:
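A minimal sketch of such a setup, assuming two hypothetical queues (`training`, `inference`) and hypothetical per-model resource names (`nvidia.com/a100`, `nvidia.com/v100`). It relies on the Queue `deserved` field read by the capacity plugin, and it only writes the manifests to a local file:

```shell
# Write two hypothetical Queue manifests to a local file.
# Queue names, resource names, and quantities are illustrative assumptions.
cat > gpu-queues.yaml <<'EOF'
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: training
spec:
  reclaimable: true
  deserved:
    nvidia.com/a100: 8
    nvidia.com/v100: 2
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: inference
spec:
  reclaimable: true
  deserved:
    nvidia.com/a100: 2
    nvidia.com/v100: 6
EOF

# To apply on a cluster running Volcano with the capacity plugin enabled:
#   kubectl apply -f gpu-queues.yaml
```

Quotas like these only work if the nodes actually advertise one resource name per GPU model.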
However, the default NVIDIA Device Plugin reports resources as `nvidia.com/gpu`, which does not support reporting different GPU models as shown in the example. To address this, we need to customize the device plugin.
Change Details
The NVIDIA community has already discussed this issue:
This PR builds on the above discussion.
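The kind of configuration this enables might look like the sketch below. It assumes the `resources.gpus` list with `pattern` and `name` fields from the upstream device plugin's v1 config schema. The patterns, resource names, and file name are hypothetical, so verify them against this repository's documentation before use:

```shell
# Write a hypothetical device plugin config that maps GPU product names
# to per-model resource names. All values below are assumptions.
cat > device-plugin-config.yaml <<'EOF'
version: v1
flags:
  migStrategy: none
resources:
  gpus:
    - pattern: "*A100*"   # assumed wildcard match on the GPU product name
      name: a100          # expected to be advertised as nvidia.com/a100
    - pattern: "*V100*"
      name: v100
EOF
```

The file would then be packaged into the ConfigMap that the Helm chart consumes; how that ConfigMap is named and referenced depends on the chart values in use.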
Further Impact
Renaming GPU resources prevents the DCGM Exporter from reporting pod-level GPU usage, because the exporter only recognizes the exact resource name `nvidia.com/gpu` or names prefixed with `nvidia.com/mig-`.
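The matching rule described above can be sketched as a small shell check. This is not the DCGM Exporter's code. The function name `is_dcgm_tracked` is hypothetical, and the rule is simply the one stated in this PR description:

```shell
# Return 0 if a resource name would be recognized under the rule above:
# exactly "nvidia.com/gpu", or any name starting with "nvidia.com/mig-".
is_dcgm_tracked() {
  case "$1" in
    nvidia.com/gpu|nvidia.com/mig-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: report which of several resource names would still be attributed to pods.
for name in nvidia.com/gpu nvidia.com/mig-1g.5gb nvidia.com/a100; do
  if is_dcgm_tracked "$name"; then
    echo "$name: tracked"
  else
    echo "$name: not tracked"
  fi
done
```

A renamed resource such as the hypothetical `nvidia.com/a100` fails both conditions, which is why pod-level metrics are lost after renaming.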