[WIP] Add DRA+CCC example #1569
base: main
Conversation
Signed-off-by: John Belamaric <[email protected]>
Hi @johnbelamaric and reviewers,
I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary to get everyone up to speed:
This pull request aims to add an example demonstrating the use of Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC) in Google Kubernetes Engine (GKE). The goal is to showcase how a single deployment can be served by different numbers and models of GPUs, providing flexibility in autoscaling and resource utilization.
Here's a breakdown of the changes:
- New files added:
  - `autoscaling/custom-compute-classes/dra/README.md`: A comprehensive guide explaining the concept, setup, and usage of DRA and CCC for GPU workloads, including detailed instructions, code snippets, and explanations of the components involved (lines 1-256 added).
  - `autoscaling/custom-compute-classes/dra/ccc-inference-1x8x24.yaml`: YAML configuration for a custom compute class named `inference-1x8x24`, defining priorities for different node pools with varying GPU configurations (lines 1-24 added).
  - `autoscaling/custom-compute-classes/dra/deployment.yaml`: YAML configuration for a Kubernetes Deployment that utilizes DRA to request all available GPUs on a node (lines 1-68 added).
  - `autoscaling/custom-compute-classes/dra/prepare-gke-nodes-for-dra.yaml`: YAML configuration for a DaemonSet that prepares GKE nodes for use with the DRA driver (lines 1-60 added).
- No files were modified or deleted.
The example uses a custom compute class to prioritize node pools with different GPU types (T4, L4, P4), allowing the deployment to adapt to available resources. The `README.md` provides a step-by-step guide on setting up the environment, deploying the necessary components, and running the example.
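To illustrate the DRA side of the example (the actual manifests live in `deployment.yaml`), a ResourceClaimTemplate that asks for every GPU on the allocated node might look like the following sketch. The names used here (`all-gpus`, the `gpu.nvidia.com` device class) are illustrative assumptions, not necessarily what the PR ships:

```yaml
# Hypothetical sketch: request all GPUs on a node via DRA.
# The claim name and device class name are illustrative assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: all-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.nvidia.com
        # Allocate every matching device on the node, rather than a fixed count,
        # which is what lets one Deployment run on 1, 2, or 4 GPU nodes.
        allocationMode: All
```

A Pod template then references this template under `spec.resourceClaims`, and the scheduler picks a node whose devices satisfy the claim.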
I'll provide a more detailed review shortly, focusing on specific code sections and potential improvements. Let me know if you have any questions in the meantime.
And now, a little haiku to lighten the mood:
GPUs in the cloud,
DRA and CCC work well,
Code flows like a stream.
This PR adds an example of using Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC). This allows a single deployment to be served by varying numbers and models of GPUs. The documentation is comprehensive and provides clear instructions. A few minor improvements can enhance clarity and address potential issues.
I've referenced the Google style guides for some of the suggestions below, as no specific style guide was provided.
It's great that you've included a TODO comment. I've provided a suggestion for addressing it.
```shell
# TODO: verify gpu-driver-version=latest makes this unnecessary
#kubectl label node --overwrite -l nvidia.com/gpu.present=true cloud.google.com/gke-gpu-driver-version-
#kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
```
The TODO comment suggests verifying whether `gpu-driver-version=latest` makes the subsequent lines unnecessary. It's good practice to resolve TODOs before merging. According to the Nvidia GPU driver installation guide, using `gpu-driver-version=latest` automatically installs the latest stable driver. Therefore, manually labeling nodes with `nvidia.com/gpu.present=true` and applying the daemonset are redundant. I recommend removing these commented-out lines to avoid confusion.
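For instance, a node-pool creation command that relies entirely on the managed driver install could look like this sketch; the cluster, pool, and zone values are placeholder assumptions:

```shell
# Hypothetical sketch: let GKE manage the NVIDIA driver at node-pool creation
# time via gpu-driver-version=latest, so no manual node labeling or driver
# DaemonSet is needed afterwards. Cluster/pool/zone names are placeholders.
gcloud container node-pools create t4-pool \
  --cluster "my-cluster" \
  --zone "us-central1-a" \
  --machine-type "n1-standard-4" \
  --accelerator "type=nvidia-tesla-t4,count=2,gpu-driver-version=latest"
```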
```shell
  --location "${LOCATION}" \
  --node-version "${VERSION}" \
  --machine-type "g2-standard-4" \
  --accelerator "gpu-driver-version=latest,count=2" \
```
The `count=2` for the accelerator in this command seems incorrect. The table in the README indicates that g2-standard-4 has 1 L4 GPU. Should this be `count=1` to match the table and the machine type's actual GPU configuration?
```diff
-  --accelerator "gpu-driver-version=latest,count=2" \
+  --accelerator "gpu-driver-version=latest,count=1"
```
```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: inference-1x8x24
spec:
  priorities:
  - nodepools: [n1-standard-4-2xt4]
  - nodepools: [g2-standard-4]
  - nodepools: [n1-standard-4-4xp4]
  whenUnsatisfiable: DoNotScaleUp
```
The YAML for the ComputeClass is duplicated within the README. This redundancy can lead to inconsistencies if one copy is updated and the other is not. Consider removing the YAML block from the README and referencing the `ccc-inference-1x8x24.yaml` file instead. This promotes a single source of truth and improves maintainability. For example:

> The ComputeClass definition can be found in [ccc-inference-1x8x24.yaml](ccc-inference-1x8x24.yaml).
```console
```
The empty code block at the end seems unintentional. Consider removing it or adding the expected output of the previous command. Showing the expected output helps users verify their setup and understand the example better.
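For example, if the preceding command lists the workload's pods, the block could contain illustrative output along these lines; the pod and node names and the `app=inference` label are assumptions for illustration, and actual values will differ:

```console
$ kubectl get pods -l app=inference -o wide
NAME                         READY   STATUS    RESTARTS   AGE   NODE
inference-7d9c6bf5d9-x2k4p   1/1     Running   0          2m    gke-demo-n1-standard-4-2xt4-1a2b3c4d-abcd
```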
- [DRA driver update to 1.32](https://github.com/NVIDIA/k8s-dra-driver/pull/220) (in particular the updated driver container image in `install-dra-driver.sh`)
- [DRA driver toleration of compute class](https://github.com/NVIDIA/k8s-dra-driver/pull/221)
It's good to see that you're referencing necessary updates from external PRs. However, directly linking to PRs can be problematic as PRs can change or even be closed. It's better to either:
- Wait for the PRs to merge and update the instructions accordingly.
- Cherry-pick or apply the changes from those PRs into your current branch, so the example works as expected.
- Provide alternative instructions or workarounds in case those PRs are not merged.
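If applying the changes locally, one option is GitHub's `.patch` endpoint, which serves a pull request's commits in mailbox format. A sketch, assuming a fresh checkout of the driver repository:

```shell
# Hypothetical sketch: apply an unmerged PR's commits locally using GitHub's
# .patch endpoint, so the example works before the PR merges.
git clone https://github.com/NVIDIA/k8s-dra-driver.git
cd k8s-dra-driver
curl -fsSL https://github.com/NVIDIA/k8s-dra-driver/pull/220.patch | git am
```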
Description
This PR adds an example of using DRA with CCC to allow a single deployment to be served by varying numbers and models of GPUs.
Tasks