
[WIP] Add DRA+CCC example #1569

Open · wants to merge 6 commits into base: main

Conversation

johnbelamaric

Description

This PR adds an example of using DRA with CCC to allow a single deployment to be served by varying numbers and models of GPUs.

Tasks

  • The contributing guide has been read and followed.
  • The samples added / modified have been fully tested.
  • Workflow files have been added / modified, if applicable.
  • Region tags have been properly added, if new samples.
  • Editable variables have been used, where applicable.
  • All dependencies are set to up-to-date versions, as applicable.
  • Merge this pull request for me once it is approved.

Signed-off-by: John Belamaric <[email protected]>
@johnbelamaric johnbelamaric requested review from yoshi-approver and a team as code owners December 20, 2024 00:21
@code-review-assist bot left a comment

Hi @johnbelamaric and reviewers,

I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary to get everyone up to speed:

This pull request aims to add an example demonstrating the use of Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC) in Google Kubernetes Engine (GKE). The goal is to showcase how a single deployment can be served by different numbers and models of GPUs, providing flexibility in autoscaling and resource utilization.

Here's a breakdown of the changes:

  • New files added:
    • autoscaling/custom-compute-classes/dra/README.md: A comprehensive guide explaining the concept, setup, and usage of DRA and CCC for GPU workloads. This includes detailed instructions, code snippets, and explanations of the different components involved. Lines 1-256 were added to this file.
    • autoscaling/custom-compute-classes/dra/ccc-inference-1x8x24.yaml: YAML configuration for a custom compute class named inference-1x8x24, defining priorities for different node pools with varying GPU configurations. Lines 1-24 were added to this file.
    • autoscaling/custom-compute-classes/dra/deployment.yaml: YAML configuration for a Kubernetes Deployment that utilizes DRA to request all available GPUs on a node. Lines 1-68 were added to this file.
    • autoscaling/custom-compute-classes/dra/prepare-gke-nodes-for-dra.yaml: YAML configuration for a DaemonSet that prepares GKE nodes for use with the DRA driver. Lines 1-60 were added to this file.
  • No files were modified or deleted.

The example uses a custom compute class to prioritize node pools with different GPU types (T4, L4, P4), allowing the deployment to adapt to available resources. The README.md provides a step-by-step guide on setting up the environment, deploying the necessary components, and running the example.
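To make the "all available GPUs on a node" request concrete: with DRA, a Deployment typically references a ResourceClaimTemplate whose device request uses `allocationMode: All`. The sketch below is illustrative only and is not taken from the PR's actual `deployment.yaml`; the object names, the `resource.k8s.io/v1beta1` API version (as in Kubernetes 1.32), and the `gpu.nvidia.com` device class installed by the NVIDIA DRA driver are assumptions.

```yaml
# Hypothetical sketch, not the PR's deployment.yaml: claim every GPU a node exposes.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: all-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.nvidia.com   # device class registered by the NVIDIA DRA driver
        allocationMode: All               # allocate all matching GPUs on the node
```

Pods generated by the Deployment would then reference this template in `spec.resourceClaims`, so each replica is granted whatever GPUs its node happens to carry.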

I'll provide a more detailed review shortly, focusing on specific code sections and potential improvements. Let me know if you have any questions in the meantime.


And now, a little haiku to lighten the mood:

GPUs in the cloud,
DRA and CCC work well,
Code flows like a stream.

@code-review-assist bot left a comment

This PR adds an example of using Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC). This allows a single deployment to be served by varying numbers and models of GPUs. The documentation is comprehensive and provides clear instructions. A few minor improvements can enhance clarity and address potential issues.

I've referenced the Google style guides for some of the suggestions below, as no specific style guide was provided.

It's great that you've included a TODO comment. I've provided a suggestion for addressing it.

Comment on lines 95 to 97

```shell
# TODO: verify gpu-driver-version=latest makes this unnecessary
#kubectl label node --overwrite -l nvidia.com/gpu.present=true cloud.google.com/gke-gpu-driver-version-
#kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
```


Severity: low

The TODO comment suggests verifying whether gpu-driver-version=latest makes the subsequent lines unnecessary. It's good practice to resolve TODOs before merging. According to the NVIDIA GPU driver installation guide for GKE, using gpu-driver-version=latest automatically installs the latest stable driver, so manually labeling nodes with nvidia.com/gpu.present=true and applying the driver-installer DaemonSet are redundant. I recommend removing these commented-out lines to avoid confusion.

```shell
--location "${LOCATION}" \
--node-version "${VERSION}" \
--machine-type "g2-standard-4" \
--accelerator "gpu-driver-version=latest,count=2" \
```


Severity: medium

The count=2 for accelerator in this command seems incorrect. The table in the README indicates g2-standard-4 has 1 L4 GPU. Should this be count=1 to match the table and the machine type's actual GPU configuration?

Suggested change:

```diff
- --accelerator "gpu-driver-version=latest,count=2" \
+ --accelerator "gpu-driver-version=latest,count=1" \
```
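In context, the corrected flag would sit in a node-pool creation command along these lines. This is a sketch only: `${CLUSTER_NAME}` and the explicit `type=nvidia-l4` are assumptions (g2-standard-4 machines come with a single NVIDIA L4 GPU, hence `count=1`), and the node-pool name is taken from the ComputeClass's `g2-standard-4` entry.

```shell
# Sketch of the corrected command; ${CLUSTER_NAME} is a placeholder.
gcloud container node-pools create "g2-standard-4" \
  --cluster "${CLUSTER_NAME}" \
  --location "${LOCATION}" \
  --node-version "${VERSION}" \
  --machine-type "g2-standard-4" \
  --accelerator "type=nvidia-l4,gpu-driver-version=latest,count=1"
```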

Comment on lines +173 to +186

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: inference-1x8x24
spec:
  priorities:
  - nodepools: [n1-standard-4-2xt4]
  - nodepools: [g2-standard-4]
  - nodepools: [n1-standard-4-4xp4]
  whenUnsatisfiable: DoNotScaleUp
---
```


Severity: low

The YAML for the ComputeClass is duplicated within the README. This redundancy can lead to inconsistencies if one copy is updated and the other is not. Consider removing the YAML block from the README and referencing the ccc-inference-1x8x24.yaml file instead. This promotes a single source of truth and improves maintainability. For example:

The ComputeClass definition can be found in [ccc-inference-1x8x24.yaml](ccc-inference-1x8x24.yaml).
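With the duplicate removed, the README's apply step can point at the canonical file, e.g. the following sketch (assuming the reader runs it from the repository root, with the file path as listed in this PR):

```shell
# Apply the ComputeClass from its single source of truth.
kubectl apply -f autoscaling/custom-compute-classes/dra/ccc-inference-1x8x24.yaml
```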

Comment on lines 255 to 256
```console
```


The empty code block at the end seems unintentional. Consider removing it or adding the expected output of the previous command. Showing the expected output helps users verify their setup and understand the example better.

Comment on lines +84 to +86
- [DRA driver update to 1.32](https://github.com/NVIDIA/k8s-dra-driver/pull/220)
(in particular the updated driver container image in `install-dra-driver.sh`)
- [DRA driver toleration of compute class](https://github.com/NVIDIA/k8s-dra-driver/pull/221)


Severity: medium

It's good to see that you're referencing the necessary updates from external PRs. However, linking directly to PRs is fragile, since a PR can change or be closed before merging. Consider one of the following instead:

  1. Wait for the PRs to merge and update the instructions accordingly.
  2. Cherry-pick or apply the changes from those PRs into your current branch, so the example works as expected.
  3. Provide alternative instructions or workarounds in case those PRs are not merged.
