
[WIP] Add DRA+CCC example #1569

Open · wants to merge 6 commits into base: main

Conversation

johnbelamaric

Description

This PR adds an example of using DRA with CCC to allow a single deployment to be served by varying numbers and models of GPUs.

Tasks

  • The contributing guide has been read and followed.
  • The samples added / modified have been fully tested.
  • Workflow files have been added / modified, if applicable.
  • Region tags have been properly added, if new samples.
  • Editable variables have been used, where applicable.
  • All dependencies are set to up-to-date versions, as applicable.
  • Merge this pull request for me once it is approved.

Signed-off-by: John Belamaric <[email protected]>
@johnbelamaric johnbelamaric requested review from yoshi-approver and a team as code owners December 20, 2024 00:21
@code-review-assist bot left a comment

Hi @johnbelamaric and reviewers,

I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary to get everyone up to speed:

This pull request aims to add an example demonstrating the use of Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC) in Google Kubernetes Engine (GKE). The goal is to showcase how a single deployment can be served by different numbers and models of GPUs, providing flexibility in autoscaling and resource utilization.

Here's a breakdown of the changes:

  • New files added:
    • autoscaling/custom-compute-classes/dra/README.md: A comprehensive guide explaining the concept, setup, and usage of DRA and CCC for GPU workloads. This includes detailed instructions, code snippets, and explanations of the different components involved. Lines 1-256 were added to this file.
    • autoscaling/custom-compute-classes/dra/ccc-inference-1x8x24.yaml: YAML configuration for a custom compute class named inference-1x8x24, defining priorities for different node pools with varying GPU configurations. Lines 1-24 were added to this file.
    • autoscaling/custom-compute-classes/dra/deployment.yaml: YAML configuration for a Kubernetes Deployment that utilizes DRA to request all available GPUs on a node. Lines 1-68 were added to this file.
    • autoscaling/custom-compute-classes/dra/prepare-gke-nodes-for-dra.yaml: YAML configuration for a DaemonSet that prepares GKE nodes for use with the DRA driver. Lines 1-60 were added to this file.
  • No files were modified or deleted.

The example uses a custom compute class to prioritize node pools with different GPU types (T4, L4, P4), allowing the deployment to adapt to available resources. The README.md provides a step-by-step guide on setting up the environment, deploying the necessary components, and running the example.
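To make the "all available GPUs on a node" request concrete: with DRA, a Deployment typically references a ResourceClaimTemplate whose device request uses `allocationMode: All`. The sketch below is illustrative only and is not taken from the PR's actual `deployment.yaml`; the object names, the `resource.k8s.io/v1beta1` API version (as in Kubernetes 1.32), and the `gpu.nvidia.com` device class installed by the NVIDIA DRA driver are assumptions.

```yaml
# Hypothetical sketch, not the PR's deployment.yaml: claim every GPU a node exposes.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: all-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.nvidia.com   # device class registered by the NVIDIA DRA driver
        allocationMode: All               # allocate all matching GPUs on the node
```

Pods generated by the Deployment would then reference this template in `spec.resourceClaims`, so each replica is granted whatever GPUs its node happens to carry.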

I'll provide a more detailed review shortly, focusing on specific code sections and potential improvements. Let me know if you have any questions in the meantime.


And now, a little haiku to lighten the mood:

GPUs in the cloud,
DRA and CCC work well,
Code flows like a stream.

@code-review-assist bot left a comment

This PR adds an example of using Dynamic Resource Allocation (DRA) with Custom Compute Classes (CCC). This allows a single deployment to be served by varying numbers and models of GPUs. The documentation is comprehensive and provides clear instructions. A few minor improvements can enhance clarity and address potential issues.

I've referenced the Google style guides for some of the suggestions below, as no specific style guide was provided.

It's great that you've included a TODO comment. I've provided a suggestion for addressing it.

Comment on lines 95 to 97

```shell
# TODO: verify gpu-driver-version=latest makes this unnecessary
#kubectl label node --overwrite -l nvidia.com/gpu.present=true cloud.google.com/gke-gpu-driver-version-
#kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
```


Severity: low

The TODO comment suggests verifying whether gpu-driver-version=latest makes the subsequent lines unnecessary. It's good practice to resolve TODOs before merging. According to the NVIDIA GPU driver installation guide for GKE, using gpu-driver-version=latest automatically installs the latest stable driver, so manually labeling nodes with nvidia.com/gpu.present=true and applying the driver-installer DaemonSet are redundant. I recommend removing these commented-out lines to avoid confusion.

```shell
--location "${LOCATION}" \
--node-version "${VERSION}" \
--machine-type "g2-standard-4" \
--accelerator "gpu-driver-version=latest,count=2" \
```


Severity: medium

The count=2 for accelerator in this command seems incorrect. The table in the README indicates g2-standard-4 has 1 L4 GPU. Should this be count=1 to match the table and the machine type's actual GPU configuration?

Suggested change:

```diff
- --accelerator "gpu-driver-version=latest,count=2" \
+ --accelerator "gpu-driver-version=latest,count=1" \
```
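In context, the corrected flag would sit in a node-pool creation command along these lines. This is a sketch only: `${CLUSTER_NAME}` and the explicit `type=nvidia-l4` are assumptions (g2-standard-4 machines come with a single NVIDIA L4 GPU, hence `count=1`), and the node-pool name is taken from the ComputeClass's `g2-standard-4` entry.

```shell
# Sketch of the corrected command; ${CLUSTER_NAME} is a placeholder.
gcloud container node-pools create "g2-standard-4" \
  --cluster "${CLUSTER_NAME}" \
  --location "${LOCATION}" \
  --node-version "${VERSION}" \
  --machine-type "g2-standard-4" \
  --accelerator "type=nvidia-l4,gpu-driver-version=latest,count=1"
```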

Comment on lines +173 to +186

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: inference-1x8x24
spec:
  priorities:
  - nodepools: [n1-standard-4-2xt4]
  - nodepools: [g2-standard-4]
  - nodepools: [n1-standard-4-4xp4]
  whenUnsatisfiable: DoNotScaleUp
---
```


Severity: low

The YAML for the ComputeClass is duplicated within the README. This redundancy can lead to inconsistencies if one copy is updated and the other is not. Consider removing the YAML block from the README and referencing the ccc-inference-1x8x24.yaml file instead. This promotes a single source of truth and improves maintainability. For example:

The ComputeClass definition can be found in [ccc-inference-1x8x24.yaml](ccc-inference-1x8x24.yaml).
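With the duplicate removed, the README's apply step can point at the canonical file, e.g. the following sketch (assuming the reader runs it from the repository root, with the file path as listed in this PR):

```shell
# Apply the ComputeClass from its single source of truth.
kubectl apply -f autoscaling/custom-compute-classes/dra/ccc-inference-1x8x24.yaml
```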

Comment on lines 255 to 256
```console
```


The empty code block at the end seems unintentional. Consider removing it or adding the expected output of the previous command. Showing the expected output helps users verify their setup and understand the example better.

Comment on lines +84 to +86
- [DRA driver update to 1.32](https://github.com/NVIDIA/k8s-dra-driver/pull/220)
(in particular the updated driver container image in `install-dra-driver.sh`)
- [DRA driver toleration of compute class](https://github.com/NVIDIA/k8s-dra-driver/pull/221)


Severity: medium

It's good to see that you're referencing the necessary updates from external PRs. However, linking directly to PRs is fragile, since a PR can change or be closed before merging. Consider one of the following instead:

  1. Wait for the PRs to merge and update the instructions accordingly.
  2. Cherry-pick or apply the changes from those PRs into your current branch, so the example works as expected.
  3. Provide alternative instructions or workarounds in case those PRs are not merged.
