Add Initial version of proposal for DRA mode #88
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
# Dynamic Resource Allocation plugin mode | ||
|
||
## Background | ||
|
||
Within Kubernetes SIG Node, the Dynamic Resource Allocation (DRA) plugin mechanism ([KEP #3063](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3063-dynamic-resource-allocation#summary)) | ||
aims to provide a new way to expose devices to Kubernetes resources. While it does not aim to completely replace the Device Plugin system we currently | ||
use, the two overlap substantially. | ||
|
||
This new system is relevant to Akri because it would ease the management of slot usage and shared devices, for both instance-level and configuration-level resources. | ||
|
||
As a quick reference, DRA relies on a set of Kubernetes resources: a ResourceClass, which registers a new class of devices that may be allocated | ||
(with parameters for the driver), and a ResourceClaim, which describes a request to allocate a resource (also with parameters for the driver). | ||
These resources all ultimately refer to a driver, a two-component system made up of a resource controller and a resource kubelet plugin. | ||
The controller helps the scheduler choose a node and receives (de)allocation requests for the devices. The kubelet plugin gives the kubelet | ||
all the information needed to expose the device to the workload for a given ResourceClaim. Tracking available resources is entirely up to the driver. | ||
|
||
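To make the DRA objects described above more concrete, here is a minimal sketch of how a workload might consume a claim. It assumes the alpha `resource.k8s.io/v1alpha2` API, whose field layout has changed between Kubernetes releases, and every name in it (`example-class`, `example-claim`, the image) is a placeholder rather than something defined by this proposal.

```yaml
# Sketch only: follows the alpha resource.k8s.io/v1alpha2 layout, which may
# differ in other Kubernetes versions. All names are hypothetical.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: example-claim
spec:
  resourceClassName: example-class   # a ResourceClass served by some DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: example-consumer
spec:
  resourceClaims:
    - name: device                   # pod-local name for the claim
      source:
        resourceClaimName: example-claim
  containers:
    - name: app
      image: registry.example/app:latest   # placeholder image
      resources:
        claims:
          - name: device             # this container receives the allocated device
```

The driver's controller would allocate the claim, and its kubelet plugin would then tell the kubelet how to expose the device to the `app` container.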
In comparison, the current Akri model revolves around Configurations and Instances. A Configuration describes both how to discover devices and what to | ||
schedule when they are discovered (broker logic). An Instance simply represents a discovered device and the list of nodes where it is available. | ||
|
||
This proposal primarily focuses on the Configuration resource, as the Instance resource is all about tracking available resources, which is explicitly not | ||
covered by DRA. | ||
|
||
The discovery part of the Configuration resource maps to the ResourceClass; the ResourceClaim part is handled by a combination of the Instance and | ||
the inner workings of the Akri agent. | ||
|
||
## API Changes | ||
|
||
These API changes may require writing a migration tool for existing Configurations. | ||
|
||
### Create a `BrokerTemplate` resource kind | ||
|
||
In order to fit in the DRA model we need to separate the broker logic from the discovery logic, thus we need to create a new resource kind to handle the broker part. This resource needs to be linked to a discovery logic resource and provides all the needed information about what to schedule. | ||
|
||
This relies on the "Deploy arbitrary Kubernetes resources as broker" proposal to describe what to schedule; please see that proposal for further details | ||
on the exact content of the broker fields. | ||
|
||
```yaml | ||
# apiVersion omitted on purpose | ||
kind: BrokerTemplate | ||
metadata: ... | ||
spec: | ||
  discoveryReference: foo | ||
  instanceBrokerResources: ... | ||
  globalBrokerResources: ... # this field is named configurationBrokerResources in the other proposal | ||
``` | ||
|
||
### Create a `DiscoveryConfiguration` resource kind | ||
Review comment: This is a cluster-scoped resource kind, as ResourceClasses are cluster-scoped. In fact, while we currently have namespaces for our Configurations, the DevicePlugin resource is exposed outside this namespace, so it would make more sense to have it cluster-scoped anyway. |
||
|
||
This resource bears the discovery part of the current Configuration resource. It would be used as ResourceClass parameter. | ||
Note that we do **not** include the name of the discovery handler, as it is carried by the driver name in the ResourceClass, | ||
as a subdomain of `driver.akri.sh`, e.g. `udev.driver.akri.sh`. | ||
Comment on lines +50 to +51:
Review comment: I wonder whether having a driver name per DH is the best idea, maybe having a single ... Another idea here was to have a dedicated CRD per DH, so that there is no ambiguity about which DH a configuration is for, but it would make creating a DH more cumbersome, as one would also need to create a CRD. It would also make things harder for the agent, as it would need to extract information from a possibly non-Akri CRD, hence I did not pursue this idea.
Review comment: Another argument against different CRDs is that the ability to tie things together with the downward API seems useful here, and an increase in resource types could make building higher-level tooling (a UI, for instance) harder. |
||
|
||
```yaml | ||
# apiVersion omitted on purpose | ||
kind: DiscoveryConfiguration | ||
metadata: ... | ||
spec: | ||
  capacity: 1 | ||
  details: | ||
    udevRules: | ||
      - KERNEL=="video[0-9]*" | ||
  properties: ... # discoveryProperties in current Configuration | ||
``` | ||
|
||
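As an illustration of how a DiscoveryConfiguration could be wired in as a ResourceClass parameter, the sketch below shows a ResourceClass for the udev discovery handler. It assumes the alpha `resource.k8s.io/v1alpha2` layout and an `akri.sh` API group for the new kinds; the object names `akri-udev-video` and `udev-video` are hypothetical.

```yaml
# Sketch only: alpha API, field names may differ across Kubernetes versions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: akri-udev-video              # hypothetical class name
driverName: udev.driver.akri.sh      # the discovery handler is encoded in the driver name
parametersRef:
  apiGroup: akri.sh                  # assumed API group for DiscoveryConfiguration
  kind: DiscoveryConfiguration
  name: udev-video                   # hypothetical DiscoveryConfiguration (cluster-scoped)
```

Because the discovery handler is identified by `driverName`, the DiscoveryConfiguration itself only needs to carry the discovery details and properties.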
### Create a `PropertyFilter` resource kind | ||
|
||
This is a brand-new resource kind intended to be used as the parameter of a ResourceClaim. It further filters | ||
the instances discovered for a ResourceClass when making a claim. This is what makes it possible to request a specific instance, rather | ||
than using the "configuration level" claim. | ||
|
||
The resource allows filtering on any property exposed by the discovery handler (including the resource identifier). | ||
|
||
```yaml | ||
# apiVersion omitted on purpose | ||
kind: PropertyFilter | ||
metadata: ... | ||
spec: | ||
  filters: # All filters must match for the instance to be considered | ||
    - key: foo | ||
      value: bar | ||
      mode: exclude | ||
``` | ||
|
||
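A claim for a specific instance could then reference a PropertyFilter as its parameter. The sketch below assumes the alpha `resource.k8s.io/v1alpha2` layout, an `akri.sh` API group, and that the PropertyFilter lives in the same namespace as the claim; the names `filtered-camera`, `akri-udev-video` and `not-bar` are hypothetical.

```yaml
# Sketch only: alpha API, field names may differ across Kubernetes versions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: filtered-camera              # hypothetical claim name
  namespace: default
spec:
  resourceClassName: akri-udev-video # hypothetical ResourceClass backed by an Akri driver
  parametersRef:
    apiGroup: akri.sh                # assumed API group for PropertyFilter
    kind: PropertyFilter
    name: not-bar                    # hypothetical PropertyFilter, e.g. excluding foo=bar
```

A claim without `parametersRef` would correspond to the "configuration level" claim, where any instance discovered for the class is acceptable.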
## Architecture and Behavior Changes | ||
|
||
### Add a New Controller: the driverController | ||
|
||
The DRA mechanism needs a driver-specific resource controller that does the allocation/deallocation work as well as the node selection. As we split | ||
the Configuration resource in two, we must create a new entity to handle this. | ||
|
||
This controller watches ResourceClasses to trigger a discovery with the given DiscoveryConfiguration. It also watches ResourceClaims to | ||
inform the scheduler about node selection and to manage (de)allocation. | ||
|
||
### Rename current controller to brokerController | ||
|
||
As we split broker management from discovery management and introduce another controller, "controller" becomes too vague a name for this entity, | ||
hence we rename it to brokerController. | ||
|
||
The brokerController does not need to have write permissions to the Instances. | ||
|
||
### Agent behavior changes | ||
|
||
The agent becomes lighter, as it no longer watches Configurations. The agent registers a `ResourcePlugin` per connected discovery handler; upon an | ||
allocation request, it generates the [CDI](https://github.com/cncf-tags/container-device-interface) device and gives it to the kubelet. | ||
|
||
The communication with the DH can be changed to make use of CDI features (like hooks). The agent still creates/updates the Instances | ||
when discovering devices (and when a device disappears). | ||
|
||
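As an illustration of what the agent could generate on an allocation request, below is a sketch of a CDI specification for a single udev-discovered video device. The `kind` value, the device name, the device path and the environment variable are assumptions made for the example; the proposal does not fix any of them.

```yaml
# Sketch of a CDI spec file; all values are hypothetical.
cdiVersion: "0.5.0"
kind: akri.sh/udev                   # assumed vendor/class pair for the udev DH
devices:
  - name: video0                     # referenced as akri.sh/udev=video0
    containerEdits:
      deviceNodes:
        - path: /dev/video0          # device node made available in the container
      env:
        - EXAMPLE_DEVNODE=/dev/video0   # illustrative variable, not an Akri convention
```

CDI also allows `hooks` and `mounts` under `containerEdits`, which is where an extended DH interface could contribute additional edits.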
## Timeline and changes order | ||
|
||
DRA is not yet in beta, so we should provide this new behavior alongside the "old" one until all our supported Kubernetes | ||
versions have the needed features. However, some parts of the new behavior can be implemented without waiting for DRA to be broadly available. | ||
|
||
Here is a proposed timeline: | ||
Review comment: This makes sense to me! |
||
|
||
1. Rename current controller to brokerController (can be done without any prerequisite on DRA) | ||
2. Make API changes under `akri.sh/v1`, and implement the Agent's new behavior and the driverController for this API version (keeping the old behavior for `akri.sh/v0`); this can be done as soon as DRA is available. | ||
3. Deprecate Device Plugin mode (and `akri.sh/v0` API) when all our supported versions have DRA | ||
4. Remove `akri.sh/v0` API | ||
|
||
The DH interface can be augmented with CDI features anytime after step 2. | ||
|
||
## Links for reference | ||
|
||
- [KEP-3063: Dynamic resource allocation](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md) | ||
- [Device Plugins 2.0: How to Build a Driver for Dynamic Resource Allocation - K Klues & Alexey Fomenko (KubeCon Europe 2023)](https://www.youtube.com/watch?v=_fi9asserLE) | ||
- [Container Device Interface Specification](https://github.com/cncf-tags/container-device-interface/blob/main/SPEC.md) |
Review comment: I wonder if `ResourceTemplate` might be more obvious to users? `Broker` can be a bit confusing IMO.
Review comment: That said, the change in language might be its own proposal...