Add Initial version of proposal for DRA mode #88

Open: wants to merge 1 commit into base `main`
127 changes: 127 additions & 0 deletions proposals/dra-mode.md
@@ -0,0 +1,127 @@
# Dynamic Resource Allocation plugin mode

## Background

Under Kubernetes SIG Node, the Dynamic Resource Allocation (DRA) plugin mechanism ([KEP #3063](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3063-dynamic-resource-allocation#summary))
aims to provide a new way to expose devices to Kubernetes resources. While it doesn't aim to completely replace the Device Plugin system we currently
use, it overlaps with it significantly.

This new system is relevant to Akri as it would ease the management of slot usage and shared devices, for both instance-level and configuration-level resources.

As a quick reference, DRA relies on a set of Kubernetes resources: a ResourceClass that registers a new class of devices that might be allocatable
(with parameters for the driver) and a ResourceClaim that describes a request for allocation of a resource (with parameters for the driver).
These resources all ultimately refer to a driver, which is a two-component system composed of a resource controller and a resource kubelet plugin.
The controller helps the scheduler with node selection and receives (de)allocation requests for the devices. The kubelet plugin gives the kubelet
all the information needed to expose the device to the workload for a given ResourceClaim. The tracking of available resources is completely up to the driver.

In comparison, the current Akri model revolves around Configurations and Instances. A Configuration describes both how to discover devices and what to
schedule when they are discovered (broker logic). An Instance just represents a discovered device instance and the list of nodes where it is available.

This proposal primarily focuses on the Configuration resource, as the Instance resource is all about tracking available resources, which is explicitly not
covered by DRA.

The Configuration resource matches the ResourceClass for the discovery part; the ResourceClaim part is handled by a combination of the Instance and
the inner workings of the Akri agent.

## API Changes

These API changes may require writing a migration tool for existing Configurations.

### Create a `BrokerTemplate` resource kind

To fit in the DRA model we need to separate the broker logic from the discovery logic, so we need to create a new resource kind to handle the broker part. This resource needs to be linked to a discovery logic resource and to provide all the needed information about what to schedule.

This uses the "Deploy arbitrary Kubernetes resources as broker" proposal for the description of what to schedule; please see that proposal for further details
on the exact content of the broker fields.

```yaml
# apiVersion omitted on purpose
kind: BrokerTemplate
metadata: ...
spec:
  discoveryReference: foo
  instanceBrokerResources: ...
  globalBrokerResources: ... # this field is named configurationBrokerResources in the other proposal
```

> **Review comment (on `kind: BrokerTemplate`):** I wonder if ResourceTemplate might be more obvious to users? Broker can be a bit confusing IMO.
>
> **Review comment (follow-up):** That said, the change in language might be its own proposal...

### Create a `DiscoveryConfiguration` resource kind
> **Author comment:** This is a cluster-scoped resource kind, as ResourceClasses are cluster-scoped. In fact, while we currently have namespaces for our Configurations, the Device Plugin resource is exposed outside this namespace, so it would make more sense to have it cluster-scoped anyway.


This resource carries the discovery part of the current Configuration resource. It would be used as the ResourceClass parameter.
Note that we do **not** include the name of the discovery handler, as it would be part of the driver name in the ResourceClass,
as a subdomain of `driver.akri.sh`, e.g. `udev.driver.akri.sh`.

> **Author comment (on the driver name paragraph, lines +50 to +51 of the diff):** I wonder if having a driver name per DH is the best idea. Maybe having a single `driver.akri.sh` DRA driver name is better, re-introducing the discovery handler name in the DiscoveryConfiguration, as it would remove any ambiguity about which DH the configuration is for.
>
> Another idea here was to have a dedicated CRD per DH, so that there is no ambiguity about which DH a configuration is for, but it would make creating a DH more cumbersome as one would also need to create a CRD. It would also make things harder for the agent, as it would need to extract information from a possibly non-Akri CRD, hence I did not pursue this idea.

> **Review comment:** Another argument against different CRDs is that the ability to tie things together with the downward API seems useful here, and an increase in resource types could make building higher-level tooling (a UI, for instance) harder.


```yaml
# apiVersion omitted on purpose
kind: DiscoveryConfiguration
metadata: ...
spec:
  capacity: 1
  details:
    udevRules:
      - KERNEL=="video[0-9]*"
  properties: ... # discoveryProperties in current Configuration
```
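
To illustrate how a DiscoveryConfiguration would be used as a ResourceClass parameter, here is a minimal sketch of a ResourceClass that points to one. It assumes the `resource.k8s.io/v1alpha2` API shape that shipped with the alpha implementation of KEP-3063, an `akri.sh` API group for the new kind, and hypothetical object names; none of these are fixed by this proposal.

```yaml
# Sketch only: the API version, API group and object names are assumptions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: akri-udev-video # hypothetical name
# The discovery handler is identified through the driver name,
# not through the DiscoveryConfiguration itself.
driverName: udev.driver.akri.sh
parametersRef:
  apiGroup: akri.sh # assumed API group for the new resource kinds
  kind: DiscoveryConfiguration
  name: akri-udev-video # hypothetical DiscoveryConfiguration name
```

If a single `driver.akri.sh` driver name were used instead, the `driverName` field would no longer identify the discovery handler, and that information would have to live in the DiscoveryConfiguration.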

### Create a `PropertyFilter` resource kind

This is a brand-new resource kind. It is meant to be used as the parameter of a ResourceClaim, and allows further filtering
of the instances discovered from a ResourceClass when making a claim. This is what allows requesting a specific instance, rather
than using a "configuration level" claim.

The resource allows filtering on any property exposed by the discovery handler (including the resource identifier).

```yaml
# apiVersion omitted on purpose
kind: PropertyFilter
metadata: ...
spec:
  filters: # All filters must match for the instance to be considered
    - key: foo
      value: bar
      mode: exclude
```
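
As an illustration of how a PropertyFilter could be attached to a claim, the following sketch shows a ResourceClaim referencing one through its parameters. It assumes the `resource.k8s.io/v1alpha2` API shape, an `akri.sh` API group for the new kind, and hypothetical object names; the proposal itself does not define any of them.

```yaml
# Sketch only: the API version, API group and object names are assumptions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: specific-camera # hypothetical name
spec:
  resourceClassName: akri-udev-video # hypothetical ResourceClass name
  parametersRef:
    apiGroup: akri.sh # assumed API group for the new resource kinds
    kind: PropertyFilter
    name: only-matching-devices # hypothetical PropertyFilter name
```

A claim without `parametersRef` would then correspond to the "configuration level" claim, where any instance discovered for the ResourceClass is acceptable.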

## Architecture and Behavior Changes

### Add a New Controller: the driverController

The DRA mechanism needs a driver-specific resource controller that does the allocation/de-allocation work as well as the node selection. As we split
the Configuration resource in two, we must create a new entity to handle this.

This controller watches ResourceClasses to trigger a discovery with the given DiscoveryConfiguration. It also watches ResourceClaims to
inform the scheduler about node selection and to manage (de)allocation.

### Rename current controller to brokerController

As we split broker management from discovery management and introduce another controller, the name "controller" becomes too vague for this entity,
hence we rename it to brokerController.

The brokerController does not need to have write permissions to the Instances.

### Agent behavior changes

The agent becomes lighter: it no longer watches Configurations. The agent registers a `ResourcePlugin` per connected discovery handler and, upon
an allocation request, generates the [CDI](https://github.com/cncf-tags/container-device-interface) device and gives it to the kubelet.

The communication with the DH can be changed to allow making use of CDI features (like hooks). The agent still creates/updates the Instances
when discovering devices (and when a device disappears).
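
To give an idea of what the agent would generate, here is a sketch of a CDI specification for one discovered device. The field names follow the Container Device Interface specification; the `akri.sh/udev` kind, the device name, the device node path and the environment variable are hypothetical and only serve the illustration.

```yaml
# Sketch only: kind, device name, path and env values are assumptions.
cdiVersion: "0.5.0"
kind: akri.sh/udev # hypothetical vendor/class pair
devices:
  - name: video0 # hypothetical device name
    containerEdits:
      deviceNodes:
        - path: /dev/video0 # device node exposed to the container
      env:
        - AKRI_DEVICE_NODE=/dev/video0 # hypothetical variable for the workload
```

With such a spec, the device would be referenced by its fully qualified CDI name `akri.sh/udev=video0`.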

## Timeline and changes order

DRA is not yet in beta, so we should provide this new behavior alongside the "old" one until all our supported Kubernetes
versions have the needed features. However, some parts of the new behavior can be implemented without waiting for DRA to be broadly available.

Here is a proposed timeline:

1. Rename the current controller to brokerController (this can be done without any prerequisite on DRA).
2. Make the API changes under `akri.sh/v1`, and implement the Agent's new behavior and the driverController for this API version (keeping the old behavior for `akri.sh/v0`). This can be done as soon as DRA is available.
3. Deprecate the Device Plugin mode (and the `akri.sh/v0` API) when all our supported Kubernetes versions have DRA.
4. Remove the `akri.sh/v0` API.

> **Review comment (on the proposed timeline):** This makes sense to me!

The DH interface can be augmented with CDI features anytime after step 2.

## Links for reference

- [KEP-3063: Dynamic resource allocation](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)
- [Device Plugins 2.0: How to Build a Driver for Dynamic Resource Allocation - K Klues & Alexey Fomenko (KubeCon Europe 2023)](https://www.youtube.com/watch?v=_fi9asserLE)
- [Container Device Interface Specification](https://github.com/cncf-tags/container-device-interface/blob/main/SPEC.md)