Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document for Configuration level resource dynamic virtual id #99

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 61 additions & 1 deletion docs/architecture/configuration-level-resource-in-depth.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,69 @@ free.
my-resource-00095f-4: ""
```

### Configuration Device Plugin dynamic virtual device ids

The Instance device plugins report available resources to kubelet using fixed virtual device id. Instance device plugin construct the virtual device id by
appending an index to the Instance name. For example, for an Instance device plugin, if the instance name is `akri.sh/akri-onvif-8120fe` and
the `capacity` is 2, Instance device plugin reports 2 virtual devices `akri-onvif-8120fe-0` and `akri-onvif-8120fe-1`. This works fine for
Instance device plugin but is not flexible for Configuration device plugin as we want to minimize the possibility that kubelet issues `allocate`
requests that request more than available cameras. Instead of using fixed virtual device id, Configuration device plugins expose available
resources using dynamic virtual device ids to provide maximum device usage flexibility.

Here is an example that two cameras are discovered for a Configuration (`akri.sh/akri-onvif-8120fe` and `akri.sh/akri-onvif-a19705`) and the
`capacity` is 2. The Configuration device plugin calculates available resources and, instead of reporting fixed id like `akri-onvif-8120fe-0`,
the Configuration device plugin reports virtual device id "0", "1", ... as 'place holder' in `list_and_watch`. The actual device slot to be used
is determined when `allocate` is called.

To avoid kubelet issues `allocate` requests that requests 3 cameras when only 2 cameras exist with a capacity of 2 each. The Configuration
device plugin only report resources already being claimed by the Configuration device plugin and +1 for each camera that still has at least one
free slot. For our example, if all slots from 2 cameras are free, the Configuration device plugin reports 2 virtual devices "0" and "1" even
there are actually 4 slots available to use. In a different case, if `akri-onvif-8120fe-1` had been claimed by the Configuration device plugin
as virtual device id "4" and all other 3 slots are free, the Configuration device plugin reports virtual device id "0", "1", and "4" as
available virtual device ids. The virtual device id "4" maps to `akri-onvif-8120fe-1` and id "0" and "1" can be mapped to
`akri-onvif-8120fe-0`, `akri-onvif-a19705-0`, or `akri-onvif-a19705-1` later when `allocate` is called. By managing the number of available
virtual devices, we can reduce the chances that kubelet issues `allocate` requests more than available cameras exist.

Note that there is still chances that all slots have been claimed by Configuration device plugin. For example, "0": `akri-onvif-8120fe-0`, "1":
`akri-onvif-8120fe-1`, "2":`akri-onvif-a19705-0`, "3":`akri-onvif-a19705-1`. In this case, Configuration device plugin reports all 4 virtual
devices are available, and it's possible that kubelet requests "0" and "1" which map to the same camera. In that case, the Configuration device
plugin denies the allocation request and the Pod will be in an `UnexpectedAdmissionError` state. The Configuration device plugin calculates
virtual device availability periodically and reduces the number of available virtual devices when the reconciler detects slots not in-use and
set the slot usage to free. In our example, assume "3":`akri-onvif-a19705-1` is the only slot being used. The Agent reconciler sets slots
`akri-onvif-8120fe-0`, `akri-onvif-8120fe-1`, and `akri-onvif-a19705-0` to free.
The Configuration device plugin then reduces the device availability to "0", "1" and "3".
If kubelet retries to claim "0" and "1", the Configuration device plugin will allow it by mapping "0" to `akri-onvif-8120fe-0` or `akri-onvif-8120fe-1`, "1" to `akri-onvif-a19705-0`.

The Configuration device plugin reports "0", "1", ... as virtual device ids in `list_and_watch` and determines the actual device slot to be used
when `allocate` is called. The algorithm to map virtual device ids to actual device slot works on the allocation requests on a per-container
basis that:
- For a given container request, ensure allocated devices are unique instances
- If a virtual device id was claimed before, use the previous allocation information.
- If a virtual device id has being claimed, pick a device slot from the instance that has the most free slots.

For example, assume "3":`akri-onvif-a19705-1` is the only slot being used and other slots are all free. Configuration device plugin
`list_and_watch` reports virtual device id "0", "1" and "3" are available (1 free from `akri-onvif-8120fe`, 1 free from `akri-onvif-a19705` and
1 previously claimed from `akri-onvif-a19705`).
- If kubelet requests "0" and "3" in a container request of an `allocate` call, the mapping is "0": `akri-onvif-8120fe-0` or
`akri-onvif-8120fe-1` (since `akri-onvif-8120fe` has the most free slots between instance `akri-onvif-8120fe` (2) and `akri-onvif-a19705` (1))
and "3":`akri-onvif-a19705-1` (previously claimed).
- If kubelet requests "0", "1" and "3" in a container request of an `allocate` call, Configuration device plugin denies the request as there is
only 2 cameras available to allocate.
- If kubelet request "0" and "1" in a container request and "3" in a different container request of an `allocate` call, the Configuratin device
plugin maps "0": `akri-onvif-8120fe-0` or `akri-onvif-8120fe-1` (the most free slots) and "1":`akri-onvif-a19705-0` (unique instance) to
container request 1 and "3":`akri-onvif-a19705-1` (previously claimed) to container request 2.


## Deployment Strategies with Configuration-level resources

The Akri Agent and Discovery Handlers enable device discovery and Kubernetes resource creation: they discover devices, create Kubernetes resources to represent the devices, and ensure only `capacity` containers are using a device at once via the device plugin framework. The Akri Controller eases device use. If a broker is specified in a Configuration, the Controller will automatically deploy Kubernetes Pods or Jobs to discovered devices. Currently the Controller only supports two deployment strategies: either deploying a non-terminating Pod (that Akri calls a "broker") to each Node that can see a device or deploying a single Job to the cluster for each device discovered. There are plenty of scenarios that do not fit these two strategies such as a ReplicaSet like deployment of n number of Pods to the cluster. With Configuration-level resources, users could easily achieve their own scenarios without the Akri Controller, as selecting resources is more declarative. A user specifies in a resource request how many OPC UA servers are needed rather than needing to delineate the exact ones already discovered by Akri, as explained in Akri's current documentation on [requesting Akri resources](../docs/user-guide/requesting-akri-resources.md).
The Akri Agent and Discovery Handlers enable device discovery and Kubernetes resource creation: they discover devices, create Kubernetes resources to represent
the devices, and ensure only `capacity` containers are using a device at once via the device plugin framework. The Akri Controller eases device use.
If a broker is specified in a Configuration, the Controller will automatically deploy Kubernetes Pods or Jobs to discovered devices. Currently the Controller only
supports two deployment strategies: either deploying a non-terminating Pod (that Akri calls a "broker") to each Node that can see a device or deploying a single
Job to the cluster for each device discovered. There are plenty of scenarios that do not fit these two strategies such as a ReplicaSet like deployment of n number
of Pods to the cluster. With Configuration-level resources, users could easily achieve their own scenarios without the Akri Controller, as selecting resources is
more declarative. A user specifies in a resource request how many OPC UA servers are needed rather than needing to delineate the exact ones already discovered by
Akri, as explained in Akri's current documentation on [requesting Akri resources](../user-guide/requesting-akri-resources.md).

For example, with Configuration-level resources, the following Deployment could be applied to a cluster:

Expand Down