PX IPv6 support #7224

swagner-de · 2024-10-11T12:16:14Z

Use StatefulSet instead of deployments
Simplify PX charts: Region names not put all over all places
Use different naming for statefulset
[PX] Set bird as default container
Remove metricRelabelings as the same labels are already created
Support legacy config file path and new config file path
Use different naming for statefulset
Use 1 statefulset with replicas:2 instead of 2 statefulsets
Let router-id be determined from init-network
Add bird IPv6 statefulsets once an IPv6 network was supplied.

The deployment controller in case of `strategy: Recreate` terminates the old pod and creates the new pod already. Because of our static IP assignment via multus this would lead to duplicate addresses being used while the old pod is not fully terminated and the new pod's networking stack is already up. This did not pose a problem with IPv4 (aside from it being still undesirable) but IPv6's duplicate address detection will fail on the new pod, leading to it binding the address only tentatively. This is of course undesired behavior. Hence we make the switch to StatefulSets. Statefulsets make sure that there is only ever one instance of a StatefulSets ordinal. By definition they are anyhow more suiting to bird, as they maintain a stable identity and are tailored to stateful applications.

Historically(TM) the PX charts would reference the region at all places. ConfigMaps, Services, Ingress, Deployments, all of them were prefixed with the region name. This comes with very little benefit (operator generally should well understand to which region they are scoped) but with the disadvantage that the region must carried into all subcharts. This commit also rids all useless prefixes like `cfg-`. We know it's a ConfigMap, we don't need the prefix.

IPv6 is coming. So we need a new naming. Including the region in the statefulset name has bothered me for quite some time as it provided little value but instead bloats the pod name and requires the region to be passed to everything. The new naming now tells the operator what we are, a routeserver, which address family we serve, which service number we do, and in which domain we run. We end up with `routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`

See https://kubernetes.io/docs/reference/labels-annotations-taints/#kubectl-kubernetes-io-default-container for details.

Tried this out, turns out we have exact same labels already on the pod, so no need to parse the info out of the name here.

With the introduction of IPv6 we will also need a new v6 only config. That has to be encoded in the file name but also in the name of the configMap. Hence, we move the part that checks if the config is present out into a helper. In that helper, we try the new path, if it is present we return that path. If not we fall back to the legacy path. We also give the configMap a naming that is independent of the config used but basically equals the new config file name without the `.conf` suffix.

IPv6 is coming. So we need a new naming. Including the region in the statefulset name has bothered me for quite some time as it provided little value but instead bloats the pod name and requires the region to be passed to everything. The new naming now tells the operator what we are, a routeserver, which address family we serve, which service number we do, and in which domain we run. We end up with `routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`

The PX instances share everything, but the IP which is determined by the network attachment definition (NAD). So the idea is, to remove the IP from the NAD and merge the 2 statefulsets. That's what we did here: 1. Remove the IPAM part from the NAD. This leads to the pod coming up with the interface bound l2 wise, but no l3 config present. 2. We don't assign IP addresses anymore, we assume by convention that the first #replicas IPs in the PX peering network are determined for the route server. With this in mind, and the PX peering network known, we can calculate this pod's IP based on the statefulset ordinal. This IP can then be configured. All of this is done through the `init-network` container. It works for IPv4 and IPv6. 3. The looking glass needs to be able to reach each router-server individually. We did that earlier by having a service per statefulset. If we kept doing this, the looking glass would be load balanced between the statefulsets. Headless services [1] to the rescue! We add one service per helm release, that becomes a suffix in the pod's dns name [2]. We also backreference this service in the statefulset allowing us to address each pod of the statefulset by its stable name. [1] https://kubernetes.io/docs/concepts/services-networking/service/#headless-services [2] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id

The router id is a 4 byte value that is written in IPv4 notation. Bird used the line `router id from "vlan*";` meaning it takes the IP address from an interface starting with vlan* as router id. With v6 that does not work anymore because the router id is still the same value. There are 2 options: 1. hardcode the router id in the config 2. Dynamically determine router id at runtime (1) has the disadvantage that we need a dedicated configmap per route server instance, meaning we would also back out from the plan that we run both router-server instances from a single statefulset as 2 configmaps need 2 statefulsets. It would also mean, that the configbuilder, which right now has no knowledge of the IP addresses bird is served from must get this information, likely by getting a secrets-repo reference or by it bying replicated. This is all not needed for (2), the only downside would be that some parts of the config cannot be validated upon building/merging/deploying it. We would create an additional config that is referenced from the main config that only contains a single line, the router id. As the router id will be the corrresponding router-servers v4 address we always feed the v4 address into the pod, even if it is a v6 pod. We let the init-network script calculate the link IP and the router id. It will generate a single line into a file that the bird config will include.

swagner-de added 10 commits September 27, 2024 16:36

[PX] Set bird as default container

b3480b5

See https://kubernetes.io/docs/reference/labels-annotations-taints/#kubectl-kubernetes-io-default-container for details.

Remove metricRelabelings as the same labels are already created

089c8b3

Tried this out, turns out we have exact same labels already on the pod, so no need to parse the info out of the name here.

Add bird IPv6 statefulsets once an IPv6 network was supplied.

4c498ed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PX IPv6 support #7224

PX IPv6 support #7224

swagner-de commented Oct 11, 2024

PX IPv6 support #7224

Are you sure you want to change the base?

PX IPv6 support #7224

Conversation

swagner-de commented Oct 11, 2024