Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PX IPv6 support #7224

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
Draft

PX IPv6 support #7224

wants to merge 10 commits into from

Conversation

swagner-de
Copy link
Contributor

  • Use StatefulSet instead of deployments
  • Simplify PX charts: Region names not put all over all places
  • Use different naming for statefulset
  • [PX] Set bird as default container
  • Remove metricRelabelings as the same labels are already created
  • Support legacy config file path and new config file path
  • Use different naming for statefulset
  • Use 1 statefulset with replicas:2 instead of 2 statefulsets
  • Let router-id be determined from init-network
  • Add bird IPv6 statefulsets once an IPv6 network was supplied.

The deployment controller in case of `strategy: Recreate` terminates the
old pod and creates the new pod already. Because of our static IP
assignment via multus this would lead to duplicate addresses being used
while the old pod is not fully terminated and the new pod's networking
stack is already up. This did not pose a problem with IPv4 (aside from
it being still undesirable) but IPv6's duplicate address detection will
fail on the new pod, leading to it binding the address only tentatively.
This is of course undesired behavior. Hence we make the switch to
StatefulSets. Statefulsets make sure that there is only ever one
instance of a StatefulSets ordinal. By definition they are anyhow more
suiting to bird, as they maintain a stable identity and are tailored to
stateful applications.
Historically(TM) the PX charts would reference the region at all places.
ConfigMaps, Services, Ingress, Deployments, all of them were prefixed
with the region name. This comes with very little benefit (operator
generally should well understand to which region they are scoped) but
with the disadvantage that the region must carried into all subcharts.

This commit also rids all useless prefixes like `cfg-`. We know it's a
ConfigMap, we don't need the prefix.
IPv6 is coming. So we need a new naming. Including the region in the
statefulset name has bothered me for quite some time as it provided
little value but instead bloats the pod name and requires the region to
be passed to everything.

The new naming now tells the operator what we are, a routeserver, which
address family we serve, which service number we do, and in which domain
we run.

We end up with

`routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`
Tried this out, turns out we have exact same labels already on the pod,
so no need to parse the info out of the name here.
With the introduction of IPv6 we will also need a new v6 only config.
That has to be encoded in the file name but also in the name of the
configMap.

Hence, we move the part that checks if the config is present out into a
helper. In that helper, we try the new path, if it is present we return
that path. If not we fall back to the legacy path.

We also give the configMap a naming that is independent of the config
used but basically equals the new config file name without the `.conf`
suffix.
IPv6 is coming. So we need a new naming. Including the region in the
statefulset name has bothered me for quite some time as it provided
little value but instead bloats the pod name and requires the region to
be passed to everything.

The new naming now tells the operator what we are, a routeserver, which
address family we serve, which service number we do, and in which domain
we run.

We end up with

`routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`
The PX instances share everything, but the IP which is determined by the
network attachment definition (NAD). So the idea is, to remove the IP
from the NAD and merge the 2 statefulsets.

That's what we did here:
1. Remove the IPAM part from the NAD. This leads to the pod coming up
   with the interface bound l2 wise, but no l3 config present.
2. We don't assign IP addresses anymore, we assume by convention that
   the first #replicas IPs in the PX peering network are determined for
   the route server. With this in mind, and the PX peering network
   known, we can calculate this pod's IP based on the statefulset
   ordinal. This IP can then be configured. All of this is done through
   the `init-network` container. It works for IPv4 and IPv6.
3. The looking glass needs to be able to reach each router-server
   individually. We did that earlier by having a service per
   statefulset. If we kept doing this, the looking glass would be load
   balanced between the statefulsets. Headless services [1] to the
   rescue! We add one service per helm release, that becomes a suffix in
   the pod's dns name [2]. We also backreference this service in the
   statefulset allowing us to address each pod of the statefulset by
   its stable name.

[1] https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
[2] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
The router id is a 4 byte value that is written in IPv4 notation. Bird
used the line `router id from "vlan*";` meaning it takes the IP address
from an interface starting with vlan* as router id. With v6 that does
not work anymore because the router id is still the same value.

There are 2 options:
1. hardcode the router id in the config
2. Dynamically determine router id at runtime

(1) has the disadvantage that we need a dedicated configmap per route
server instance, meaning we would also back out from the plan that we
run both router-server instances from a single statefulset as 2
configmaps need 2 statefulsets. It would also mean, that the
configbuilder, which right now has no knowledge of the IP addresses bird
is served from must get this information, likely by getting a
secrets-repo reference or by it bying replicated. This is all not needed
for (2), the only downside would be that some parts of the config cannot
be validated upon building/merging/deploying it. We would create an
additional config that is referenced from the main config that only
contains a single line, the router id. As the router id will be the
corrresponding router-servers v4 address we always feed the v4 address
into the pod, even if it is a v6 pod. We let the init-network script
calculate the link IP and the router id. It will generate a single line
into a file that the bird config will include.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant