bonding: Renaming multiple interfaces not working properly #66

Open
atanasdinov opened this issue Apr 16, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@atanasdinov
Collaborator

Input:

routes:
  config:
    - destination: 0.0.0.0/0
      metric: 100
      next-hop-address: 192.168.122.1
      next-hop-interface: bond99
      table-id: 254
    - destination: 192.168.122.0/24
      metric: 100
      next-hop-address:
      next-hop-interface: bond99
      table-id: 254
dns-resolver:
  config:
    server:
      - 192.168.122.1
      - 8.8.8.8
interfaces:
  - name: bond99
    type: bond
    state: up
    ipv4:
      address:
        - ip: 192.168.122.50
          prefix-length: 24
      enabled: true
    link-aggregation:
      mode: balance-rr
      options:
        miimon: '140'
      port:
        - eth3
        - eth2
  - name: eth2
    type: ethernet
    state: up
    mac-address: 34:8A:B1:4B:16:E7
    ipv4:
      enabled: false
    ipv6:
      enabled: false
  - name: eth3
    type: ethernet
    state: up
    mac-address: 34:8A:B1:4B:16:E8
    ipv4:
      enabled: false
    ipv6:
      enabled: false

Applying the generated configuration on a host with two Ethernet interfaces (eth0 and eth1) results in only one of the interfaces being renamed properly:

node1:~ # journalctl -u combustion | grep nmc
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Identified host: node1.suse.com
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Set hostname: node1.suse.com
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Processing interface 'eth2'...
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Using interface name 'eth0' instead of the preconfigured 'eth2'
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Processing interface 'eth3'...
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc::apply_conf] Processing interface 'bond99'...
Apr 16 16:14:57 localhost.localdomain combustion[1564]: [2024-04-16T16:14:57Z INFO  nmc] Successfully applied config
node1:~ # nmcli -f NAME,UUID,TYPE,DEVICE,FILENAME con show
NAME    UUID                                  TYPE      DEVICE  FILENAME
bond99  4a920503-4862-5505-80fd-4738d07f44c6  bond      bond99  /etc/NetworkManager/system-connections/bond99.nmconnection
eth0    21373057-f376-5091-afb6-64de925c23ed  ethernet  eth0    /etc/NetworkManager/system-connections/eth0.nmconnection
eth1    25a24ec3-fed0-4735-9020-03a49d5a15ff  ethernet  eth1    /var/run/NetworkManager/system-connections/eth1.nmconnection
eth3    7e211aea-3d14-59cf-a4fa-be91dac5dbba  ethernet  --      /etc/NetworkManager/system-connections/eth3.nmconnection

node1:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP group default qlen 1000
    link/ether 34:8a:b1:4b:16:e7 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP group default qlen 1000
    link/ether 34:8a:b1:4b:16:e7 brd ff:ff:ff:ff:ff:ff permaddr 34:8a:b1:4b:16:e8
    altname enp7s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 34:8a:b1:4b:16:e7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.50/24 brd 192.168.122.255 scope global noprefixroute bond99
       valid_lft forever preferred_lft forever
@atanasdinov added the bug label Apr 16, 2024
@atanasdinov
Collaborator Author

Seems to be related to bonding since the renaming works properly during combustion-prepare:

node1:~ # journalctl -u combustion-prepare
Apr 16 16:14:45 localhost systemd[1]: Starting Combustion (preparations)...
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Identified host: node1.suse.com
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Set hostname: node1.suse.com
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Processing interface 'eth2'...
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Using interface name 'eth0' instead of the preconfigured 'eth2'
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Processing interface 'eth3'...
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Using interface name 'eth1' instead of the preconfigured 'eth3'
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc::apply_conf] Processing interface 'bond99'...
Apr 16 16:14:45 localhost combustion[809]: [2024-04-16T16:14:45Z INFO  nmc] Successfully applied config
Apr 16 16:14:45 localhost systemd[1]: combustion-prepare.service: Deactivated successfully.
Apr 16 16:14:45 localhost systemd[1]: Finished Combustion (preparations).

@atanasdinov changed the title from "Renaming multiple interfaces not working properly" to "bonding: Renaming multiple interfaces not working properly" Apr 16, 2024
@jmmckenz

jmmckenz commented Nov 8, 2024

I ran into a form of this as well. I think the issue here is that nmc is creating the nmconnection files with the "cloned-mac-address" and "interface-name" settings populated. While digging into nmc, and subsequently nmstate, I found this:

Basically, add "identifier: mac-address" to each of the NICs, and the resulting nmconnection files will populate "mac-address" instead of "interface-name":

  - name: eth2
    type: ethernet
    identifier: mac-address
    state: up
    mac-address: 34:8A:B1:4B:16:E7
    ipv4:
      enabled: false
    ipv6:
      enabled: false
  - name: eth3
    type: ethernet
    identifier: mac-address
    state: up
    mac-address: 34:8A:B1:4B:16:E8
    ipv4:
      enabled: false
    ipv6:
      enabled: false

I am still testing, but this seems to work so far.
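One way to verify which binding a generated file ended up with is to parse it as an INI-style keyfile. The following is a minimal sketch, not part of nmc: the two sample keyfiles are hand-written illustrations of the behaviour described above (not captured output), and the helper name `binding` is hypothetical.

```python
import configparser

# Hand-written illustrations, NOT captured nmc output.
BEFORE = """
[connection]
id=eth2
type=ethernet
interface-name=eth2

[ethernet]
cloned-mac-address=34:8A:B1:4B:16:E7
"""

AFTER = """
[connection]
id=eth2
type=ethernet

[ethernet]
mac-address=34:8A:B1:4B:16:E7
"""


def binding(keyfile_text):
    """Report how a keyfile pins its device: 'mac-address', 'interface-name' or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(keyfile_text)
    # has_option() returns False when the section itself is missing.
    if parser.has_option("ethernet", "mac-address"):
        return "mac-address"
    if parser.has_option("connection", "interface-name"):
        return "interface-name"
    return None


print(binding(BEFORE))  # interface-name
print(binding(AFTER))   # mac-address
```

A connection bound by "mac-address" should keep applying to the same NIC even if the kernel enumerates it under a different ethX name.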

@atanasdinov
Collaborator Author

Hey, @jmmckenz. Thanks for the hint. Did that resolve the renaming issue or is it not enough?

I'll try to reproduce it as well soon, as well as try to figure out the main culprit behind the bonding failures and how to best tackle those.

@jmmckenz

> Hey, @jmmckenz. Thanks for the hint. Did that resolve the renaming issue or is it not enough?
>
> I'll try to reproduce it as well soon, as well as try to figure out the main culprit behind the bonding failures and how to best tackle those.

At this time I don't think it is enough. There are two issues at play here, especially with multi-NIC configurations. The first has to do with EIB: the enumeration of NICs is not predictable across reboots. Setting the kernel arg "net.ifnames=1" should remedy this by producing predictable NIC names, BUT with the current release of EIB it does not take effect until after the first reboot, so you end up with ethX.nmconnection files whose names may be invalid once net.ifnames=1 applies. This affects both bonded and non-bonded NICs. There is a current PR that addresses this issue by having the kexec shell apply the kernel parameter, so that naming is consistent through the build and reboot process.
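To check whether a given boot already runs with the parameter, the kernel command line can be inspected. This is only a small sketch with hypothetical helper names. It reports the explicit setting and makes no claim about the default when the argument is absent. If the argument is repeated, the sketch simply reports the last value, which is an assumption.

```python
from pathlib import Path


def ifnames_setting(cmdline):
    """Return the explicit net.ifnames value from a kernel command line, or None."""
    value = None
    for arg in cmdline.split():
        if arg.startswith("net.ifnames="):
            value = arg.split("=", 1)[1]  # assumption: report the last occurrence
    return value


def current_cmdline():
    """Read the command line of the running kernel (Linux only)."""
    return Path("/proc/cmdline").read_text()


print(ifnames_setting("root=/dev/sda2 net.ifnames=1 quiet"))  # 1
print(ifnames_setting("root=/dev/sda2 quiet"))                # None
```

Calling `ifnames_setting(current_cmdline())` during the first boot and again after the reboot would show whether the two phases see the same naming scheme.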

The second issue is with nmconfigurator itself. During combustion, nmc runs twice: once at preboot, and then again when the preboot phase is over and kexec has chrooted/mounted us into our "build". The first run correctly identifies the host by MAC and sets up the NICs, bonds and slaves as expected. The second run fails to identify the host by MAC if a bond is present, leaving us with the localhost.localdomain hostname instead of the name that should be generated from network/myhost.mycompany.com.yaml. Because it fails, it does not copy the interface.nmconnection files during combustion, and upon reboot we are left with no network config for NetworkManager to apply. My assumption here, and take this with a grain of salt because it could be a reach, is that once the bond is established, the slave NICs take on the MAC address of the bond and move their own MAC to permaddr. This causes nmc to no longer recognize our host, because the effective MAC address of our interfaces is now cloned from the bond. But that is just a theory.

Refer to #122 for more information and my UGLY workaround. I can tell you that we have made some good progress on the EIB "net.ifnames=1" handling, so hopefully it will make it into a future release once it is fully tested and vetted. The nmc issue was identified last week, and it may take time to identify the root cause and an effective remediation, or to see whether there are existing ways to avoid the problem through the eib-definition yaml.

@atanasdinov
Collaborator Author

A couple of notes explaining the reason behind all issues related to bonding.

We have to run nmc during both the combustion-prepare and combustion phases of the provisioning process. Each of those serves a specific purpose, and we cannot skip either of them. More details can be found in the README and the source code of Combustion.

When bonding is configured, the combustion-prepare phase runs and sets up the interfaces as expected. However, once the bond is up and we run nmc again during combustion, we only fetch the MAC address of the bond itself, which is shared by its member interfaces and "hides" their original MAC addresses. This breaks node detection, so the script fails and leaves the network setup broken.

We use a third-party crate to list the interfaces on the node since there's no native way in Rust to achieve the same. The linked crate will run getifaddrs under the hood to capture the interfaces and return them. Unfortunately, this C function will not return a result that we can rely on.

Example (edited for visibility):

name: "bond99",
addr: [   V4(V4IfAddr { ip: 192.168.122.60, broadcast: Some(192.168.122.255), netmask: Some(255.255.255.0) }) ],
mac_addr: Some("da:ce:d8:7f:99:b4"),
index: 4,

name: "eth0",
addr: [ ],
mac_addr: Some("da:ce:d8:7f:99:b4"),
index: 2,

name: "eth1",
addr: [ ],
mac_addr: Some("da:ce:d8:7f:99:b4"),
index: 3,

We can clearly see that all interfaces share the MAC address of the bond and the actual MAC addresses of the interfaces are dropped.
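The effect on MAC-based matching can be modelled in a few lines. This is a simplified sketch, not nmc's actual code: `map_interfaces` is a hypothetical helper, the first data set is taken from the `ip a` output in the original report, and eth0's permanent address is assumed to equal its current one since `ip a` lists no separate permaddr for it.

```python
def map_interfaces(configured, observed):
    """Map configured interface names to local ones by case-insensitive MAC match."""
    by_mac = {}
    for name, mac in observed.items():
        by_mac.setdefault(mac.lower(), name)  # first interface wins when MACs collide
    return {
        cfg_name: by_mac[mac.lower()]
        for cfg_name, mac in configured.items()
        if mac.lower() in by_mac
    }


# MAC addresses from the YAML input of the original report.
configured = {"eth2": "34:8A:B1:4B:16:E7", "eth3": "34:8A:B1:4B:16:E8"}

# Original report: bond99 carries the MAC of eth0, and eth1 only keeps e8 as permaddr.
effective = {
    "eth0": "34:8a:b1:4b:16:e7",
    "eth1": "34:8a:b1:4b:16:e7",
    "bond99": "34:8a:b1:4b:16:e7",
}
permanent = {"eth0": "34:8a:b1:4b:16:e7", "eth1": "34:8a:b1:4b:16:e8"}

# Example above: every interface reports the MAC of the bond.
bond_only = {
    "bond99": "da:ce:d8:7f:99:b4",
    "eth0": "da:ce:d8:7f:99:b4",
    "eth1": "da:ce:d8:7f:99:b4",
}

print(map_interfaces(configured, effective))  # {'eth2': 'eth0'}
print(map_interfaces(configured, permanent))  # {'eth2': 'eth0', 'eth3': 'eth1'}
print(map_interfaces(configured, bond_only))  # {}
```

The first result mirrors the combustion log, where only eth2 was mapped to eth0. The empty result corresponds to the case where the host cannot be identified at all.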

Other ways of extracting the MAC addresses that yield the same "misleading" values include:

  • cat /sys/class/net/<if-name>/address
  • cat /proc/net/arp

I'd like to do some more investigation as soon as I have the time, but we need to migrate the current implementation away from getifaddrs. Options that could work are:

  • Use ethtool once the interface names are listed
$ ethtool -P eth0
Permanent address: 34:8a:b1:4b:16:e3
  • Use ip -o link to list interfaces and MAC addresses (although in an inconvenient to parse way)
$ ip -o link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff permaddr 34:8a:b1:4b:16:e3\    altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff permaddr 34:8a:b1:4b:16:e4\    altname enp7s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff
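Both options boil down to a bit of text parsing. The sketch below shows the idea in Python for brevity (nmc itself is written in Rust). The function names are hypothetical and the regular expressions are only checked against the sample output shown above, so other `ip` or `ethtool` versions may need adjustments.

```python
import re

MAC = r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}"
LINK_RE = re.compile(r"^\d+:\s+(?P<name>[^:@\s]+)(?:@[^:\s]+)?:\s")
ETHER_RE = re.compile(r"link/ether\s+(?P<mac>" + MAC + ")")
PERM_RE = re.compile(r"permaddr\s+(?P<mac>" + MAC + ")")
ETHTOOL_RE = re.compile(r"Permanent address:\s*(?P<mac>" + MAC + ")")

# Output of `ip -o link` as posted above.
SAMPLE = r"""1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff permaddr 34:8a:b1:4b:16:e3\    altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond99 state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff permaddr 34:8a:b1:4b:16:e4\    altname enp7s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\    link/ether da:ce:d8:7f:99:b4 brd ff:ff:ff:ff:ff:ff
"""


def hardware_macs(ip_o_link_output):
    """Return {interface: MAC}, preferring permaddr over the current link/ether address."""
    result = {}
    for line in ip_o_link_output.splitlines():
        head = LINK_RE.match(line)
        ether = ETHER_RE.search(line)
        if not head or not ether:
            continue  # skips loopback and other non-Ethernet links
        perm = PERM_RE.search(line)
        result[head.group("name")] = (perm or ether).group("mac").lower()
    return result


def ethtool_permaddr(ethtool_output):
    """Extract the MAC from `ethtool -P <if-name>` output, or None if it is not found."""
    match = ETHTOOL_RE.search(ethtool_output)
    return match.group("mac").lower() if match else None


print(hardware_macs(SAMPLE))
print(ethtool_permaddr("Permanent address: 34:8a:b1:4b:16:e3"))
```

Note that the bond itself still shows up with its own address, so a caller matching hosts by MAC would probably want to skip virtual interfaces or only look at the ports.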

@jmmckenz

Different workaround for static bonded NICs (with caveats):
https://github.com/jmmckenz/examples/tree/main/bonded-edge-image-builder-workaround
