The keepalive.conf provided by the CPLB creates an infinite loop. #5178

Open
4 tasks done
chattytak opened this issue Nov 4, 2024 · 5 comments

@chattytak

chattytak commented Nov 4, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.14.0-362.24.2.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Mar 30 14:11:54 EDT 2024 x86_64 GNU/Linux
NAME="AlmaLinux"
VERSION="9.3 (Shamrock Pampas Cat)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.3 (Shamrock Pampas Cat)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 3.7 GiB (pass)
File system of /var/lib: xfs (pass)
Disk space available for /var/lib/k0s: 26.7 GiB (pass)
Relative disk space available for /var/lib/k0s: 88% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.14.0-362.24.2.el9_3.x86_64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

The keepalived.conf generated by CPLB enables the virtual_server section on every control plane node, which is incorrect.

When a SYN arrives at the control plane node that is MASTER according to the vrrp_instance section, that node load balances it according to its own virtual_server section.
If a BACKUP node is selected as the destination, the virtual_server section on that BACKUP node is also active, so the packet is load balanced a second time there.
If that second round selects the MASTER, the packet returns to the MASTER, and because persistence_timeout is in effect the MASTER sends it to the same BACKUP again, which sends it back to the MASTER, and so on in a loop.

To solve this, move the virtual_server section into a separate file and load it with the include directive. Then use the notify_master and notify_backup parameters of the vrrp_instance section to run a script that enables the include line on the MASTER only, comments it out on BACKUP nodes, and reloads keepalived.
(Reloading keepalived causes the notify_backup script to run again, so a check is needed to prevent a reload loop.)
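
The check mentioned above could be an idempotent toggle that reports whether it changed anything, so that the notify script reloads keepalived only on a real change. This is a minimal sketch, not what CPLB does today: the function name, the include path, and the choice of `#` as the comment marker are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical location of the split-out virtual_server section.
INCLUDE_LINE = "include /etc/keepalived/virtual_server.conf"


def set_virtual_server_enabled(conf_path, enabled, include_line=INCLUDE_LINE):
    """Comment or uncomment the include line; return True if the file changed."""
    path = Path(conf_path)
    wanted = include_line if enabled else "# " + include_line
    changed = False
    out = []
    for line in path.read_text().splitlines():
        # Match the include line whether or not it is currently commented out.
        if line.lstrip("# ").rstrip() == include_line and line != wanted:
            line = wanted
            changed = True
        out.append(line)
    if changed:
        path.write_text("\n".join(out) + "\n")
    return changed
```

A notify_master script would call this with `enabled=True` and a notify_backup script with `enabled=False`, and each would reload keepalived only when the function returns `True`. The second notify_backup run triggered by the reload then finds nothing to change and stops there.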

For reference, see the following article (in Japanese):
https://weseek.co.jp/tech/2989/#keepalived-2

Steps to reproduce

Expected behavior

No response

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

@chattytak chattytak added the bug Something isn't working label Nov 4, 2024
@chattytak
Author

          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["10.1.0.10/24"]
                authPass: CPLB
              virtualServers:
              - ipAddress: "10.1.0.10"

With this setting, LVS is enabled on every control plane node, so a packet can bounce between nodes:
Control Plane A -> Control Plane B -> Control Plane A -> Control Plane B -> ...
The alternation repeats indefinitely, producing an infinite loop.

Because it depends on chance, you may not hit the problem right away.
When it occurs, kubectl fails to connect with the error “No route to host”.
A packet capture with tcpdump shows the infinite loop.
After a few loops, the TTL expires and kubectl receives the ICMP error “icmp time exceeded in-transit”.
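
The loop can be reproduced in a toy model. The sketch below is not IPVS code: `Director` and `route` are made-up names, the model has only two nodes, and the round-robin starting positions are chosen deliberately so that each node's first pick is the other node. Each director remembers its choice per client, standing in for persistence_timeout, and forwards without rewriting the destination, standing in for lb_kind DR.

```python
from itertools import cycle


class Director:
    """Toy LVS director: round-robin scheduling with per-client persistence."""

    def __init__(self, real_servers):
        self._rr = cycle(real_servers)
        self._persist = {}  # client IP -> real server chosen earlier

    def schedule(self, client_ip):
        # Persistence: reuse the earlier choice for this client if there is one.
        if client_ip not in self._persist:
            self._persist[client_ip] = next(self._rr)
        return self._persist[client_ip]


def route(directors, entry, client_ip, ttl):
    """Follow one packet; return (path, delivered)."""
    path = [entry]
    current = entry
    while ttl > 0:
        target = directors[current].schedule(client_ip)
        if target == current:
            return path, True  # delivered to the local kube-apiserver
        # DR keeps the VIP as destination, so the next node schedules again.
        path.append(target)
        current = target
        ttl -= 1
    return path, False  # TTL exhausted, as in "time exceeded in-transit"


directors = {"A": Director(["B", "A"]), "B": Director(["A", "B"])}
path, delivered = route(directors, "A", "10.1.0.100", ttl=6)
print(" -> ".join(path), "| delivered:", delivered)
# A -> B -> A -> B -> A -> B -> A | delivered: False
```

With different starting positions the packet is delivered on the first or second hop, which matches the observation that the problem appears only by chance.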

          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["10.1.0.10/24"]
                authPass: CPLB
              virtualServers:
              - ipAddress: "10.1.0.10"
                lbAlgo: sh

As a temporary workaround, I changed lbAlgo to “sh”, as shown above.
The Source Hashing algorithm chooses the destination by hashing the source IP,
so even with LVS running on every control plane node, each node picks the same destination and, in theory, no infinite loop can occur.
So far it works fine.

The default CPLB lbAlgo is “rr” (round robin), which I don't think is appropriate because it can cause this loop.

Has anyone else noticed this happening?

@chattytak
Author

chattytak commented Nov 9, 2024

For example, in this configuration, the active VIP is held by 10.1.0.3.
First, the LVS on 10.1.0.3 performs load balancing.
In the ipvsadm output for 10.1.0.3, most of the ActiveConn count points to 10.1.0.2.
This means packets are forwarded from 10.1.0.3 to 10.1.0.2.
However, because of “lb_kind DR”, the destination of a forwarded TCP packet is still “10.1.0.10:6443”.
Therefore, packets forwarded to 10.1.0.2 also match the LVS rule there.
Next, the LVS on 10.1.0.2 performs load balancing.
In the ipvsadm output for 10.1.0.2, most of the ActiveConn count points to 10.1.0.2.
Therefore, the packet is delivered locally and reaches the kube-apiserver process.

-> [ 10.1.0.3(lvs)] -> [ 10.1.0.2(lvs)] -> [10.1.0.2(kube-apiserver)]

Because of “lb_algo sh”, the selection on 10.1.0.3(lvs) and the selection on 10.1.0.2(lvs) give the same result, so no infinite loop occurs.
With “lb_algo rr”, the selections on 10.1.0.3(lvs) and 10.1.0.2(lvs) could differ, resulting in an infinite loop.
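
The same reasoning as a toy model: if every node maps a given source IP to the same real server, a packet is forwarded at most once. The hash below (source IP as an integer, modulo the number of servers) is an assumption for illustration and is not the kernel's actual sh hash. The client address 10.1.0.100 is also made up. The model further assumes that every node lists the real servers in the same order.

```python
import ipaddress

REAL_SERVERS = ["10.1.0.2", "10.1.0.3", "10.1.0.4"]


def source_hash_pick(client_ip, real_servers=REAL_SERVERS):
    """Toy stand-in for sh: same client IP and server list give the same pick."""
    key = int(ipaddress.ip_address(client_ip))
    return real_servers[key % len(real_servers)]


def route(entry, client_ip):
    """Follow one packet through nodes that all run the same sh scheduler."""
    path = [entry]
    current = entry
    while True:
        target = source_hash_pick(client_ip)
        if target == current:
            return path  # delivered to the local kube-apiserver
        # DR keeps the VIP as destination, so the next node schedules again,
        # but it reaches the same answer and delivers locally.
        path.append(target)
        current = target


print(" -> ".join(route("10.1.0.3", "10.1.0.100")))
# 10.1.0.3 -> 10.1.0.2
```

Whichever node holds the VIP, the path has at most two entries, because the second node's pick is always itself.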

# for i in 10.1.0.{2..4} ; do echo $i ; ssh $i -- ip -4 --oneline addr show | grep -e enp6s18 -e dummyvip0; done
10.1.0.2
2: enp6s18    inet 10.1.0.2/24 brd 10.1.0.255 scope global noprefixroute enp6s18\       valid_lft forever preferred_lft forever
7: dummyvip0    inet 10.1.0.10/32 scope global dummyvip0\       valid_lft forever preferred_lft forever
10.1.0.3
2: enp6s18    inet 10.1.0.3/24 brd 10.1.0.255 scope global noprefixroute enp6s18\       valid_lft forever preferred_lft forever
2: enp6s18    inet 10.1.0.10/24 scope global secondary enp6s18\       valid_lft forever preferred_lft forever
10: dummyvip0    inet 10.1.0.10/32 scope global dummyvip0\       valid_lft forever preferred_lft forever
10.1.0.4
2: enp6s18    inet 10.1.0.4/24 brd 10.1.0.255 scope global noprefixroute enp6s18\       valid_lft forever preferred_lft forever
5: dummyvip0    inet 10.1.0.10/32 scope global dummyvip0\       valid_lft forever preferred_lft forever
# for i in 10.1.0.{2..4} ; do echo $i ; ssh $i -- ipvsadm -L -n; done
10.1.0.2
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.1.0.10:6443 sh persistent 360
  -> 10.1.0.2:6443                Route   1      4          0
  -> 10.1.0.3:6443                Route   1      1          0
  -> 10.1.0.4:6443                Route   1      0          0
10.1.0.3
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.1.0.10:6443 sh persistent 360
  -> 10.1.0.2:6443                Route   1      4          0
  -> 10.1.0.3:6443                Route   1      2          0
  -> 10.1.0.4:6443                Route   1      0          0
10.1.0.4
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.1.0.10:6443 sh persistent 360
  -> 10.1.0.2:6443                Route   1      0          0
  -> 10.1.0.3:6443                Route   1      0          0
  -> 10.1.0.4:6443                Route   1      0          0
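
To compare these tables across nodes without reading them by eye, the output can be parsed into a small structure. This is a rough sketch that assumes the exact column layout of the `ipvsadm -L -n` output shown above; `parse_ipvsadm` is a made-up helper name.

```python
def parse_ipvsadm(text):
    """Return {virtual_service: [(real_server, active_conn), ...]}."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP"):
            # Virtual service line, e.g. "TCP  10.1.0.10:6443 sh persistent 360"
            current = fields[1]
            services[current] = []
        elif fields[0] == "->" and current is not None:
            # Real server line: -> addr:port Forward Weight ActiveConn InActConn
            services[current].append((fields[1], int(fields[4])))
    return services


sample = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.1.0.10:6443 sh persistent 360
  -> 10.1.0.2:6443                Route   1      4          0
  -> 10.1.0.3:6443                Route   1      2          0
  -> 10.1.0.4:6443                Route   1      0          0
"""
table = parse_ipvsadm(sample)
busiest = max(table["10.1.0.10:6443"], key=lambda entry: entry[1])
print(busiest)
# ('10.1.0.2:6443', 4)
```

Running this against the output from each node shows which real server each director currently favors.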

@chattytak chattytak changed the title The keepalived.conf provided by the CPLB causes a network loop. The keepalive.conf provided by the CPLB creates an infinite loop. Nov 9, 2024
@muhlba91

I ran into the same issue, and the workaround by @chattytak works for now.

@chattytak
Author

I have confirmed that one of the three control planes crashes and cannot be restarted when configuration changes are applied with k0sctl apply to an environment installed with CPLB (workaround already enabled) and NLLB enabled.
This reproduces even when the changed parameters are unrelated to CPLB, such as a Helm extension.
It does not occur when redundancy is provided by an external haproxy instead of CPLB and NLLB.

I don't think CPLB is well tested, and it should not be used in production, even with workarounds.

@juanluisvaladas
Contributor

Thanks for the report. Indeed, it's not tested enough, which is why it's a beta feature.
The first course of action will be to introduce a userspace load balancer soon, which should avoid these issues entirely. I will also look into the link provided.
