
svclb pods not deleted when services are deleted #5823

Closed
brandond opened this issue Jul 8, 2022 · 9 comments
Assignees: brandond
Labels: kind/bug (Something isn't working)
Milestone: v1.24.3+k3s1

Comments

@brandond
Member

brandond commented Jul 8, 2022

Following the merge of #5657, deleting LoadBalancer Services leaves orphan svclb pods that block reuse of ports by subsequent services.

Steps to reproduce:

brandond@dev01:~$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: test
  ports:
  - name: web
    port: 8080
    protocol: TCP
    targetPort: 8080
EOF
service/test created

brandond@dev01:~$ kubectl get pod -n kube-system -l svccontroller.k3s.cattle.io/svcnamespace=default,svccontroller.k3s.cattle.io/svcname=test
NAME                        READY   STATUS    RESTARTS   AGE
svclb-test-31792a59-p4mw2   1/1     Running   0          26s

brandond@dev01:~$ kubectl delete service -n default test
service "test" deleted

brandond@dev01:~$ kubectl get service -n default test
Error from server (NotFound): services "test" not found

brandond@dev01:~$ kubectl get pod -n kube-system -l svccontroller.k3s.cattle.io/svcnamespace=default,svccontroller.k3s.cattle.io/svcname=test
NAME                        READY   STATUS    RESTARTS   AGE
svclb-test-31792a59-p4mw2   1/1     Running   0          59s
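For anyone checking their own cluster, the long label selector in the transcript can be wrapped in a small helper. This is an illustrative sketch, not part of k3s; the function name is made up, and it assumes the svccontroller.k3s.cattle.io labels shown above.

```shell
# Illustrative helper (not part of k3s): list the svclb pods backing a
# given Service, using the labels from the transcript above.
svclb_pods_for() {
  ns="$1"
  svc="$2"
  kubectl get pod -n kube-system \
    -l "svccontroller.k3s.cattle.io/svcnamespace=${ns},svccontroller.k3s.cattle.io/svcname=${svc}" \
    -o name
}

# Usage: svclb_pods_for default test
# On an affected cluster this still prints pods after the Service is deleted.
```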
@brandond
Member Author

brandond commented Jul 8, 2022

This was reported on Rancher Users slack: https://rancher-users.slack.com/archives/C3ASABBD1/p1657304386040799

We are using the Klipper LB load balancer and are finding that the svclb pods are getting created in the kube-system namespace. In addition, we cannot delete the pods in that namespace: the initial pod is deleted, but then it gets recreated, so it won't die.
When we stop the LoadBalancer Service, the svclb instances are not killed, as we have seen in the past. When we restart the Service (with new definitions), we seem to end up with conflicts of ports/names due to the lingering svclb pods.

Initial pods in kube-system namespace:

k get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS       AGE
coredns-d76bd69b-bnx9m                                      1/1     Running   3 (9d ago)     9d
metrics-server-7cd5fcb6b7-gflch                             1/1     Running   3 (9d ago)     9d
local-path-provisioner-6c79684f77-hlwsg                     1/1     Running   5 (9d ago)     9d
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-k7wsp   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-df8cj   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-sxb88   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-7nqq6   1/1     Running   1 (7d6h ago)   7d23h
svclb-istio-ingressgateway-c9776a98-m6hzn                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-kwvr7                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-4jn4m                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-hd4jw                   3/3     Running   0              3d2h
svclb-keycloak-fb0704bb-pnq85                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-xfh2p                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-llc4r                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-cjx4v                               1/1     Running   0              2d23h
svclb-postgres-db-40a02dd4-7h7q7                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-svf68                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-cn25l                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-tskt4                            1/1     Running   0              2d22h
svclb-stackgres-restapi-e9fc8149-xqhgw                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-jp4cx                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-444fj                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-jxgjm                      0/1     Pending   0              23h
svclb-nginx-service-2fcd8743-zbxv6                          1/1     Running   0              3h9m
svclb-nginx-service-2fcd8743-dpw6q                          1/1     Running   0              3h9m
svclb-nginx-service-2fcd8743-hj4gk                          1/1     Running   0              3h9m
svclb-nginx-service-2fcd8743-dt5pn                          1/1     Running   0              3h9m
svclb-nginx-b1379761-dhw6w                                  1/1     Running   0              3h27m
svclb-nginx-b1379761-zcgnd                                  1/1     Running   0              3h27m
svclb-nginx-b1379761-czflw                                  1/1     Running   0              3h27m
svclb-nginx-b1379761-jbpmv                                  1/1     Running   0              3h27m
svclb-west2-service-9d3852cf-rjhfw                          1/1     Running   0              64m
svclb-west2-service-9d3852cf-9p29w                          1/1     Running   0              64m
svclb-west2-service-9d3852cf-t2s5h                          1/1     Running   0              64m
svclb-west2-service-9d3852cf-7z2xw                          1/1     Running   0              64m
svclb-test-service-ac1dfd76-6qj28                           1/1     Running   0              47m
svclb-test-service-ac1dfd76-ckfdz                           1/1     Running   0              47m
svclb-test-service-ac1dfd76-cmdgz                           1/1     Running   0              47m
svclb-test-service-ac1dfd76-47j66                           1/1     Running   0              47m
svclb-test-f79880dc-gxrcx                                   0/2     Pending   0              4m35s
svclb-test-f79880dc-l9b9k                                   0/2     Pending   0              4m35s
svclb-test-f79880dc-x2bcz                                   0/2     Pending   0              4m35s
svclb-test-f79880dc-zkwdz                                   0/2     Pending   0              4m35s
svclb-test-replicas-2fef0816-l2xfv                          0/2     Pending   0              4m35s
svclb-test-replicas-2fef0816-zctcp                          0/2     Pending   0              4m35s
svclb-test-replicas-2fef0816-lgj2v                          0/2     Pending   0              4m35s
svclb-test-replicas-2fef0816-gjgwv                          0/2     Pending   0              4m35s

After deleting a couple of pods:

k delete pods -n kube-system svclb-test-f79880dc-gxrcx svclb-test-f79880dc-l9b9k
pod "svclb-test-f79880dc-gxrcx" deleted
pod "svclb-test-f79880dc-l9b9k" deleted
(envgen)$ k get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS       AGE
coredns-d76bd69b-bnx9m                                      1/1     Running   3 (9d ago)     9d
metrics-server-7cd5fcb6b7-gflch                             1/1     Running   3 (9d ago)     9d
local-path-provisioner-6c79684f77-hlwsg                     1/1     Running   5 (9d ago)     9d
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-k7wsp   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-df8cj   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-sxb88   1/1     Running   0              7d23h
svclb-rook-ceph-mgr-dashboard-loadbalancer-1364e2b9-7nqq6   1/1     Running   1 (7d6h ago)   7d23h
svclb-istio-ingressgateway-c9776a98-m6hzn                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-kwvr7                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-4jn4m                   3/3     Running   0              3d2h
svclb-istio-ingressgateway-c9776a98-hd4jw                   3/3     Running   0              3d2h
svclb-keycloak-fb0704bb-pnq85                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-xfh2p                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-llc4r                               1/1     Running   0              2d23h
svclb-keycloak-fb0704bb-cjx4v                               1/1     Running   0              2d23h
svclb-postgres-db-40a02dd4-7h7q7                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-svf68                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-cn25l                            1/1     Running   0              2d22h
svclb-postgres-db-40a02dd4-tskt4                            1/1     Running   0              2d22h
svclb-stackgres-restapi-e9fc8149-xqhgw                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-jp4cx                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-444fj                      0/1     Pending   0              23h
svclb-stackgres-restapi-e9fc8149-jxgjm                      0/1     Pending   0              23h
svclb-nginx-service-2fcd8743-zbxv6                          1/1     Running   0              3h11m
svclb-nginx-service-2fcd8743-dpw6q                          1/1     Running   0              3h11m
svclb-nginx-service-2fcd8743-hj4gk                          1/1     Running   0              3h11m
svclb-nginx-service-2fcd8743-dt5pn                          1/1     Running   0              3h11m
svclb-nginx-b1379761-dhw6w                                  1/1     Running   0              3h29m
svclb-nginx-b1379761-zcgnd                                  1/1     Running   0              3h29m
svclb-nginx-b1379761-czflw                                  1/1     Running   0              3h29m
svclb-nginx-b1379761-jbpmv                                  1/1     Running   0              3h29m
svclb-west2-service-9d3852cf-rjhfw                          1/1     Running   0              66m
svclb-west2-service-9d3852cf-9p29w                          1/1     Running   0              66m
svclb-west2-service-9d3852cf-t2s5h                          1/1     Running   0              66m
svclb-west2-service-9d3852cf-7z2xw                          1/1     Running   0              66m
svclb-test-service-ac1dfd76-6qj28                           1/1     Running   0              48m
svclb-test-service-ac1dfd76-ckfdz                           1/1     Running   0              48m
svclb-test-service-ac1dfd76-cmdgz                           1/1     Running   0              48m
svclb-test-service-ac1dfd76-47j66                           1/1     Running   0              48m
svclb-test-f79880dc-x2bcz                                   0/2     Pending   0              6m23s
svclb-test-f79880dc-zkwdz                                   0/2     Pending   0              6m23s
svclb-test-replicas-2fef0816-l2xfv                          0/2     Pending   0              6m23s
svclb-test-replicas-2fef0816-zctcp                          0/2     Pending   0              6m23s
svclb-test-replicas-2fef0816-lgj2v                          0/2     Pending   0              6m23s
svclb-test-replicas-2fef0816-gjgwv                          0/2     Pending   0              6m23s
svclb-test-f79880dc-bfhcc                                   0/2     Pending   0              3s
svclb-test-f79880dc-vz52q                                   0/2     Pending   0              3s

@brandond brandond self-assigned this Jul 8, 2022
@brandond brandond added the priority/critical-urgent and kind/bug labels Jul 8, 2022
@brandond brandond added this to the v1.24.3+k3s1 milestone Jul 8, 2022
@ralphflat

ralphflat commented Jul 8, 2022

See also same in KlipperLB github: k3s-io/klipper-lb#38

We tore down our entire cluster and reinstalled (1 master and 3 worker nodes). Then we started nginx with the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
            - name: nginx-conf
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
              readOnly: true
            - name: nginx-media
              mountPath: /usr/share/nginx/media
              readOnly: false
      volumes:
      - name: nginx-conf
        configMap:
          name: nginx-conf
          items:
            - key: nginx.conf
              path: nginx.conf
      - name: nginx-media
        persistentVolumeClaim:
          claimName: nginx-pv-claim

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx

Reviewing the different namespaces, we saw the following (as expected):

(screenshot omitted: the svclb pods were present across the namespaces, as expected)

Then we executed a "k delete -f nginx-test.yaml" with the following results:

kubectl delete -f nginx-test.yaml
deployment.apps "nginx" deleted
service "nginx" deleted
kubectl get pods
No resources found in default namespace.
[root@sdie-2-pm-4 nginx-test]# kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.43.0.1    <none>        443/TCP   164m
[root@sdie-2-pm-4 nginx-test]# kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
local-path-provisioner-6c79684f77-m8sfx   1/1     Running   0          164m
coredns-d76bd69b-47x8m                    1/1     Running   0          164m
metrics-server-7cd5fcb6b7-r7p8d           1/1     Running   0          164m
svclb-nginx-8acfcf71-5f565                1/1     Running   0          110s
svclb-nginx-8acfcf71-hqmf6                1/1     Running   0          110s
svclb-nginx-8acfcf71-jhgpp                1/1     Running   0          110s
svclb-nginx-8acfcf71-q4z4q                1/1     Running   0          110s

Note the four svclb-nginx pods still running (and they were still running 8 minutes after the delete was issued).

@brandond
Member Author

brandond commented Jul 8, 2022

As a temporary work-around, you can roll back your cluster to v1.24.1, which does not include the above-linked PR. You could also manually delete the svclb DaemonSets after deleting their Service.
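The second workaround can be scripted. The sketch below is my own naming, not an official k3s tool: it walks the svclb DaemonSets in kube-system and deletes any whose backing Service no longer exists, assuming the DaemonSets carry the same svccontroller.k3s.cattle.io labels as their pods (as shown in the transcripts above). Treat it as a starting point and dry-run it first.

```shell
# Illustrative cleanup sketch (not an official k3s tool): delete svclb
# DaemonSets in kube-system whose backing Service has been deleted.
cleanup_orphaned_svclb() {
  # Emit "<daemonset> <svc-namespace> <svc-name>" per svclb DaemonSet.
  kubectl get daemonset -n kube-system \
    -l svccontroller.k3s.cattle.io/svcname \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.labels.svccontroller\.k3s\.cattle\.io/svcnamespace}{" "}{.metadata.labels.svccontroller\.k3s\.cattle\.io/svcname}{"\n"}{end}' |
  while read -r ds ns svc; do
    # If the Service is gone, the DaemonSet (and its pods) are orphans.
    if ! kubectl get service -n "$ns" "$svc" >/dev/null 2>&1; then
      echo "deleting orphaned DaemonSet $ds (service $ns/$svc is gone)"
      kubectl delete daemonset -n kube-system "$ds"
    fi
  done
}
```

Deleting the DaemonSet (rather than individual pods) matters: as the thread shows, deleted svclb pods are immediately recreated by their DaemonSet.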

@ralphflat

ralphflat commented Jul 11, 2022

According to the k3s binary, we are using version 1.23.8:

k3s --version
k3s version v1.23.8+k3s1 (53f2d4e7)
go version go1.17.5

Also, 1.24.1 is not available on GitHub. Sorry, I am in error: 1.24.1 is available; I did not look back far enough. However, that would mean advancing a version beyond what we are currently running.

@brandond BTW, my colleague has been using 1.23.6_k3s1 without any problems. Is this a good version to drop back to?

@ralphflat

@brandond - I am able to download 1.23.6_k3s1 by clicking the download tarball or zip icon, but I am not able to figure out how to install a specific tagged version. I have tried setting INSTALL_K3S_VERSION, and that does seem to try to fetch the version; however, it is not found in the releases area. I have also tried INSTALL_K3S_CHANNEL_URL and INSTALL_K3S_SKIP_ENABLE without getting the software to install.

Please provide details on how to install a tagged version that is not a release.

@brandond
Member Author

brandond commented Jul 11, 2022

Releases are named something like v1.23.6+k3s1 - so you'd want to set INSTALL_K3S_VERSION=v1.23.6+k3s1

@rancher-max
Contributor

Validated on master branch with commit ID ffe72eecc4cb058ba8d4635450f99c88fd71b71c

Environment Details

Infrastructure

  • Cloud (AWS)
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ uname -a
Linux ip-172-31-46-97 5.15.0-1011-aws #14-Ubuntu SMP Wed Jun 1 20:54:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Cluster Configuration:

1 server

Config.yaml:

N/A

Additional files

N/A

Testing Steps

  1. Install k3s. I used the following EXEC arg during install: INSTALL_K3S_EXEC="server --write-kubeconfig-mode 644 --cluster-init --token secret"
  2. After node is Ready and pods are all Running, deploy service:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: test
  ports:
  - name: web
    port: 8080
    protocol: TCP
    targetPort: 8080
EOF
  3. Ensure the Service is created and its svclb pod is created and running:
$ kubectl get pod -n kube-system -l svccontroller.k3s.cattle.io/svcnamespace=default,svccontroller.k3s.cattle.io/svcname=test

$ kubectl get service -n default test
  4. Delete the service: kubectl delete service -n default test

Replication Results:

  • k3s version used for replication:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.24.2+k3s2 INSTALL_K3S_EXEC="server --write-kubeconfig-mode 644 --cluster-init --token secret" sh -
# After deleting the service:
$ kubectl get service -n default test
Error from server (NotFound): services "test" not found

$ kubectl get pod -n kube-system -l svccontroller.k3s.cattle.io/svcnamespace=default,svccontroller.k3s.cattle.io/svcname=test
NAME                        READY   STATUS    RESTARTS   AGE
svclb-test-2d2d1b5e-msd8n   1/1     Running   0          57s

Validation Results:

  • k3s version used for validation:
curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=ffe72eecc4cb058ba8d4635450f99c88fd71b71c INSTALL_K3S_EXEC="server --write-kubeconfig-mode 644 --cluster-init --token secret" sh -
# After deleting the service
$ kubectl get service -n default test
Error from server (NotFound): services "test" not found

$ kubectl get pod -n kube-system -l svccontroller.k3s.cattle.io/svcnamespace=default,svccontroller.k3s.cattle.io/svcname=test
No resources found in kube-system namespace.

As can be seen, the svclb pod is now correctly deleted.

@vxav

This comment was marked as outdated.

@brandond
Member Author

brandond commented Oct 19, 2023

@vxav please don't "me too" on old closed issues. Open a new issue, fill out the issue template, and provide steps to reproduce.

@k3s-io k3s-io locked as resolved and limited conversation to collaborators Oct 19, 2023