Flanneld doesn't reconnect to the apiserver #1272

Closed
angeloxx opened this issue Mar 24, 2020 · 9 comments
@angeloxx

angeloxx commented Mar 24, 2020

When flanneld 0.12 on a Windows worker node loses its connection to the apiserver, it does not retry the connection and instead keeps logging:

reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to list *v1.Node: Get https://10.128.0.12:6443/api/v1/nodes?resourceVersion=0: http2: no cached connection was available
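
The repeated log line above can be spotted from outside the process, for example by a watchdog or alerting script that reads flanneld's output. The following is a minimal sketch under that assumption, not something flannel ships; `isStaleConnLine`, `persistentStaleConn` and the `limit` parameter are hypothetical names, and the only input taken from this issue is the message text itself.

```go
package main

import "strings"

// staleConnMessage is the error text copied from the log line in this issue.
const staleConnMessage = "http2: no cached connection was available"

// isStaleConnLine reports whether a single log line contains the message.
func isStaleConnLine(line string) bool {
	return strings.Contains(line, staleConnMessage)
}

// persistentStaleConn reports whether at least `limit` consecutive lines
// contain the message. limit is a hypothetical tuning knob and should be >= 1.
func persistentStaleConn(lines []string, limit int) bool {
	run := 0
	for _, l := range lines {
		if isStaleConnLine(l) {
			run++
			if run >= limit {
				return true
			}
		} else {
			// Any unrelated line breaks the streak.
			run = 0
		}
	}
	return false
}
```

Requiring several consecutive matches avoids reacting to a single transient failure; what to do once the condition is detected (alert, restart, nothing) is left open here.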

Expected Behavior

Flanneld should reconnect to the apiserver once it is reachable again.

Steps to Reproduce (for bugs)

In our environment the easiest way to reproduce the issue is to move the floating IP to a different master node.

Your Environment

  • Flannel version: 0.12.0
  • Startup Option: --kube-subnet-mgr --kubeconfig-file=<kubelet-file.conf>
  • Backend used: vxlan
  • Kubernetes version (if used): 1.16.3 on-premises, with 3 master nodes behind a floating IP (managed by keepalived) that flanneld uses as its apiserver endpoint
  • Operating System and version:
    • Master: Redhat Linux 7
    • Worker: Windows 2019 10.0.0.17763.934
@DerrickMartinez

I have reproduced this in 0.12.0 as well. A rolling update of the master nodes sends flanneld into a panic.

@DerrickMartinez

Any update on this issue? It is serious enough that 0.12.0 is not production-ready in a Windows environment. Thanks.

@luthermonson
Contributor

@DerrickMartinez do you have any logs you can post? I have to look into something I think is related.

@EagleIJoe

Windows Update KB4551853 breaks my flannel connections, which could be related. It is not specific to flannel: it also affects other container networking setups (e.g. Docker Swarm).

@KnicKnic

I have a workaround for this issue, see KnicKnic@023f21b

@jsturtevant

This looks to be a long-standing open issue with the Go client: kubernetes/client-go#374

It looks like this type of failure is handled directly in kubelet as a workaround: kubernetes/kubernetes#78016, and other CNIs handle it as well: AliyunContainerService/terway#87

@rhockenbury

With kubernetes/kubernetes#95981 merged, I believe all that is needed to resolve this is to bump k8s.io/client-go to 1.20 once it is released, or to the 1.19 backport.

@rhockenbury

The cherry-pick PR for 1.19 is now open: kubernetes/kubernetes#96778

1.19.5 is slated for release on 12/9 per https://github.com/kubernetes/sig-release/blob/master/releases/patch-releases.md.

@stale

stale bot commented Jan 25, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 25, 2023
@stale stale bot closed this as completed Feb 15, 2023