Kilo pods high CPU usage #304
Hi @faraonc, that is very high indeed and not normal for Kilo! If you'd like to give it a shot, please open a PR :) Otherwise I'll do it this afternoon. In the meantime, could you share logs from one of the high-CPU-usage Kilo pods?
When we've seen high CPU issues in the past, it has tended to be either constant reconciliation, because some configuration doesn't match or Kilo and another process are fighting over it, or the iptables controller going crazy and spawning lots of iptables processes because some rules don't match what Kilo expects. We haven't seen the iptables issue in a year, so hopefully it's not that.
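To make the first failure mode concrete, here is a minimal, hypothetical Go sketch (not Kilo's actual code) of a level-triggered reconcile loop: when the observed state can never match the desired state, for example because another process keeps rewriting it, the loop re-applies its configuration on every tick and burns CPU doing so. All the names and the 30-second interval are made up for illustration.

```go
package main

import (
	"log"
	"reflect"
	"time"
)

// state stands in for whatever Kilo reconciles on a node
// (WireGuard config, iptables rules, routes); it is hypothetical.
type state struct {
	Peers int
}

func observe() state { return state{Peers: 3} } // what is actually on the node
func desired() state { return state{Peers: 4} } // what the controller wants

func apply(s state) { log.Printf("re-applying %+v", s) } // the expensive step

func main() {
	for range time.Tick(30 * time.Second) {
		// If the two views never converge, e.g. another agent keeps
		// undoing the change, this branch runs on every tick: constant
		// churn, lots of re-applied rules and routes, high CPU.
		if !reflect.DeepEqual(observe(), desired()) {
			apply(desired())
		}
	}
}
```

The second failure mode looks different from the outside: the node ends up with many short-lived iptables processes as the controller keeps re-issuing rules that never match what it expects.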
Thanks for pointing out where the change is needed. I might just build and test it on my deployments. I might do a PR next week if you have not done so.
Hi @faraonc, I made the PR earlier today: #305. Something funny is going on in that cluster for sure. The WireGuard configuration keeps changing really, really fast. There are constantly new, unexpected changes to the number of peers. Are you connecting WireGuard peers to the cluster besides the nodes?
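If it would help to watch that churn from outside Kilo, here is a hedged sketch built on the wgctrl Go library, which reads WireGuard device state directly: it polls the interface and logs whenever the peer count changes. The interface name kilo0 and the one-second poll interval are assumptions; adjust them for your cluster.

```go
package main

import (
	"log"
	"time"

	"golang.zx2c4.com/wireguard/wgctrl"
)

func main() {
	client, err := wgctrl.New()
	if err != nil {
		log.Fatalf("open wgctrl: %v", err)
	}
	defer client.Close()

	const iface = "kilo0" // assumption: the name of Kilo's WireGuard interface
	last := -1
	for range time.Tick(time.Second) {
		dev, err := client.Device(iface)
		if err != nil {
			log.Printf("read %s: %v", iface, err)
			continue
		}
		// Log only on changes; rapid flapping here matches the
		// "configuration keeps changing" symptom described above.
		if n := len(dev.Peers); n != last {
			log.Printf("%s now has %d peers (was %d)", iface, n, last)
			last = n
		}
	}
}
```

Run on a node with the same privileges `wg show` needs, this makes it easy to see whether peers are being added and removed in a tight loop.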
@faraonc the PR just merged, so you should be able to use the …
Thanks a lot. I will deploy this on Monday and check what is going on. I will share the outcome.
As I mentioned earlier, I strongly suspect that Kilo is getting confused with the WireGuard configuration. Any info you could share about the peers in the cluster / what other processes might be interacting with the cluster would be very helpful.
Hi @faraonc, any update? |
Totally understand 👍 |
We started seeing Kilo pods with high CPU usage. This is degrading the network and causing our apps to time out. To work around the issue, we kill the affected pod whenever it happens. We tried upgrading the DaemonSet to use image `0.4.1`, but that did not resolve it. Is there already a built-in way to profile the CPU? Does anybody have a process for investigating the root cause?
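On the built-in profiling question: Kilo is written in Go, and the usual way to profile a Go daemon is to expose the standard net/http/pprof handlers and pull a CPU profile over HTTP. The sketch below is a generic illustration of that pattern, not Kilo's actual flags or endpoints (the comments above suggest support was being added around PR #305, but confirm against the project); the localhost:6060 address is an assumption.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Assumption: a debug-only listener bound to localhost; a real daemon
	// would keep this off any publicly reachable address.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the daemon's real work would run here ...
	select {}
}
```

With something like that in place, `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` collects a 30-second CPU profile and shows where the busy pod is spending its time.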