Stale IMDS entries could cause VPC CNI to incorrectly release in-use IPs #3019
Comments
Instead of comparing the lengths, if we match the contents, will it prevent the bug under discussion?
Yes, comparing contents would be a simple fix in our current code/setup, since we only run the IP pool action once per minute.
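The difference between the two checks discussed above can be illustrated with a minimal Python sketch. This is not the VPC CNI code (which is written in Go); the function names `same_length_only` and `same_contents` are hypothetical, and the IP addresses are the ones quoted in the issue description.

```python
def same_length_only(cached_ips, imds_ips):
    """Length check: passes whenever both views hold the same number of IPs."""
    return len(cached_ips) == len(imds_ips)


def same_contents(cached_ips, imds_ips):
    """Content check: passes only if both views hold exactly the same IPs."""
    return set(cached_ips) == set(imds_ips)


# Cache entries that are really on the ENI, versus stale IMDS entries.
cache = ["10.56.156.194", "10.56.151.27", "10.56.151.25", "10.56.145.254"]
stale_imds = ["10.56.150.175", "10.56.147.61", "10.56.159.130", "10.56.151.147"]

print(same_length_only(cache, stale_imds))  # True: stale IMDS data is trusted
print(same_contents(cache, stale_imds))     # False: the mismatch is detected
```

With the content check, a stale IMDS view that happens to have the right size no longer passes as a match.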
We've seen pod IPs get reused even though they are already assigned to another pod because of this issue, with 2 separate eth interfaces bound to the same IP (but only one works).
@BojanZelic-DD Basically worker nodes affected by this issue could be in one of the following states:
If state #1 or #2 occurs, the existing pod will lose network connectivity. In state #3, the existing pod may not show immediate symptoms, but it could transition to state #1 or #2 at any time in the future, even after IMDS corrects the discrepancies.
This issue is now closed. Comments on closed issues are hard for our team to see.
This can be closed only once the release with the fix is available.
The fix is available in 1.18.4, which has been released.
What happened:
TL;DR: It's a code defect in VPC CNI, triggered by a defective deployment in EC2 where IP allocation/release wasn't reflected in the Instance Metadata Service (IMDS). As a result, pod IPs used by existing pods might be released to EC2 or reused by another pod.
Background:
There are three views of the set of IP addresses associated with an ENI:
view#1: the actual state in EC2, as returned by the EC2 API.
view#2: IMDS, under network/interfaces/macs/<eniMac>/local-ipv4s, which is supposed to contain the set of IP addresses associated with an ENI and is expected to become eventually consistent with view#1 above over a short period.
view#3: VPC CNI's internal cache.
VPC CNI runs an eniIPPoolReconcile routine that tries to reconcile its view#3 to match the real set of IP addresses on the ENI (view#1). However, to save unnecessary API calls, it assumes IMDS (view#2) is the same as EC2 (view#1) if length($view#3) == length($view#2), and falls back to view#1 only if the lengths mismatch. This is the code defect that the IMDS bug exposed. The routine then adds IPs to or removes IPs from view#3 to match IMDS (view#2) or EC2 (view#1).
There is a +1 in the actual comparison code; it compensates for the primary IP of the ENI, which I deliberately excluded from this discussion and the examples below for simplicity.
What happened:
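To make the shortcut described in the background concrete, here is a minimal Python sketch of the reconcile decision. It is a simplified model under stated assumptions, not the actual eniIPPoolReconcile implementation: the function `reconcile` and its parameters are hypothetical, the +1 for the primary IP is ignored, and the M addresses common to all views are left out.

```python
def reconcile(cache, imds_view, fetch_from_ec2):
    """Hypothetical model of the length-based shortcut (not VPC CNI code).

    Returns (new_cache, source, removed, added).
    """
    cache = set(cache)
    if len(cache) == len(imds_view):
        # Lengths agree, so IMDS is trusted without looking at its contents.
        source, truth = "imds", set(imds_view)
    else:
        # Lengths differ, so fall back to the authoritative EC2 view.
        source, truth = "ec2", set(fetch_from_ec2())
    removed = cache - truth  # dropped from the cache
    added = truth - cache    # newly added to the cache
    return truth, source, removed, added


# IPs taken from the timeline in this issue.
ec2 = ["10.56.156.194", "10.56.151.27", "10.56.151.25", "10.56.145.254"]
stale_imds = ["10.56.150.175", "10.56.147.61", "10.56.159.130", "10.56.151.147"]

new_cache, source, removed, added = reconcile(ec2, stale_imds, lambda: ec2)
print(source)                      # imds
print("10.56.156.194" in removed)  # True: the in-use pod IP is dropped
```

When the cache and the stale IMDS view happen to have the same length, the EC2 call is skipped and addresses that are really on the ENI are dropped from the cache. When the lengths differ, the same function takes the EC2 branch and the cache stays correct.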
Due to a bug on the IMDS side, the set of IP addresses that IMDS reported for an ENI was no longer eventually consistent with the actual state from the EC2 API.
Let's focus on a particular case related to eni-<redacted>. With the above IMDS issue in place, the set of IP addresses for this ENI in the three views is as follows:
At 2024-08-29T22:14:11.001Z, the eniIPPoolReconcile routine executes. Since length($view#3) and length($view#2) mismatch (M+3 != M+4), VPC CNI fetches view#1 from EC2 directly and compares it with view#3. Nothing happens, as they match.
At 2024-08-29T22:14:30.579Z, a new pod is created and is assigned an IP address. Due to its warm pool maintenance algorithm, at 2024-08-29T22:14:33.645Z VPC CNI allocates a new IP, 10.56.145.254, to this ENI. Since this new IP doesn't show up in IMDS either, the three views become as follows:
At 2024-08-29T22:15:36.566Z, the eniIPPoolReconcile routine executes. Since length($view#3) and length($view#2) now match (M+4 == M+4), VPC CNI trusts IMDS as the source of truth and does the following:
It removes [10.56.156.194, 10.56.151.27, 10.56.151.25, 10.56.145.254] from its internal cache (view#3). 10.56.156.194 is the IP used by the customer's pod. However, so far the pod network still works, because the cache entry is only a marker.
It adds [10.56.150.175, 10.56.147.61, 10.56.159.130, 10.56.151.147] to its internal cache (view#3) thanks to those lines of code (complex and unrelated logic, so I'll skip the explanation of this part).
So now the three views become:
At 2024-08-29T22:15:39.197Z, the warm pool maintenance algorithm decides to add two additional IPs to this ENI (10.56.148.72 and 10.56.154.181). Now the three views become as follows:
At 2024-08-29T22:16:42.107Z, the eniIPPoolReconcile routine executes. Since length($view#3) and length($view#2) now mismatch (M+2 != M+4), VPC CNI fetches view#1 from EC2 directly, compares it with view#3, and does the following:
It adds [10.56.156.194, 10.56.151.27, 10.56.151.25, 10.56.145.254] back to its internal cache (view#3). However, those IPs are not recorded as being occupied by any pod.
At 2024-08-29T22:16:44.731Z, the warm IP maintenance algorithm runs again and notices that a lot of IP addresses are not assigned to any pod (including our 10.56.156.194), so it decides to release those IP addresses back to EC2 via UnassignPrivateIpAddresses. After this, networking for the pod with 10.56.156.194 breaks.
Attach logs
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment: