-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address istio mesh e2e mesh failures #11286
Comments
/assign @arturenault |
@dprotaso @arturenault The root cause is The option makes tests failed w/o sidecar injection. The failed tests disable sidecar as: serving/test/e2e/service_to_service_test.go Line 123 in d30f0bd
I verified these tests worked if So, we need to change either: a) Remove |
Thanks for looking into this @nak3. I think option 2 makes sense. I'll send a PR later today. |
yeah - some of the tests skip disable sidecars so the test will never pass with STRICT mTLS |
Now, I am inclined to think "a) Remove So can we remove the option in net-istio https://github.com/knative-sandbox/net-istio/blob/main/third_party/istio-stable/istio-kind-mesh/config-istio.yaml#L11 ? (I kicked the test run in #11503) cc @ZhiminXiang |
Then are we really running |
I think we use mTLS STRICT for mesh https://github.com/knative-sandbox/net-istio/blob/8a3cf7e23bc14a27218bddaf0a3757c4d64f891a/third_party/istio-latest/extra/istio-mesh.yaml#L9 |
I think we can remove the option |
I'm confused I thought by removing I'd only consider sidecar to sidecar to be 'mesh' |
I believe removing this just allows access through the gateway for pods that arent in the mesh, it doesn't require it, and pods with a sidecar will still go pod-to-pod. |
@julz is right. From my test knative-extensions/net-istio#562 (comment), after removing |
ok great :) |
So we will proceed this issue by the following steps:
I will send the PRs so please help your reviews 🙏 |
Thanks all, sorry for dropping the ball on the previous PR (got busy with other things and then went on vacation for 2 weeks). Sounds like we're going with a different solution and assign this issue to kenjiro. /assign @nak3 |
So, the last monster is #11552 verified that all test passed with I think we have a few options for the solution like: 🟢 ... test will pass. option 1. Use short option for pre-submit test.
option 2. Add new job trigger
|
And I am thinking that option 2 is good for now. |
Option 2 seems reasonable to me too fwiw (I would love, at some point, separately, to figure out if we can tweak the timeouts on the full scale tests in mesh mode to have it pass, though -- it's obviously not ideal if we have no scale tests for mesh mode long-term since I imagine people who use mesh might have relatively big clusters/workloads) |
To @julz's point is configuring the timeouts for mesh an option easier option - ie. make them test flags |
aside: a caveat is these timings would be sensitive to GKE performance and it could regress again. I think these scale tests should be spun up on GCE/kubeadm were we have more control of the machine types of the API server But we've only been bitten by this once so maybe not worth doing so yet |
This patch adds short option to e2e test and support it on scale test. Please see knative#11286 for the discussion.
Now, mesh test fails with only Should we close this issue? Or keep open until |
Personal opinion: I reckon since we now have consistently green tests (🎉🎉!) we should close this issue but we should create another one to track re-enabling the fill scale-200 (unless we already did obviously), because we do want to do that one way or another |
testgrid still shows red for me - is there a new job? |
The mesh test's failure is knative/test-infra#2817 added |
I guess I'm trying to answer the question:
If we know scale-200 is going to fail should we use skip it for the mesh case and decide after whether it's possible to fix? |
|
/area test-and-release
/area networking
/kind cleanup
/kind good-first-issue
Expected Behavior
e2e tests pass when running our e2e tests with istio and the local gateway disabled (mesh only) with mTLS enabled (STRICT). At a minimum we should be skipping tests we know that will fail when strict mTLS is enabled.
This also affects DomainMapping but that's being tracked by knative-extensions/net-istio#562.
Actual Behavior
The following tests will always fail since we have global mTLS turned on (STRICT)
Steps to Reproduce the Problem
ie. [istio-stable] renable global mTLS when in mesh mode knative-extensions/net-istio#617
Additional Info
The text was updated successfully, but these errors were encountered: