Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite fast loop on sidecar injector #415

Open
dguihal opened this issue Jan 16, 2023 · 13 comments
Open

Infinite fast loop on sidecar injector #415

dguihal opened this issue Jan 16, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@dguihal
Copy link

dguihal commented Jan 16, 2023

Describe the bug

Here is a sample take from vault sidecar injector deployed on Azure AKS cluster

2023-01-16T10:40:35.827Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:35.934Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:35.941Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.110Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.119Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.218Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.225Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.334Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.342Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.615Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.624Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.734Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.745Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.909Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:36.916Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.016Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.025Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.133Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.145Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.415Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.422Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.534Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.540Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.712Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.719Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.816Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.823Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.934Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:37.954Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:38.215Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:38.223Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:38.433Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...
2023-01-16T10:40:38.443Z [INFO] handler.certwatcher: Webhooks changed. Updating certs...

Looks like there is some fast loop event which doesn't end.

Strangely same deployment on AWS EKS cluster doesn't show the same behavior

To Reproduce

Seen with 0.17 and 1.0.1 version of docker image
Seen with 0.22 and 0.23 version of helm chart

Application deployment:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vault-injector
  namespace: vault
spec:
  interval: 5m
  releaseName: vault-injector
  chart:
    spec:
      chart: vault
      version: 0.22.1
      sourceRef:
        kind: HelmRepository
        name: hashicorp
        namespace: tooling
      interval: 10m
  values:
    injector:
      enabled: true
      replicas: 1
      externalVaultAddr: https://vault-server-addr:8200
      authPath: auth/kubernetes
      logLevel: info
      resources:
        requests:
          memory: 100Mi
          cpu: 100m
        limits:
          memory: 256Mi
      securityContext:
        container:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
      image:
        repository: hashicorp/vault-k8s
        tag: 0.17.0
      agentImage:
        repository: hashicorp/vault
        tag: 1.11.3
      webhook:
        failurePolicy: Fail
        objectSelector:
          matchLabels:
            vault-sidecar-injector: "true"

Other useful info to include: kubectl describe deployment <app> and kubectl describe replicaset <app> output.

Expected behavior

Compared to same deployment on EKS clusters, I expected a far less output flow for this output

Environment

  • Kubernetes version:
    • Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.): AKS v1.24
    • Other configuration options or runtime services (istio, etc.): N/A
  • vault-k8s version: 0.17.0 (tested with 1.0.1 also)

Additional context

@dguihal dguihal added the bug Something isn't working label Jan 16, 2023
@tvoran
Copy link
Member

tvoran commented Jan 18, 2023

Hi @dguihal, that is a bit odd; that log message seems to indicate that the MutatingWebhookConfiguration is being updated by something repeatedly. I wonder if flux could be trying to reconcile it with its initial state, since the webhook config's clientConfig.caBundle is blank when first deployed, and then the injector generates its certs and keeps the webhook updated.

@dguihal
Copy link
Author

dguihal commented Jan 18, 2023

Hi @tvoran thanks for having taking time to answer, it indeed could have been an idea, but I think it's something else.

The MutatingWebhookConfiguration comes as part of the (hashicorp-vault) chart and not as a raw flux resource. So it isn't synced directly by flux. Only a redeployment of the helmrelease would trigger this (which doesn't occur).

Also we deploy this helmrelease on the same way on EKS clusters, with far less log spams (I happens from time to time but for short time frames)

Unfortunatly I'm not very familiar with k8s golang APIs

case <-webhooksWatcher:

webhooks := factory.Admissionregistration().V1().MutatingWebhookConfigurations()

I was thinking about some non acklowdeged signal or similar. But seems it doesn't exists from the code I can see

@tvoran
Copy link
Member

tvoran commented Jan 19, 2023

Right, the log message is coming from the part of the code that watches the mutating webhook config for changes, and then checks to make sure the caBundle is still set correctly. It does this to ensure that something besides vault-k8s doesn't clobber the webhook config.

Is the injector working when it's logging repeatedly like that in AKS?

You might be able to see what's changing by running kubectl describe mutatingwebhookconfiguration -n vault vault-injector-agent-injector-cfg in a loop or with watch.

@dguihal
Copy link
Author

dguihal commented Jan 19, 2023

Still investigating on my side. And I saw that "something" is adding a namespace selector on the mutatingwebhookconfiguration on AKS clusters (and not on EKS ones)

namespaceSelector:
matchExpressions:
- key: control-plane
operator: DoesNotExist

Could it be the source of those logs ?

@tvoran
Copy link
Member

tvoran commented Jan 19, 2023

Yep, that would cause those messages. It sounds like something in the Azure side is making those updates: Azure/AKS#1771 And they discuss it a bit here: https://learn.microsoft.com/en-us/azure/aks/faq#can-i-use-admission-controller-webhooks-on-aks

So the easiest way to reduce the log noise would be to add that selector to the chart deployment:

      webhook:
        failurePolicy: Fail
        objectSelector:
          matchLabels:
            vault-sidecar-injector: "true"
        namespaceSelector:
          matchExpressions:
            - key: control-plane
              operator: DoesNotExist

That should reduce the spurious updates from both sides.

@travisforte
Copy link

Tried @tvoran suggestion above which didn't stop the logs on my clusters AKS v1.25.6. @dguihal Were you ever able to resolve this?

@dguihal
Copy link
Author

dguihal commented May 17, 2023

@travisforte unfortunatly not ....

@pbromb
Copy link

pbromb commented May 26, 2023

I have the same problem when I have 2 pods of vault injector running to provide HA

@rcousens
Copy link

rcousens commented Nov 24, 2023

@pbromb My guess is this is because each pod in the deployment is updating the CA bundle and triggering the other pods. I am still working out if I can solve it in Helm

I solved it, had to add

injector:
  leaderElector:
    enabled: true

@travisforte
Copy link

@rcousens Thought that the leaderElector option is enabled by default? So adding that shouldn't change anything. I tried it anyway and found no difference, was there anything else that you changed?

@revanthalampally
Copy link

I am seeing the same issue on my AKS cluster

kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.10", GitCommit:"b8609d4dd75c5d6fba4a5eaa63a5507cb39a6e99", GitTreeState:"clean", BuildDate:"2023-10-23T16:10:33Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Im using injector image:

  # image sets the repo and tag of the vault-k8s image to use for the injector.
  image:
    repository: "hashicorp/vault-k8s"
    tag: "1.0.0"
    pullPolicy: IfNotPresent

Chart.yaml

apiVersion: v2
name: vault
version: 0.22.0
appVersion: 1.11.3
kubeVersion: ">= 1.16.0-0"
description: Official HashiCorp Vault Chart
home: https://www.vaultproject.io
icon: https://github.com/hashicorp/vault/raw/f22d202cde2018f9455dec755118a9b84586e082/Vault_PrimaryLogo_Black.png
keywords: ["vault", "security", "encryption", "secrets", "management", "automation", "infrastructure"]
sources:
  - https://github.com/hashicorp/vault
  - https://github.com/hashicorp/vault-helm
  - https://github.com/hashicorp/vault-k8s
  - https://github.com/hashicorp/vault-csi-provider

kubectl logs -n vault vault-agent-injector-86c8f7dccf-9lmw9 | head -n 15

Using internal leader elector logic for webhook certificate management
2023-12-20T06:35:02.707Z [INFO]  handler: Starting handler..
Listening on ":8080"...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.808Z [WARN]  handler.certwatcher: Could not load TLS keypair: tls: failed to find any PEM data in certificate input. Trying again...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Updated certificate bundle received. Updating certs...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.808Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.809Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.809Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.809Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.838Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2023-12-20T06:35:02.866Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...

I do see the logs

injector:
leaderElector:
    enabled: true

@fsadykov
Copy link

fsadykov commented Jan 26, 2024

What is the solution on this issue? We are facing same issue on GKE.

2024-01-26T01:25:07.160Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.342Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.352Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.542Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.553Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.740Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.751Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.942Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.953Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.141Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.151Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.342Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.352Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.544Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.550Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.742Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.754Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.946Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...

This is already enabled

injector:
  leaderElector:
    enabled: true

@browol
Copy link

browol commented Aug 6, 2024

What is the solution on this issue? We are facing same issue on GKE.

2024-01-26T01:25:07.160Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.342Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.352Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.542Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.553Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.740Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.751Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.942Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:07.953Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.141Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.151Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.342Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.352Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.544Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.550Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.742Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.754Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...
2024-01-26T01:25:08.946Z [INFO]  handler.certwatcher: Webhooks changed. Updating certs...

This is already enabled

injector:
  leaderElector:
    enabled: true

Don't forget to increase the replicas:

injector:
  # Any number greater than 1 is required to enable leader election.
  replicas: 2
  leaderElector:
    enabled: true

https://github.com/hashicorp/vault-helm/blob/v0.28.1/templates/injector-deployment.yaml#L96

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants