
[BUG] New pods shortly deleted, and the old pods remain. ArgoCD #627

Open
qdrddr opened this issue Mar 12, 2024 · 13 comments
Labels
kind/bug Something isn't working

Comments

@qdrddr

qdrddr commented Mar 12, 2024

Describe the bug
Tested with strategy: default and env-vars. Using ArgoCD. The ArgoCD app ory/oathkeeper includes the ConfigMap accessrules to monitor. The Deployment has the annotation: configmap.reloader.stakater.com/reload: accessrules

When I modify the ConfigMap in git, Reloader notices the change and creates new pods, but they are deleted shortly after creation and the old pods remain intact.

To Reproduce
Steps to reproduce the behavior

Expected behavior
The old pods should be deleted and the new pods should remain.

Logs from the Reloader

```
time="2024-03-12T20:49:44Z" level=error msg="Update for 'oathkeeper' of type 'Deployment' in namespace 'ory' failed with error Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:44Z" level=error msg="Rolling upgrade for 'accessrules' failed with error = Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:44Z" level=error msg="Error syncing events: Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:51Z" level=info msg="Changes detected in 'accessrules' of type 'CONFIGMAP' in namespace 'ory', Updated 'oathkeeper' of type 'Deployment' in namespace 'ory'"
```

Environment

  • Operator Version:
  • Kubernetes/OpenShift Version: RKE2 k8s v1.27.10+rke2r1
  • ArgoCD Version: v2.10.2
  • Reloader: Deployed via ArgoCD using Helmchart 1.0.69

Additional context
The Helm values file:

```yaml
reloader:
  enableHA: true

deployment:
  # If you wish to run multiple replicas set reloader.enableHA = true
  replicas: 2
  # Set to default, env-vars or annotations
  reloadStrategy: env-vars
```
@qdrddr qdrddr added the kind/bug Something isn't working label Mar 12, 2024
@qdrddr qdrddr changed the title [BUG] [BUG] New pods shortly delegated, and the old pod remains. ArgoCD Mar 12, 2024
@MuneebAijaz
Contributor

@qdrddr The way Reloader works is by updating an env var, which triggers a Deployment change, and that update is propagated to the pods. Could it be that ArgoCD is actively reverting the changes Reloader makes to the Deployment? In that case the new pods would be deleted and the old state would persist.

@qdrddr
Author

qdrddr commented Mar 25, 2024

@MuneebAijaz Could you recommend a workaround so I can continue using ArgoCD?

@MuneebAijaz
Contributor

@qdrddr I think this needs more investigation into whether the cause really is ArgoCD. If it is, the question is whether ArgoCD should be watching the env field under Deployments at all; if not, ArgoCD provides ways to ignore specific fields in specific resources.
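For reference, a minimal sketch of what that could look like using Argo CD's `ignoreDifferences` feature on the Application. This assumes the env-vars reload strategy, which (as I understand it) injects variables named `STAKATER_<CONFIGMAP>_CONFIGMAP` into the containers; the `RespectIgnoreDifferences` sync option is also needed so auto-sync does not revert the ignored fields:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: oathkeeper
spec:
  # project/source/destination omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      # Ignore only the env vars Reloader injects (name is an assumption
      # based on the env-vars strategy's STAKATER_ prefix)
      jqPathExpressions:
        - .spec.template.spec.containers[].env[]? | select(.name | startswith("STAKATER_"))
  syncPolicy:
    syncOptions:
      # Make automated sync honor ignoreDifferences instead of reverting
      - RespectIgnoreDifferences=true
```

Note this only stops Argo CD from diffing on those env vars; whether it fully resolves the revert depends on what Argo CD is actually reconciling in this case.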

@qdrddr
Author

qdrddr commented May 1, 2024

Could you point me to the documentation for this?

Also, I have doubts about the proposed workaround of telling ArgoCD to skip checking parts of a resource: even if ArgoCD ignores the env vars, with auto-prune enabled it will still see and kill the extra containers regardless of changes in env.

So basically, I cannot use Reloader with ArgoCD when auto-pruning is enabled. @MuneebAijaz

@qdrddr
Author

qdrddr commented May 1, 2024

The problem is that Reloader creates additional containers before killing outdated containers to reduce impact, which is an excellent strategy. But ArgoCD, with pruning enabled, notices an extra container and kills it before Reloader gets a chance to kill the outdated one. As a result, the outdated containers remain unchanged.

Do you know if ArgoCD integration is needed here?

Ideas on how this can be fixed:

  1. Ideally, instead of Reloader creating a new container and then killing the old one, Reloader should tell ArgoCD to increase replicas by one; Reloader can then kill the old containers one by one, and ArgoCD with pruning enabled would create the extra container and re-create those deleted by Reloader.
  2. Alternatively, temporarily turn off auto-pruning in ArgoCD for the given app and re-enable it once Reloader is done.
  3. A less ideal approach that might also work: without creating an extra container, check that there is more than one container in the ReplicaSet, kill them one by one, and wait until they are re-created by ArgoCD and Kubernetes.

@qdrddr qdrddr changed the title [BUG] New pods shortly delegated, and the old pod remains. ArgoCD [BUG] New pods shortly deleted, and the old pods remains. ArgoCD May 1, 2024
@MuneebAijaz
Contributor

I have doubts about the proposed workaround to set ArgoCD

Yes, there are implications to that approach, but not the ones stated above.

The problem is that Reloader creates additional containers before killing outdated containers

Reloader doesn't do that. Reloader only updates the env field in the parent resource (Deployment, StatefulSet, DaemonSet). When an env var is updated, the Deployment is bound to propagate that change to its pods: the parent resource spins up another ReplicaSet with the new env var, and that ReplicaSet creates new pods with the updated env var. That is how Reloader performs the update.

Reloader itself doesn't manage pod/container lifecycle; to avoid affecting the user's application, it relies on the deployment strategy already configured on the Deployment to propagate the change.
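To illustrate, a sketch of the kind of change the env-vars strategy makes (the variable name and hash value here are assumptions for illustration, based on the `STAKATER_` prefix that strategy uses):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oathkeeper
  namespace: ory
  annotations:
    configmap.reloader.stakater.com/reload: "accessrules"
spec:
  template:
    spec:
      containers:
        - name: oathkeeper
          env:
            # Injected/updated by Reloader; the value tracks the ConfigMap's
            # content, so every ConfigMap change edits the pod template and
            # triggers a standard rolling update via a new ReplicaSet.
            - name: STAKATER_ACCESSRULES_CONFIGMAP
              value: "6a0f3e..."  # illustrative hash, not a real value
```

Because the change lands in the pod template, the rollout itself is handled entirely by the Deployment's own update strategy, not by Reloader.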

@MuneebAijaz
Contributor

I will try to replicate the issue on my end, and get back to you.

@0123hoang

@qdrddr Did you follow #333?
I changed from reloadStrategy: annotations to env-vars and the problem was gone.

@qdrddr
Author

qdrddr commented May 22, 2024

@0123hoang Nope, the problem persists with reloadStrategy: env-vars

(screenshot attached: CleanShot 2024-05-21 at 20 38 15)

@shameemshah

We are also facing the same issue.

@Gatschknoedel

I would debug this by disabling self-heal for the responsible Argo app, letting Reloader do its thing, and afterwards checking the Argo app. My guess is that the application is out of sync and Argo is immediately reverting because of that.
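A minimal sketch of what that test could look like on the Application's sync policy (field names per Argo CD's automated sync policy; whether prune also needs disabling is an assumption worth testing separately):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: oathkeeper
spec:
  syncPolicy:
    automated:
      selfHeal: false  # don't revert live changes, e.g. Reloader's env patch
      prune: false     # don't delete resources Argo CD considers extraneous
```

If the rollout completes with this in place, that points strongly at self-heal/prune reverting Reloader's Deployment patch.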

@BlackRoach

After Reloader propagated the changes, and while the new pods were starting, I clicked the ArgoCD Sync button. As a result, the new pods were immediately deleted and replaced with the old pods.

I think ArgoCD auto-sync reverts the changes from Reloader.

@MuneebAijaz
Contributor

Have you tried setting the reload strategy to annotations? Related issue: #701

6 participants