
Deployment bug #52

Closed · hmike96 opened this issue Mar 26, 2020 · 2 comments

hmike96 commented Mar 26, 2020

Hi, I am trying to run pod-reaper as a deployment, but I keep getting this panic at runtime:

```
{"error":"no rules were loaded","level":"panic","msg":"error loading options","time":"2020-03-26T04:11:46Z"}
panic: (*logrus.Entry) (0x142fba0,0xc42034f810)

goroutine 1 [running]:
github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus.Entry.log(0xc42004e060, 0xc420211620, 0x0, 0x0, 0x0, 0x0, 0x0
	/go/src/github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus/entry.go:239 +0x350
github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus.(*Entry).Log(0xc42034f7a0, 0xc400000000, 0xc4205f9d30, 0x1, 0
	/go/src/github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus/entry.go:268 +0xc8
github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus.(*Entry).Panic(0xc42034f7a0, 0xc4205f9d30, 0x1, 0x1)
	/go/src/github.com/target/pod-reaper/vendor/github.com/sirupsen/logrus/entry.go:306 +0x55
main.newReaper(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/target/pod-reaper/reaper/reaper.go:37 +0x2de
main.main()
	/go/src/github.com/target/pod-reaper/reaper/main.go:22 +0x50
```

Here is my manifest that includes the resources I am deploying.
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: reaper
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reaper-service-account
  namespace: reaper
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: pod-reaper-cluster-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: pod-reaper-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-reaper-cluster-role
subjects:
  - kind: ServiceAccount
    name: pod-reaper-service-account
    namespace: reaper
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-reaper
  namespace: reaper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-reaper
  template:
    metadata:
      labels:
        app: pod-reaper
        pod-reaper: disabled
    spec:
      serviceAccount: pod-reaper-service-account
      containers:
        - name: airflow-scheduler-terminator
          image: target/pod-reaper
          resources:
            limits:
              cpu: 30m
              memory: 30Mi
            requests:
              cpu: 20m
              memory: 20Mi
          env:
            - name: NAMESPACE
              value: dataloader-airflow-blue
            - name: SCHEDULE
              value: "@every 15m"
            - name: REQUIRE_LABEL_KEY
              value: component
            - name: REQUIRE_LABEL_VALUES
              value: scheduler
```
Thanks in advance for a great tool.

brianberzins (Collaborator) commented

This is actually a case that wouldn't be caught by the rule refactor PR I have open, which is pretty interesting.

So what's actually going on here?
When pod-reaper starts up, it tries to figure out which "rules" it should run. For now (at least until #45 is reviewed and merged), it does this by looking for any environment variables that configure a specific condition telling it which pods to kill. In this case, it didn't find any.

I'm not sure that panicking/erroring out is the right behavior here, but it tries to start up, finds no rules telling it what to kill, and errors out. Effectively it's saying: "I have no pod-kill conditions, therefore I'll never kill anything, therefore I'll just crash out."
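
As a rough sketch of that startup pattern (this is not the actual pod-reaper source; the rule names and env vars here are assumptions based on this thread and the README):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// rule is a simplified stand-in for pod-reaper's rule concept.
type rule struct {
	name   string
	envVar string
}

// loadRules mirrors the startup behavior described above: each candidate
// rule "loads" only if its configuring environment variable is present.
// With no rule variables set, we hit the "no rules were loaded" error
// seen in the panic.
func loadRules() ([]rule, error) {
	candidates := []rule{
		{name: "chaos", envVar: "CHAOS_CHANCE"},
		{name: "podStatus", envVar: "POD_STATUSES"},
		{name: "duration", envVar: "MAX_DURATION"}, // assumed rule name from the README
	}
	var loaded []rule
	for _, r := range candidates {
		if _, ok := os.LookupEnv(r.envVar); ok {
			loaded = append(loaded, r)
		}
	}
	if len(loaded) == 0 {
		return nil, errors.New("no rules were loaded")
	}
	return loaded, nil
}

func main() {
	rules, err := loadRules()
	if err != nil {
		// pod-reaper logs this at panic level, which is why the container crashes
		fmt.Println("error loading options:", err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d rule(s)\n", len(rules))
}
```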

A way to test that the deployment is doing something without actually killing anything is to configure a rule with a condition that will never be met:

```yaml
env:
  - name: POD_STATUSES
    value: totally-not-a-real-status
```

Or:

```yaml
env:
  - name: CHAOS_CHANCE
    value: "0.0"
```

Hopefully that makes sense!

After you know it's up and working, what conditions would you like to see as criteria for whether or not to kill a pod? Maybe I can throw something together and get it working (and/or use the use case to try out #50)!

hmike96 (Author) commented May 11, 2020

Hey, thanks for getting back to me so quickly. I didn't realize I just had options configured and no rules. Added a rule and it works great now, thanks!
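
For reference, a minimal sketch of such a fix, assuming the MAX_DURATION rule from the README (the issue doesn't say which rule was actually added, and the value below is illustrative):

```yaml
env:
  # options alone (NAMESPACE, SCHEDULE, REQUIRE_LABEL_*) load no rules
  - name: NAMESPACE
    value: dataloader-airflow-blue
  - name: SCHEDULE
    value: "@every 15m"
  - name: REQUIRE_LABEL_KEY
    value: component
  - name: REQUIRE_LABEL_VALUES
    value: scheduler
  # at least one rule variable is required; MAX_DURATION reaps pods
  # running longer than the given duration (assumed from the README)
  - name: MAX_DURATION
    value: 24h
```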

hmike96 closed this as completed May 11, 2020