-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding health check and exception handling #1
base: djm
Are you sure you want to change the base?
Conversation
…tion and marks the service as unhealthy.
self.poll() | ||
health_check.Healthy = True | ||
sleep(self.options.poll_period) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this make it worse? It won't actually exit. I had a look at this code before, there's no exception catching, so I don't understand -- it should be exiting if there are issues (and then being restarted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll configure the liveness and readiness probes to restart the container if health check fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it would already do that if you just let the exception bubble up; it seems like the problem is that it isn't quitting when there is an issue?
healthcheck/health_check.py
Outdated
|
||
|
||
def start_health_endpoint(): | ||
app.run(host='0.0.0.0', port=5000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be better if we can have this port number configurable through OptionParser
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup sure. I'll update it.
def run(): | ||
t = Thread(target=start_health_endpoint, daemon=True) | ||
def run(options): | ||
t = Thread(target=start_health_endpoint(options.port), daemon=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really sure if this is correct. According to Python's doc:
target is the callable object to be invoked by the run() method
but here you've passed None
which is returned from the execution of start_health_endpoint
.
One thing to do is to return a lambda that will be started by Thread, I think:
def start_health_endpoint(port):
return lambda port: app.run(host='0.0.0.0', port=port)
def run(options):
t = Thread(target=start_health_endpoint(options.port), daemon=True)
Overview
Added a new health check endpoint. The
sqs.py
during eachpoll()
catches the exception if any and marks the service as unhealthy. We'll still have to update the other services deployments to configure thelivenessProbe
andreadinessProbe
.Unfortunately, there are no tests at all. So we'll have to test the changes out in staging.