Some improvements for Production deployments #71

Open
fschoell opened this issue Sep 5, 2024 · 0 comments

Don't log 4xx responses as errors. Either log them at info level or not at all: from our perspective they are client errors, not failures on our side.
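
As a rough illustration of this point, the mapping from status code to log level could look like the sketch below. It uses Python's standard `logging` module only for illustration; the function names `level_for_status` and `log_response` and the logger name `api` are hypothetical and not taken from this project.

```python
import logging

logger = logging.getLogger("api")  # hypothetical logger name


def level_for_status(status: int) -> int:
    """Map an HTTP status code to a log level."""
    if status >= 500:
        return logging.ERROR  # server-side failure: something we need to fix
    if status >= 400:
        return logging.INFO   # client error: worth noting, not an error for us
    return logging.DEBUG      # successful or redirect responses


def log_response(method: str, path: str, status: int) -> None:
    """Log one finished request at the level that matches its status."""
    logger.log(level_for_status(status), "%s %s -> %d", method, path, status)
```

With this split, an alert on error-level log lines would only fire for 5xx responses.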

Don't expose internal errors to the client (but make sure they are still logged properly). They are not helpful to users and might leak internal information. If you want traceability, you could generate a random ID, return that to the user instead, and log it alongside the error so we can grep the logs for a specific failed request.
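
A minimal sketch of the random-ID idea, assuming a handler that turns an exception into a status code and a JSON-like body. The function name `internal_error_response`, the response shape, and the `error_id` field are assumptions made for illustration, not this project's API.

```python
import logging
import uuid

logger = logging.getLogger("api")  # hypothetical logger name


def internal_error_response(exc: Exception) -> tuple[int, dict]:
    """Log the real error with a random ID and return a generic body."""
    error_id = uuid.uuid4().hex  # random, carries no internal information
    # Full details (message and traceback) go to the logs only.
    logger.error("internal error id=%s: %r", error_id, exc, exc_info=exc)
    # The client sees a generic message plus the ID to quote in a report.
    return 500, {"error": "Internal server error", "error_id": error_id}
```

When a user reports a failure, grepping the logs for the `error_id` they received should lead straight to the logged exception.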

Count failed ClickHouse queries in a Prometheus metric so that we can easily add alerts for database issues (for now this is probably equivalent to counting all 500 errors, but the two might diverge in the future).
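
One way to picture the failed-query counter is a wrapper around query execution that increments a counter before re-raising. The sketch below uses a tiny stand-in `Counter` class so it runs without dependencies; a real service would presumably use the counter type from its Prometheus client library. The metric name `clickhouse_failed_queries_total` and the `run_query` wrapper are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Counter:
    """Dependency-free stand-in for a Prometheus counter (only goes up)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0

    def inc(self) -> None:
        self.value += 1


# Hypothetical metric name, chosen for illustration.
FAILED_QUERIES = Counter("clickhouse_failed_queries_total")


def run_query(execute: Callable[[], T]) -> T:
    """Run a database call and count it as failed if it raises."""
    try:
        return execute()
    except Exception:
        FAILED_QUERIES.inc()  # counts database failures, not HTTP 500s
        raise                 # the caller still decides how to respond
```

Because the counter is incremented at the query layer, it stays meaningful even if some 500 responses later come from causes other than the database.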

Use a Prometheus histogram to bucket query times instead of a counter. That way we can monitor query-time percentiles, which are more useful than a global average of query times.
