fix: make leader election possible with multiple replicas #195

Closed

@Ajpantuso Ajpantuso commented Oct 16, 2022

Summary

Fixes #171

Makes Leader Election possible with the following changes:

  • When leader election is enabled, the metrics endpoint is served directly on the leader and indirectly (through a reverse proxy) on followers, so clients only ever receive metrics data from the leader.
  • The Prometheus registry is initialized on every instance, leader or not, so that the GenericReconciler can feed metrics even on non-leaders. Non-leaders therefore act as warm standbys whose metrics are already populated when they become leader.
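
To illustrate the first point, here is a minimal sketch of the serving logic, not the code in this PR. It assumes a follower can look up the leader's metrics URL somehow; `metricsHandler`, `leaderURL`, `staticMetrics`, `scrape`, and `demo` are hypothetical names, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// metricsHandler (hypothetical) serves metrics locally when this replica is
// the leader and forwards the request to the leader otherwise.
type metricsHandler struct {
	isLeader  atomic.Bool               // flipped when leadership changes
	local     http.Handler              // handler backed by this replica's registry
	leaderURL func() (*url.URL, error)  // assumed lookup of the current leader
}

func (h *metricsHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.isLeader.Load() {
		h.local.ServeHTTP(w, r)
		return
	}
	target, err := h.leaderURL()
	if err != nil {
		http.Error(w, "leader unknown", http.StatusServiceUnavailable)
		return
	}
	// Followers never serve their own data; they proxy to the leader.
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

// staticMetrics stands in for a real Prometheus handler.
func staticMetrics(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, body)
	})
}

// scrape issues GET /metrics against a handler and returns the body.
func scrape(h http.Handler) string {
	srv := httptest.NewServer(h)
	defer srv.Close()
	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

// demo wires one leader and one follower and scrapes both.
func demo() (fromLeader, fromFollower string) {
	leader := &metricsHandler{local: staticMetrics("served_by_leader 1\n")}
	leader.isLeader.Store(true)
	leaderSrv := httptest.NewServer(leader)
	defer leaderSrv.Close()

	follower := &metricsHandler{
		local:     staticMetrics("served_by_follower 1\n"),
		leaderURL: func() (*url.URL, error) { return url.Parse(leaderSrv.URL) },
	}
	return scrape(leader), scrape(follower)
}

func main() {
	l, f := demo()
	fmt.Print(l, f)
}
```

In this sketch a scrape of either replica returns the leader's payload. The follower's own handler stays populated but unused until `isLeader` flips, which is the warm-standby behaviour described in the second point.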

Result

  • All manager replicas actively inspect resources and populate metrics
  • Metrics are only ever served from the leader

Additional Information

  • Options are added to control leader election and the metrics bind address (the latter is useful when running multiple instances out-of-cluster).
  • Prometheus registry initialization is fully decoupled from the metrics server.
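
As a rough sketch of what such options could look like on the command line: the flag names `--enable-leader-election` and `--metrics-bind-addr`, the default `:8383`, and the `options` and `parseOptions` names are assumptions for illustration only, not necessarily what this PR adds.

```go
package main

import (
	"flag"
	"fmt"
)

// options (hypothetical) holds the two settings described above.
type options struct {
	EnableLeaderElection bool
	MetricsBindAddr      string
}

// parseOptions parses manager flags from args; flag names and defaults
// are illustrative assumptions.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.BoolVar(&o.EnableLeaderElection, "enable-leader-election", false,
		"run leader election among replicas")
	fs.StringVar(&o.MetricsBindAddr, "metrics-bind-addr", ":8383",
		"address the metrics endpoint binds to")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// Two local instances need distinct bind addresses to avoid a port clash.
	a, _ := parseOptions([]string{"--enable-leader-election", "--metrics-bind-addr", "127.0.0.1:8383"})
	b, _ := parseOptions([]string{"--enable-leader-election", "--metrics-bind-addr", "127.0.0.1:8384"})
	fmt.Printf("%+v\n%+v\n", a, b)
}
```

Giving each out-of-cluster instance its own bind address lets several replicas run on one machine, which is the case the first point calls out.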

@Ajpantuso
Contributor Author

/hold

@openshift-ci

openshift-ci bot commented Oct 16, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ajpantuso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov-commenter

codecov-commenter commented Oct 16, 2022

Codecov Report

Merging #195 (f4c7d4e) into master (edb1b65) will decrease coverage by 1.47%.
The diff coverage is 14.84%.

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
- Coverage   29.77%   28.30%   -1.48%     
==========================================
  Files          23       28       +5     
  Lines         937     1106     +169     
==========================================
+ Hits          279      313      +34     
- Misses        626      759     +133     
- Partials       32       34       +2     

Impacted Files                          Coverage Δ
cmd/manager/main.go                     0.00% <0.00%> (ø)
cmd/manager/options.go                  0.00% <0.00%> (ø)
internal/handler/options.go             0.00% <0.00%> (ø)
internal/runnable/server.go             0.00% <0.00%> (ø)
pkg/controller/generic_reconciler.go    0.00% <0.00%> (ø)
pkg/prometheus/options.go               0.00% <0.00%> (ø)
pkg/prometheus/registry.go              0.00% <0.00%> (ø)
pkg/prometheus/server.go                0.00% <0.00%> (ø)
internal/handler/handler.go             89.47% <89.47%> (ø)

@openshift-merge-robot
Collaborator

@Ajpantuso: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Ajpantuso Ajpantuso closed this Dec 19, 2022
Development

Successfully merging this pull request may close these issues.

Add controller-runtime based leader election to this operator