Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circuit breaker #266

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Circuit breaker #266

wants to merge 2 commits into from

Conversation

Kamil-Lontkowski
Copy link
Contributor

This draft implements CricuitBreaker with features based on that are provided in breaker from resilience4j.

Those features, for count based window (last n operations):

  • defining threshold for failures
  • defining threshold for slow operations, and threshold for what is considered slow
  • minimum number of operations until thresholds are calculated
  • HalfOpen state that allows number of operations to pass, after that it is calculated if those operations where below threshold so we can close breaker or go back to open.
  • time after which breaker goes from Open to HalfOpen

Things that are not implemented in here but are in resilience4j.

  • Timeout for HalfOpen state, after which breaker goes back to Open
  • Ability to turn of automatic transition from Open to HalfOpen state

@Kamil-Lontkowski
Copy link
Contributor Author

For now this is incomplete since I have some questions.

  1. Right now I am not keeping results of calls made in HalfOpen state separate. Maybe it should be, because different number of operations can push rate more or less depending on the difference.
  2. Should I add those 2 missing features from resilience4j? Flag seems necessary, since we should not worry about background thread. But HalfOpen timeout might be useful, but it will complicate a bit process of registering operation started in HalfOpen state, so we don't end up with wrong metrics.
  3. Right now there is only runOrDrop operation defined. Since resilience4j always throw exception I am not sure what other interfaces would make sense.
  4. Would it make sense to have ability to completely wipe out current state of circuit breaker?

Right now only now only count based sliding window is implemented but I wanted to discuss those questions right away.

With time based sliding windows I think that background loop evicting older entries from queue similar to SlidingWindow RateLimiterAlgorithm should be sufficient.

if numOfOperations >= minimumNumberOfCalls && (failuresRate >= failureRateThreshold || slowRate >= slowCallThreshold) then
// Start schedule to switch to HalfOpen after waitDurationOpenState passed
forkDiscard(
scheduled(ScheduledConfig[Throwable, Unit](Schedule.InitialDelay(waitDurationOpenState)))(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effects in updateAndGet should be avoided, but I think scheduling state change twice (in case of reapplying function) should not break things, especially since we should have braker open for some time. But maybe it can create race condition where we can complete enough calls in HalfOpen state to change it to Closed only for second scheduled to complete and change it back to HalfOpen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this looks fragile. I think we should change the design a little. An actor here seems like a natural choice - the actor state - that is, the state machine - would then always be accessed from a single thread. We could send updates to the actor with the results of operation invocations, and basing on the, the internal state would be updated.

One place where we'd have to deviate from the actor pattern is checking if the circuit breaker is open - always going through the actor would be a bottleneck (as all operations would have to synchronize on a single thread). So the actor would also have to update some shared mutable state, which every thread could quickly check (not necessarily immediately consistent, it's fine if we let through one or two additional operations, when the CB is closing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then each circuit breaker state could be modelled as an immutable value, so we'd only have one top-level mutable overall state, managed by the actor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now only push and snapshot methods are used. Unless we want to provide this implementation for the user maybe they should be deleted.

end apply

private[resilience] case class CircuitBreakerCountStateMachine(
windowSize: Int,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should definitely be captured in a "config" case class. Can we reuse the existing one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is captured in config, in enum SlidingWindow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but here we are passing all these parameters explicitly for some reason? the parameter list seems way too long :)

@adamw
Copy link
Member

adamw commented Jan 10, 2025

As for the questions:

  1. hm I don't know - how do circuit breakers work normally? maybe you could find some articles describing typical designs, and link to them here or in the comments; I think we should aim for whatever is "industry standard"
  2. again, that's a question of how circuit breakers work in general. Intuitively, the half open state should transition to open/closed after certain conditions are met, but I'm sure there's plenty of edge-cases to consider
  3. yeah, runOrDrop is fine
  4. I think any use-case for wiping can be served by simply creating a new CB, so let's omit wiping for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants