Avoid Thundering Herd on Rollout of new Release #177

Open
probably-not opened this issue Jul 14, 2024 · 1 comment

Comments

@probably-not

probably-not commented Jul 14, 2024

When a rollout/deployment occurs, the feature flags cached in ETS are effectively flushed, because each new node starts up with an empty cache. This leads to a Thundering Herd situation: every flag lookup on the new node goes to the persistence adapter until the cache fills once more.

This can be solved in a few different ways:

  • Pre-filling the cache: on startup, allow a configuration that will pre-fill the ETS cache in memory. This will ensure that the cache is already full after startup completes. This could cause bloated memory, if for example the feature flags table contains a lot of old flags that are no longer in use.
  • Single-flight mechanics: Using a single-flight mechanism to ensure that only one request per key to the persistence adapter is made at any given time. This is fairly easy to implement with a GenServer + handle_call + GenServer.reply, i.e. the persistence_adapter().get() call would be wrapped in a GenServer to ensure that only one call is running at any given time.
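
To make the second option concrete, here is a minimal sketch of a single-flight GenServer. The module name `SingleFlight` and its `run/3` function are made up for illustration and are not part of FunWithFlags. Error handling, timeouts and supervision are left out.

```elixir
defmodule SingleFlight do
  # Hypothetical helper, not part of FunWithFlags.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  # Runs `fun` for `key`. If a call for the same key is already in flight,
  # the caller waits for that call's result instead of running `fun` again.
  def run(server, key, fun) when is_function(fun, 0) do
    GenServer.call(server, {:run, key, fun})
  end

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:run, key, fun}, from, waiting) do
    case waiting do
      %{^key => callers} ->
        # A fetch for this key is already running: just queue the caller.
        {:noreply, Map.put(waiting, key, [from | callers])}

      _ ->
        # First caller for this key: run the fetch outside the server
        # process, so that lookups for other keys are not blocked.
        server = self()
        spawn_link(fn -> send(server, {:done, key, fun.()}) end)
        {:noreply, Map.put(waiting, key, [from])}
    end
  end

  @impl true
  def handle_info({:done, key, result}, waiting) do
    # Fan the single result out to every caller that was waiting on the key.
    {callers, waiting} = Map.pop(waiting, key, [])
    Enum.each(callers, &GenServer.reply(&1, result))
    {:noreply, waiting}
  end
end
```

The adapter call would then be wrapped as something like `SingleFlight.run(server, flag_name, fn -> persistence_adapter().get(flag_name) end)`. Because the fetch runs in a linked process, a crash inside `fun` also takes the server down, and callers are still subject to the default `GenServer.call/2` timeout. A real implementation would need to handle both.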
@tompave
Owner

tompave commented Aug 6, 2024

Hey, thank you for using the library and raising this point.

You're making a fair observation. And this is also related to this long-standing todo item in the readme:

  • Add some optional randomness to the TTL, so that Redis or the DB don't get hammered at constant intervals after a server restart.
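
As an illustration of what that todo item could mean in practice, here is a minimal sketch of a jittered TTL. The module `JitteredTTL`, its `ttl/2` function and the `spread` parameter are hypothetical; FWF does not currently expose anything like this.

```elixir
defmodule JitteredTTL do
  # Hypothetical helper, not part of FunWithFlags.
  #
  # Returns an integer TTL picked uniformly from the range
  # base_ttl - offset .. base_ttl + offset, where offset = trunc(base_ttl * spread).
  def ttl(base_ttl, spread \\ 0.1)
      when is_integer(base_ttl) and base_ttl > 0 and
             is_number(spread) and spread >= 0 and spread < 1 do
    max_offset = trunc(base_ttl * spread)
    # :rand.uniform(n) returns an integer in 1..n, so shift it down by one.
    base_ttl - max_offset + :rand.uniform(2 * max_offset + 1) - 1
  end
end
```

With a base TTL of 900 seconds (an example value) and a spread of 0.1, each cache entry would get a TTL between 810 and 990 seconds, so entries written at the same moment no longer expire together.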

Both of your suggestions have merits.

> Pre-filling the cache: on startup, allow a configuration that will pre-fill the ETS cache in memory. This will ensure that the cache is already full after startup completes. This could cause bloated memory, if for example the feature flags table contains a lot of old flags that are no longer in use.

This is a simple and effective solution, and as far as I know it's something that applications are already doing when encountering the problem you describe. That's because this doesn't need to be part of FWF itself, and it can be done in application code. The pattern can even be generalized and extracted as a 3rd party extension to FWF, and published on Hex.

It still doesn't entirely remove the problem though, as the TTL of all cached flags will be the same, and they'll all expire at roughly the same time. At least, that's the case assuming that enough different flags are queried frequently enough to cause a problem, for example during the lifecycle of web requests in a high-traffic application.
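
For reference, a pre-fill done in application code could look roughly like the sketch below. `FlagCacheWarmer` is a hypothetical module. It takes the "list names" and "look up one flag" steps as functions, so it does not depend on any particular FWF API.

```elixir
defmodule FlagCacheWarmer do
  # Hypothetical application-level helper, not part of FunWithFlags.
  #
  # `list_names` must return {:ok, names} or an error value.
  # `lookup` is called once per name and is expected to be a normal flag
  # read that populates the cache as a side effect.
  def warm(list_names, lookup)
      when is_function(list_names, 0) and is_function(lookup, 1) do
    with {:ok, names} <- list_names.() do
      Enum.each(names, lookup)
      {:ok, length(names)}
    end
  end
end
```

In an application using FWF this might be called once at startup as `FlagCacheWarmer.warm(&FunWithFlags.all_flag_names/0, &FunWithFlags.enabled?/1)`, on the assumption that a regular lookup on a cache miss writes the flag to ETS. Filtering the list of names before the lookups would be one way to limit the memory concern raised above.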

> Single-flight mechanics: Using a single-flight mechanism to ensure that only one request per key to the persistence adapter is made at any given time. This is fairly easy to implement with a GenServer + handle_call + GenServer.reply, i.e. the persistence_adapter().get() call would be wrapped in a GenServer to ensure that only one call is running at any given time.

This is interesting.

I don't think I've considered it before because it is possible to disable the ETS cache, and I wouldn't want to put such a bottleneck in front of the persistence adapter all the time. If we went with this solution, it would have to be conditional on the ETS cache being enabled.

But it still sounds like something that can be done with a custom persistence adapter. Since adapters have a consistent interface, it should be possible to implement a generic enough "persistence adapter proxy", or perhaps middleware, to do what you describe before forwarding calls to the underlying actual adapter.
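
A rough sketch of the proxy idea follows. Everything in it is hypothetical: `FakeAdapter` stands in for a real adapter, the `:my_app` config keys are invented, and only `get/1` and `put/2` are shown, whereas a real proxy would have to forward every function the adapter interface requires.

```elixir
defmodule FakeAdapter do
  # Hypothetical stand-in for a real persistence adapter.
  def get(flag_name), do: {:ok, {:flag, flag_name}}
  def put(flag_name, gate), do: {:ok, {flag_name, gate}}
end

defmodule AdapterProxy do
  # Hypothetical proxy: forwards calls to an underlying adapter and runs
  # reads through a configurable middleware function.

  # The wrapped adapter, read from an invented config key.
  defp underlying do
    Application.get_env(:my_app, :underlying_adapter, FakeAdapter)
  end

  # The middleware receives the flag name and a zero-arity function that
  # performs the real read. The default simply calls through.
  defp middleware do
    Application.get_env(:my_app, :get_middleware, fn _flag_name, fun -> fun.() end)
  end

  def get(flag_name) do
    middleware().(flag_name, fn -> underlying().get(flag_name) end)
  end

  # Writes are forwarded unchanged.
  def put(flag_name, gate), do: underlying().put(flag_name, gate)
end
```

The middleware is the place where a single-flight mechanism, metrics or logging could be plugged in without touching the underlying adapter.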

As I'm always a fan of keeping FWF simple, I'd be inclined to see this too as an extension of the library on Hex, rather than something that is part of FWF itself.
