[server] Fixed storage node read quota usage ratio spikes #1256

Open
wants to merge 1 commit into
base: main
from

Commits on Oct 28, 2024

  1. [server] Fixed storage node read quota usage ratio spikes

    1. We observed unusually high quota usage ratio spikes (7x) without any rejected requests. This is mathematically
    impossible if the spikes were caused by an actual KPS spike, since our default sampling window is 30s and the token
    bucket capacity multiplier is only 5x. The spikes are possible because we can currently have a mismatch between the
    KPS and the node responsibility of different versions (current, backup and future). That is, depending on timing, we
    could be calculating the ratio by taking the combined KPS of the current and backup versions and dividing it by the
    node responsibility of a future version whose replicas are only partially assigned, resulting in a huge spike in the
    usage ratio.
    
    2. To solve this, we will monitor KPS and QPS per version. Version swap is not atomic when we have many routers
    and fast clients updating their current version metadata separately. Therefore, it is expected that we receive
    traffic for both the current and backup versions for short periods of time. We will track the requested quota stats
    for each version separately and use the current version's stats to calculate the usage ratio. We can also alert on
    the backup version's requested stats, since they are expected to drop to 0 if all the fast clients and routers are
    updating their metadata correctly.
    
    3. Quota rejection will continue to be enforced at the version level but will emit stats at the store level. We
    could make the rejection stats versioned as well, but we currently don't see a need to do so.
    xunyin8 committed Oct 28, 2024
    10e2d1d
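
The mismatch described in item 1 and the per-version tracking proposed in item 2 can be illustrated with a minimal Java sketch. The class name, method names, and all numbers below are hypothetical and chosen only for illustration; this is not the code in this commit. It assumes that a node's quota for a version is the store-level quota multiplied by that version's node responsibility fraction.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: names are illustrative, not the project's actual classes. */
final class VersionedQuotaUsageSketch {
  private final double storeQuotaKps; // store-level read quota, in keys per second
  private final double windowSeconds; // sampling window length, in seconds
  // Keys requested during the current window, tracked per store version.
  private final Map<Integer, Long> requestedKeys = new HashMap<>();
  // Fraction of the store quota this node is responsible for, per store version.
  private final Map<Integer, Double> nodeResponsibility = new HashMap<>();

  VersionedQuotaUsageSketch(double storeQuotaKps, double windowSeconds) {
    this.storeQuotaKps = storeQuotaKps;
    this.windowSeconds = windowSeconds;
  }

  void setNodeResponsibility(int version, double fraction) {
    nodeResponsibility.put(version, fraction);
  }

  void recordRead(int version, long keys) {
    requestedKeys.merge(version, keys, Long::sum);
  }

  double kps(int version) {
    return requestedKeys.getOrDefault(version, 0L) / windowSeconds;
  }

  private double nodeQuotaKps(int version) {
    return storeQuotaKps * nodeResponsibility.getOrDefault(version, 0.0);
  }

  /** Problematic form: KPS summed over all versions, divided by one version's quota. */
  double mismatchedUsageRatio(int denominatorVersion) {
    double totalKps = 0.0;
    for (int version : requestedKeys.keySet()) {
      totalKps += kps(version);
    }
    double quota = nodeQuotaKps(denominatorVersion);
    return quota > 0 ? totalKps / quota : 0.0; // report 0 when the node owns nothing
  }

  /** Version-matched form: numerator and denominator come from the same version. */
  double versionedUsageRatio(int version) {
    double quota = nodeQuotaKps(version);
    return quota > 0 ? kps(version) / quota : 0.0;
  }

  public static void main(String[] args) {
    VersionedQuotaUsageSketch s = new VersionedQuotaUsageSketch(1000.0, 30.0);
    s.setNodeResponsibility(0, 0.5);    // backup version, fully assigned
    s.setNodeResponsibility(1, 0.5);    // current version, fully assigned
    s.setNodeResponsibility(2, 0.0625); // future version, replicas partially assigned
    s.recordRead(1, 12_000);            // 400 KPS on the current version
    s.recordRead(0, 1_125);             // 37.5 KPS still arriving for the backup version
    System.out.printf("mismatched: %.2f%n", s.mismatchedUsageRatio(2));
    System.out.printf("current:    %.2f%n", s.versionedUsageRatio(1));
    System.out.printf("backup:     %.3f%n", s.versionedUsageRatio(0));
  }
}
```

With these made-up inputs, the mismatched calculation reports a ratio of 7.0 even though the current version is only at 0.8 of its quota, so no request would be rejected. The backup version's separate ratio (0.075 here) is the kind of value that could be alerted on if it does not fall to 0 after the version swap.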