Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement: Introduce new URIPool and URISelector interfaces. #341

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

dtrejod
Copy link
Contributor

@dtrejod dtrejod commented Jul 28, 2022

Before this PR

The components of the request retrier in CGR were tightly coupled and hard to reason about. Due to how requests are retried currently, we are unable to preserve server state across requests very easily.

After this PR

Refactors large bits of the internal httpclient.Client retry/uri selection logic.
Instead the current retry logic relies on a RequestRetrier middleware and
a separate URI selection mechanism. These existing interfaces limit
our ability to implement smarter client side load balancing algorithms.

I propose introducing a two new interfaces -- URIPool and URISelector.
The URIPool is an abstraction that tracks server side errors across all
requests. It also is responsible for updating the list of available uris
from the config refreshable. Additionally the URISelector is an abstraction that allow for
implementing many types of client side load balancing algorithms. For
example, least used connection, rendezvous routing, zone aware routing,
strict client side sharing, ect. These two new types will persist state across requests
by acting as middleware themselves so we can actually implement better request concurrent models
akin to what https://github.com/palantir/dialogue does today.

Lastly the RequestRetrier still determines when requests in general
should be retried and the associated backoff logic.

In short, the idea overall becomes:

  • RequestRetrier: Handles client side errors
  • URIPool: Handles server side errors
  • URISelector: Client side load balancing algorithm

Work here is still a WIP.

==COMMIT_MSG==
Introduce new URIPool and URISelector interfaces.
==COMMIT_MSG==

Benchmarks

name                                                          old time/op    new time/op    delta
UnavailableURIs/OneAvailableServer/count=10-8                    535µs ± 0%     419µs ± 1%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=100-8                  5.38ms ± 1%    4.33ms ± 3%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=1000-8                 53.8ms ± 1%    45.8ms ± 5%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=10-8                  550µs ± 1%     426µs ± 1%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=100-8                5.46ms ± 0%    4.43ms ± 7%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=1000-8               56.7ms ± 8%    42.6ms ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=10-8        687µs ± 0%     429µs ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=100-8      7.06ms ± 6%    4.30ms ± 1%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=1000-8     68.4ms ± 0%    42.6ms ± 1%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=10-8       738µs ± 2%     450µs ± 8%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=100-8     7.38ms ± 1%    4.30ms ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=1000-8    73.5ms ± 1%    43.0ms ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=10-8         817µs ± 1%     433µs ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=100-8       8.17ms ± 0%    4.34ms ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=1000-8      81.7ms ± 0%    43.1ms ± 0%   ~     (p=0.100 n=3+3)

name                                                          old alloc/op   new alloc/op   delta
UnavailableURIs/OneAvailableServer/count=10-8                    155kB ± 0%      95kB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=100-8                  1.55MB ± 0%    0.95MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=1000-8                 15.5MB ± 0%     9.5MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=10-8                  155kB ± 0%      95kB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=100-8                1.55MB ± 0%    0.96MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=1000-8               15.5MB ± 0%     9.5MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=10-8        197kB ± 0%      95kB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=100-8      1.97MB ± 0%    0.96MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=1000-8     19.7MB ± 0%     9.5MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=10-8       211kB ± 0%      95kB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=100-8     2.11MB ± 0%    0.95MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=1000-8    21.1MB ± 0%     9.5MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=10-8         237kB ± 1%      95kB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=100-8       2.38MB ± 0%    0.95MB ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=1000-8      23.8MB ± 1%     9.5MB ± 0%   ~     (p=0.100 n=3+3)

name                                                          old allocs/op  new allocs/op  delta
UnavailableURIs/OneAvailableServer/count=10-8                    1.43k ± 0%     1.38k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=100-8                   14.3k ± 0%     13.8k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneAvailableServer/count=1000-8                   143k ± 0%      138k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=10-8                  1.43k ± 0%     1.38k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=100-8                 14.3k ± 0%     13.8k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/FourAvailableServers/count=1000-8                 143k ± 0%      138k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=10-8        1.95k ± 0%     1.38k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=100-8       19.5k ± 0%     13.8k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfFourUnavailableServers/count=1000-8       195k ± 0%      138k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=10-8       2.12k ± 0%     1.38k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=100-8      21.3k ± 0%     13.8k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfThreeUnavailableServers/count=1000-8      213k ± 1%      138k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=10-8         2.45k ± 1%     1.38k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=100-8        24.7k ± 0%     13.8k ± 0%   ~     (p=0.100 n=3+3)
UnavailableURIs/OneOutOfTwoUnavailableServers/count=1000-8        246k ± 1%      138k ± 0%   ~     (p=0.100 n=3+3)

Possible downsides?


This change is Reviewable

dtrejod and others added 4 commits July 22, 2022 18:30
Refactors large bits of the internal httpclient.Client retry logic.
Currently we don't store any state across the lifecycle of multiple
requests. Instead the current retry logic relies on a RequestRetrier and
a URI selection mechanism. However the interfaces between the two logic
machines doesn't provide clear boundaries and also makes it difficult to
use recovered response state across different requests.

I propose introducing a two new interfaces -- URIPool and URISelector.
The URIPool is an abstraction that tracks server side errors across all
requests. It also is responsible for updating the list of available uris
from the configuration refreshable.

Additionally the URISelector is an abstraction that allow for
implementing many types of scored request weighting algorithms. For
example, least used connection, rendezvous routing, zone aware routing,
ect.

Lastly the RequestRetrier still determines at where requests in general
should be retried and the associated backoff logic.

Work here is still a WIP.
…or of client having round-robin uri selector.
@dtrejod dtrejod force-pushed the dt/uri-retry-pool-selector branch from b8e031c to a060bc2 Compare August 11, 2022 16:41
@@ -65,7 +65,7 @@ func TestErrorDecoderMiddlewares(t *testing.T) {
assert.True(t, ok)
assert.Equal(t, 307, code)
location, ok := httpclient.LocationFromError(err)
assert.True(t, ok)
assert.False(t, ok)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are now using http.Response.Location() in the error decoder middleware, this test return false since there is no valid location header returned in this test.

Comment on lines +84 to +87
location, err := resp.Location()
if err == nil {
unsafeParams["location"] = location.String()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses the http.Response.Location() method here allows us to forgo passing in the useBaseURIOnly in the doOnce() method since Location() builds the proper full URI based on the http.Response.Request and returned Location header.

…ing redirects. Uncomment relevant tests in stateful uri pool for failed uris.
…ellation error. Cleanup err handling in balanced selector. Fallback to response status code if cgr error isn't defined.
@dtrejod dtrejod force-pushed the dt/uri-retry-pool-selector branch from 4528f68 to 5e2d60f Compare August 15, 2022 23:32
@dtrejod dtrejod force-pushed the dt/uri-retry-pool-selector branch from dcba04f to cdfd790 Compare August 16, 2022 00:05
@@ -164,7 +168,6 @@ func (c *clientImpl) doOnce(
transport := clientCopy.Transport // start with the client's transport configured with default middleware

// must precede the error decoders to read the status code of the raw response.
transport = wrapTransport(transport, c.uriScorer.CurrentURIScoringMiddleware())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scoring URI middleware was moved into clientImpl.middleware instantiated when the client is built here:

// append uriSelector and uriPool middlewares
middleware = append(middleware, uriPool, b.URISelector)

@dtrejod dtrejod marked this pull request as ready for review August 16, 2022 00:05
@@ -55,7 +55,8 @@ type clientImpl struct {
errorDecoderMiddleware Middleware
recoveryMiddleware Middleware

uriScorer internal.RefreshableURIScoringMiddleware
uriPool internal.URIPool
uriSelector internal.URISelector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These interfaces exist for the lifetime of the client - we don't believe we need to manage state on a request-specific basis, so not optimizing the interface for that. This works well for state managed over the liftime of the client and URI strategies that are stateless (i.e. rendezvous hashing).

// provided) to determine if the request should be attempted. If the returned value is true, the retrier will have
// waited the desired backoff interval before returning when applicable. If the previous response was a redirect, the
// retrier will also return the URL that should be used for the new next request.
func (r *RequestRetrier) Next(resp *http.Response, err error) (bool, *url.URL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to me that the retrier is still responsible for URI selection. It would read more clearly to me if the URI redirection logic was implemented behind the URI interfaces used in the client instead.

r.currentURI = nextURI
r.offset = nextURIOffset
// retry with backoff
return r.retrier.Next(), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would separate backoff behavior into a different interface given the weird way we use the underlying Retrier interface is not intuitive.

@@ -55,7 +55,8 @@ type clientImpl struct {
errorDecoderMiddleware Middleware
recoveryMiddleware Middleware

uriScorer internal.RefreshableURIScoringMiddleware
uriPool internal.URIPool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URIPool is new functionality. I think the behavior can be captured behind a URISelector implementation so prefer to just use the refreshable slice of URIs directly for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants