Bug Report: connection pool timed out errors when there is a spike in borrowed/waiting connections due to a race condition
#17662
Overview of the Issue
There seems to be a race condition in connection pooling that causes a deadlock when a large number of connections are borrowed or waiting at once, specifically when no new connection requests arrive afterwards. Here is the general flow, assuming a connection pool of size 1 for example:
code = ResourceExhausted desc = connection pool timed out.
Normally, in a live production system, a new query would come in and pull a connection straight from the pool, rather than waiting for an existing borrower to pass one on. When that query finished, it could pass the connection on to Thread B, breaking the deadlock. In our (GitHub) CI, however, the nature of our queries triggers the race condition more often, because our test cleanup code fires a bunch of queries all at once as part of a UNION ALL. These queries quickly exhaust the connection pool, execute quickly, and trigger the race condition. Since we're at the end of our test(s), no new queries arrive to pull a connection directly from the pool, and we wait forever.
Reproduction Steps
@arthurschreiber has come up with a test case that pretty consistently reproduces the error: #17661
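The test case in #17661 is the actual reproduction. As a purely hypothetical illustration of the handoff pattern described above (a returning borrower passing its connection directly to a waiter), here is a minimal Python sketch. It is not the project's pool implementation; `HandoffPool`, `PoolTimeout`, and `burst` are invented names. It models one way such a handoff can strand a connection: a waiter times out at the same moment a connection is handed to it and simply leaves. The sketch avoids that by enqueueing, dequeueing, and assigning under one lock, and by passing a late handoff back into the pool. `burst` mimics the CI cleanup pattern of many queries fired at once against a pool of size 1.

```python
import threading
import time
from collections import deque


class PoolTimeout(Exception):
    """Stands in for the 'connection pool timed out' error (hypothetical)."""


class _Waiter:
    """A blocked borrower; default identity equality keeps deque.remove() exact."""

    def __init__(self):
        self.event = threading.Event()
        self.conn = None


class HandoffPool:
    """Hypothetical pool: put() hands a returned connection straight to the
    oldest waiter if there is one, otherwise it goes back on the idle list."""

    def __init__(self, size):
        self._lock = threading.Lock()
        self._idle = [f"conn-{i}" for i in range(size)]
        self._waiters = deque()  # FIFO of blocked borrowers

    def get(self, timeout):
        with self._lock:
            if self._idle:
                return self._idle.pop()
            # Enqueue while still holding the lock, so a concurrent put()
            # cannot slip in between "no idle conn" and "start waiting".
            waiter = _Waiter()
            self._waiters.append(waiter)
        if waiter.event.wait(timeout):
            return waiter.conn
        with self._lock:
            if waiter in self._waiters:
                # Still queued, so no put() picked us: safe to give up.
                self._waiters.remove(waiter)
                raise PoolTimeout("connection pool timed out")
        # A put() dequeued us and assigned a connection just as we timed out.
        # Hand it on instead of dropping it; a dropped handoff would strand
        # the connection and leave later waiters blocked until they time out.
        self.put(waiter.conn)
        raise PoolTimeout("connection pool timed out")

    def put(self, conn):
        with self._lock:
            if self._waiters:
                waiter = self._waiters.popleft()
                waiter.conn = conn  # assigned under the lock, before waking
                waiter.event.set()
            else:
                self._idle.append(conn)

    def idle_count(self):
        with self._lock:
            return len(self._idle)


def burst(pool, n, timeout, hold):
    """Fire n borrowers at once (like the queries behind a UNION ALL) and
    return how many of them timed out."""
    timeouts = 0
    count_lock = threading.Lock()

    def worker():
        nonlocal timeouts
        try:
            conn = pool.get(timeout)
        except PoolTimeout:
            with count_lock:
                timeouts += 1
            return
        time.sleep(hold)  # simulate a short query
        pool.put(conn)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timeouts


if __name__ == "__main__":
    pool = HandoffPool(1)
    # Generous timeout: every borrower should eventually get the connection.
    print("timeouts:", burst(pool, 50, timeout=5.0, hold=0.001))
    # Tight timeout: some borrowers time out, but none strands the connection.
    print("timeouts:", burst(pool, 20, timeout=0.002, hold=0.005))
    print("idle connections:", pool.idle_count())
```

The invariant worth checking in a sketch like this is that once a burst finishes and no new queries arrive, the single connection is back on the idle list. That is exactly the state the report describes as never being reached.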
Binary Version
Operating System and Environment details
Log Fragments