Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where successful server streaming calls can incorrectly be marked as failing due to context cancellation #16

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

charleskorn
Copy link

@charleskorn charleskorn commented Oct 24, 2023

There is a race condition in openTracingClientStream that means that successful server streaming calls' client-generated spans can be incorrectly marked as failing due to context cancellation.

This happens because whenever the gRPC library's ClientStream implementation encounters an error in RecvMsg(), SendMsg(), Header() or CloseSend(), it immediately cancels its context (the one returned by Context()). This includes when io.EOF is returned by RecvMsg(), which is used to indicate the successful end of a stream.

However, this context cancellation then races with the interceptor calling finishFunc with the error returned by RecvMsg(), SendMsg(), Header() or CloseSend(). Either this call wins the race, in which case the true status of the call is recorded, or the context cancellation wins, and the span is recorded as failing due to context cancellation by the call to finishFunc here.

On top of this, the docs for grpc.ClientStream warn that the Context() method should not be called until after the first call to Header() or RecvMsg() has returned, however the current implementation calls Context() before either of these methods are called.

This PR fixes both of these issues by using the provided context rather than the stream's context to wait for context cancellation, as this context is not cancelled when the stream is exhausted or fails.

This has two drawbacks:

  • The interceptor will not immediately observe cases where the stream's ClientConn is closed, but will instead wait until another method on the stream returns an error. This seemed preferable to the alternative, more complex, implementation which would require capturing the stream context after the first call to Header() or RecvMsg(), and implementing some form of locking to ensure that the error returned by RecvMsg(), SendMsg(), Header() or CloseSend() wins the race with the cancellation of the stream's context.
  • The context passed to the SpanDecoratorFunc will now be the provided context, rather than the stream's context.

Note that while I can observe this behaviour regularly in our application, I had to artificially delay the call to finishFunc on line 213 of client.go to have the test reliably fail before introducing the fix.

…ailing due to context cancellation when they actually succeeded
@charleskorn charleskorn changed the title Fix issue where successful client streaming calls can incorrectly be marked as failing due to context cancellation Fix issue where successful server streaming calls can incorrectly be marked as failing due to context cancellation Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant