-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't pause processing when send_local_response fails #423
Conversation
+cc @keithmattix |
Seems reasonable to me! The change seems worth the risk IMO; from what I understand, the affected population are plugin authors who are already pausing envoy |
Yes, that's correct. Plugins affected by this change are already affected, but in a different way. |
Propagating returned status is definitely a good thing to do. But please note that Having said that:
Also, and that's a topic for a separate issue, but I question whether we should be calling the plugin for response processing of response it generated itself. |
For example, in Rust SDK's example HTTP authorization plugin that relies on this behavior (like every other authorization plugin), what should be behavior when this calls fails? The only solution that comes to mind is returning What am I missing? |
6cfb37d
to
173953e
Compare
For context see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary: 1. A wasm plugin calls proxy_send_local_response from both onRequestHeaders and onResponseHeaders 2. When proxy_send_local_reply is called from onRequestHeaders it triggers a local reply and that reply goes through the filter chain in Envoy 3. The same plugin is called again as part of the filter chain processing but this time onResponseHeaders is called 4. onResponseHeaders calls proxy_send_local_response which ultimately does not generate a local reply, but it stops filter chain processing. As a result we end up with a stuck connection on Envoy - no local reply and processing is stopped. I think that proxy wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think there are a few improvements that could be made here on Envoy/proxy-wasm side to handle this situation somewhat better: 1. We can avoid stopping processing in such cases to prevent stuck connections on Envoy 2. We can return errors from proxy_send_local_response instead of silently ignoring them. Currently Envoy implementation of sendLocalResponse can detect when a second local response is requested and returns an error in this case without actually trying to send a local response. However, even though Envoy reports an error, send_local_response ignores the result of the host specific sendLocalResponse implementation and stops processing and returns success to the wasm plugin. With this change, send_local_response will check the result of the host-specific implementation of the sendLocalResponse. In cases when sendLocalResponse fails it will just propagate the error to the caller and do nothing else (including stopping processing). I think this behavior makes sense in principle because on the one hand we don't ignore the failure from sendLocalResponse and on the other hand, when the failure happens we don't trigger any side-effects expected from the successful proxy_send_local_response call. NOTE: Even though I do think that this is a more resonable behavior, it's still a change from the previous behavior and it might break existing proxy-wasm plugins. Specifically: 1. C++ plugins that proactively check the result of proxy_send_local_response will change behavior (e.g., before proxy_send_local_response failed silently) 2. Rust plugins, due to the way Rust SDK handles errors from proxy_send_local_response will crash in runtime in this case. On the bright side of things, the plugins that are affected by this change currently just cause stuck connections in Envoy, so we are changing one undesirable behavior for another, but more explicit. Signed-off-by: Mikhail Krinkin <[email protected]>
173953e
to
98fe532
Compare
@PiotrSikora you're correct that the plugins that do that are buggy and it's definitely not the intent here to create a workarounds for them. That being said, I think Envoy can do better in a presense of a buggy plugin and not leave a stuck connection behind. And a buggy plugin could benefit from a signal that would tell them that they are doing something wrong. As for the fallback, my thinking here is that plugins shouldn't need a fallback in this case - they should just stop calling proxy_send_local_response when processing a local response. In a way proxy_send_local_response remains infallible as long as it is used correctly. Rather than returning an error, I can try and find a way to "crash" the plugin (i.e., Rust SDK would just panic in this case and I can see if I can do the same for C++). I can also skip returning an error all together, though I don't think it's the best way forward, because plugins that do that basically get no clear indication that they are doing something wrong. I also agree with you that the whole situation when a plugin is processing its own local reply is confusing and seem to be causing subtle issues. So before moving forward with the review of this PR let me see if I can come up with a change on the Envoy side to address that behavior - maybe Envoy folks will be receptive to this change of Envoy behavior. |
@krinkinmu the existing plugins currently have code that looks like this: // authorization declined
send_http_response(...);
return Action::Pause; However, after you start propagating error codes, any well-written plugin should check it, so that code becomes: // authorization declined
if send_http_response(...) == Status::OK {
return Action::Pause;
} else {
// what to do here???
return Action::Pause;
} which will result in exactly the same "stuck" behavior that you're trying to fix in this PR. |
I think well written plugins shouldn't call send_http_response in cases
where it will return an error.
So the code might be like this:
'''
assert(send_http_response(...) == WasmResult::Ok);
'''
I also don't quite follow, why fallback for incorrect use of the API would
be pausing in the first place? If you pause processing in case of errors
what exactly is supposed to resume processing?
…On Sat 19 Oct 2024, 15:54 Piotr Sikora, ***@***.***> wrote:
@krinkinmu <https://github.com/krinkinmu> the existing plugins currently
have code that looks like this:
// authorization declinedsend_http_response(...);return Action::Pause;
However, after you start propagating error codes, any well-written plugin
should check it, so that code becomes:
// authorization declinedif send_http_response(...) == Status::OK {
return Action::Pause;} else {
// what to do here???
return Action::Pause;}
which will result in exactly the same "stuck" behavior that you're trying
to fix in this PR.
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q44CU4QMFJTYHBAC24LZ4JXCTAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRTHEZTINJTGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
But now that the error is propagated, plugins should check the return value, and plugins authors must think about what to do in case
That's exactly my point. What should be the fallback behavior? And why the processing should be resumed? If the authorization or anti-exfiltration plugin decides that request/response should be not be sent to the client, but the |
Why do we need a fallback for something that plugins shouldn't do in the
first place?
…On Sat 19 Oct 2024, 16:31 Piotr Sikora, ***@***.***> wrote:
I think well written plugins shouldn't call send_http_response in cases
where it will return an error.
But now that the error is propagated, plugins should check the return
value, and plugins authors must think about what to do in case
send_http_response fails.
So the code might be like this:
''' assert(send_http_response(...) == WasmResult::Ok); '''
assert() or panic() are definitely not going to unstuck anything, they're
going to bring the whole WasmVM down, so I don't see how that's any
improvement.
I also don't quite follow, why fallback for incorrect use of the API would
be pausing in the first place? If you pause processing in case of errors
what exactly is supposed to resume processing?
That's exactly my point. What should be the fallback behavior? And why the
processing should be resumed? If the authorization or anti-exfiltration
plugin decides that request/response should be not be sent to the client,
but the send_http_response fails for some reason, then I suspect that
resuming processing and forwarding request/response (effectively bypassing
the security mechanism in place) isn't the most desirable solution.
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q42S6FF56OHFVJRZG43Z4J3NDAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRTHE3DQMRVGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Actually, let me step back, because I don't seem to understand the concern
here.
If I keep the patch the same, but stop returning an error to the plugin
would you find it problematic? From the plugin point of view, it will be
exactly as it currently works - they call proxy_send_local_response and it
will silently fail. The only difference is the processing will continue.
…On Sat 19 Oct 2024, 16:35 Mike Krinkin, ***@***.***> wrote:
Why do we need a fallback for something that plugins shouldn't do in the
first place?
On Sat 19 Oct 2024, 16:31 Piotr Sikora, ***@***.***> wrote:
> I think well written plugins shouldn't call send_http_response in cases
> where it will return an error.
>
> But now that the error is propagated, plugins should check the return
> value, and plugins authors must think about what to do in case
> send_http_response fails.
>
> So the code might be like this:
> ''' assert(send_http_response(...) == WasmResult::Ok); '''
>
> assert() or panic() are definitely not going to unstuck anything,
> they're going to bring the whole WasmVM down, so I don't see how that's any
> improvement.
>
> I also don't quite follow, why fallback for incorrect use of the API
> would be pausing in the first place? If you pause processing in case of
> errors what exactly is supposed to resume processing?
>
> That's exactly my point. What should be the fallback behavior? And why
> the processing should be resumed? If the authorization or anti-exfiltration
> plugin decides that request/response should be not be sent to the client,
> but the send_http_response fails for some reason, then I suspect that
> resuming processing and forwarding request/response (effectively bypassing
> the security mechanism in place) isn't the most desirable solution.
>
> —
> Reply to this email directly, view it on GitHub
> <#423 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAI7Q42S6FF56OHFVJRZG43Z4J3NDAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRTHE3DQMRVGE>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
I think you might be looking at this with a very narrow focus (i.e. what to do in case when Errors are usually propagated up the stack to be acted upon, so from the plugin's point of view, if it receives an error, then it should have alternative code path for it. The "fallback for something that plugins shouldn't do in the first place" isn't really a thing, because noone intentionally writes bad code. Also, I suspect that this can happen even when If that's the case, then that's a bug in Envoy. It would be good to verify it.
But why the processing should be resumed? And what does it entail? If we go back to the authorization plugin example, what do you want to happen here? Do you want the response to be forwarded to the client? |
Yes, I expect response to be forwarded to the client, given that it was a
local response triggered by the plugin in the first place.
…On Sat 19 Oct 2024, 17:12 Piotr Sikora, ***@***.***> wrote:
Actually, let me step back, because I don't seem to understand the concern
here.
I think you might be looking at this with a very narrow focus (i.e. what
to do in case when send_http_response is called again by the same plugin
for a locally generated response), but you're covering a lot more with this
change.
Errors are usually propagated up the stack to be acted upon, so from the
plugin's point of view, if it receives an error, then it should have
alternative code path for it. The "fallback for something that plugins
shouldn't do in the first place" isn't really a thing, because noone
intentionally writes bad code.
Also, I suspect that this can happen even when send_http_response is
called for a local response generated by another native/Lua/Wasm extensions
in the filter chain, so when the Proxy-Wasm plugin itself is perfectly fine.
If that's the case, then that's a bug in Envoy. It would be good to verify
it.
If I keep the patch the same, but stop returning an error to the plugin
would you find it problematic? From the plugin point of view, it will be
exactly as it currently works - they call proxy_send_local_response and it
will silently fail. The only difference is the processing will continue.
But why the processing should be resumed? And what does it entail? If we
go back to the authorization plugin example, what do you want to happen
here? Do you want the response to be forwarded to the client?
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q454Y42HXA72BIK4LGLZ4KAGBAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGAZTGNZSHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
And in case when multiple plugins general a local response processing the
same request or response, I would expect at least one of those to actually
complete, rather than none of them.
…On Sat 19 Oct 2024, 17:13 Mike Krinkin, ***@***.***> wrote:
Yes, I expect response to be forwarded to the client, given that it was a
local response triggered by the plugin in the first place.
On Sat 19 Oct 2024, 17:12 Piotr Sikora, ***@***.***> wrote:
> Actually, let me step back, because I don't seem to understand the
> concern here.
>
> I think you might be looking at this with a very narrow focus (i.e. what
> to do in case when send_http_response is called again by the same plugin
> for a locally generated response), but you're covering a lot more with this
> change.
>
> Errors are usually propagated up the stack to be acted upon, so from the
> plugin's point of view, if it receives an error, then it should have
> alternative code path for it. The "fallback for something that plugins
> shouldn't do in the first place" isn't really a thing, because noone
> intentionally writes bad code.
>
> Also, I suspect that this can happen even when send_http_response is
> called for a local response generated by another native/Lua/Wasm extensions
> in the filter chain, so when the Proxy-Wasm plugin itself is perfectly fine.
>
> If that's the case, then that's a bug in Envoy. It would be good to
> verify it.
>
> If I keep the patch the same, but stop returning an error to the plugin
> would you find it problematic? From the plugin point of view, it will be
> exactly as it currently works - they call proxy_send_local_response and it
> will silently fail. The only difference is the processing will continue.
>
> But why the processing should be resumed? And what does it entail? If we
> go back to the authorization plugin example, what do you want to happen
> here? Do you want the response to be forwarded to the client?
>
> —
> Reply to this email directly, view it on GitHub
> <#423 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAI7Q454Y42HXA72BIK4LGLZ4KAGBAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGAZTGNZSHE>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
As far as I can tell, the changes in this PR aren't limited to cover only that scenario.
Assume such deployment: "client -> plugin A (anti-exfiltration) -> plugin B -> origin". "plugin B" generates a local response, "plugin A" decides it should be blocked and calls What should happen now? Resuming processing (i.e. sending response from "plugin B") would completely bypass the security mechanism supposedly provided by "plugin A (anti-exfiltration)". |
In this case plugin A has a few things that it could do:
1. It could scrape the response to prevent exfiltration
2. It could intentionally pause processing, if somebody decides that it's a
desirable behavior, but to do that it shouldn't call
proxy_send_local_response in the first place it should just return pause
directly.
…On Sat 19 Oct 2024, 17:34 Piotr Sikora, ***@***.***> wrote:
Yes, I expect response to be forwarded to the client, given that it was a
local response triggered by the plugin in the first place.
As far as I can tell, the changes in this PR aren't limited to cover only
that scenario.
And in case when multiple plugins general a local response processing the
same request or response, I would expect at least one of those to actually
complete, rather than none of them.
Assume such deployment: "client -> plugin A (anti-exfiltration) -> plugin
B -> origin".
"plugin B" generates a local response, "plugin A" decides it should be
blocked and calls send_http_response which fails.
What should happen now? Resuming processing (i.e. sending response from
"plugin B") would completely bypass the security mechanism supposedly
provided by "plugin A (anti-exfiltration)".
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q464TBYMFBVY3ZHRE63Z4KC3HAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGA2TONRWGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Ideally though, instead of pausing processing and leaving a hanging
connection, IMO, better behavior would be to close the connection.
…On Sat 19 Oct 2024, 17:39 Mike Krinkin, ***@***.***> wrote:
In this case plugin A has a few things that it could do:
1. It could scrape the response to prevent exfiltration
2. It could intentionally pause processing, if somebody decides that it's
a desirable behavior, but to do that it shouldn't call
proxy_send_local_response in the first place it should just return pause
directly.
On Sat 19 Oct 2024, 17:34 Piotr Sikora, ***@***.***> wrote:
> Yes, I expect response to be forwarded to the client, given that it was a
> local response triggered by the plugin in the first place.
>
> As far as I can tell, the changes in this PR aren't limited to cover only
> that scenario.
>
> And in case when multiple plugins general a local response processing the
> same request or response, I would expect at least one of those to actually
> complete, rather than none of them.
>
> Assume such deployment: "client -> plugin A (anti-exfiltration) -> plugin
> B -> origin".
>
> "plugin B" generates a local response, "plugin A" decides it should be
> blocked and calls send_http_response which fails.
>
> What should happen now? Resuming processing (i.e. sending response from
> "plugin B") would completely bypass the security mechanism supposedly
> provided by "plugin A (anti-exfiltration)".
>
> —
> Reply to this email directly, view it on GitHub
> <#423 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAI7Q464TBYMFBVY3ZHRE63Z4KC3HAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGA2TONRWGA>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
The anti-exfiltration HTTP body example in Rust SDK uses |
I see, would it be fair to summarize your argument as this -
proxy_send_local_response could be intentionally used to pause processing
in some cases and that's a use case that we want to support?
…On Sat 19 Oct 2024, 17:55 Piotr Sikora, ***@***.***> wrote:
In this case plugin A has a few things that it could do:
1. It could scrape the response to prevent exfiltration
2. It could intentionally pause processing, if somebody decides that
it's a desirable behavior, but to do that it shouldn't call
proxy_send_local_response in the first place it should just return pause
directly.
The anti-exfiltration HTTP body example in Rust SDK
<https://github.com/proxy-wasm/proxy-wasm-rust-sdk/blob/3f4274ec8c6276cb187eb2c3a76752cb915b9c01/examples/http_body/src/lib.rs#L58-L66>
uses send_http_response for that purpose, so it would be negatively
affected by this change.
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q45RBLNZH3AR6KNT5FLZ4KFHZAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGA3TOOJVGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Yes, The "forever pause" in case of local response being generated twice is not intentional (and possibly an error in Envoy that's not exclusive to Wasm), but that's actually not a bad fallback.
I agree that closing connection might be a better solution. |
Thank you for the feedback!
One more question if you don't mind, if we close the connection when we
detect multiple local responses (by the same plugin or different plugins,
doesn't matter), would it be problematic for proxy-wasm extensions?
Returning back to the example above, let's say exfiltration prevention
plugin checks the regular response (non-local) and calls
proxy_send_local_response - in this case we just send back to the client
the local response.
However, if the same plugin calls proxy_send_local_response while
processing another local response we just close the connection completely
without sending anything back.
My thinking here is that when multiple local responses are involved, there
are two behaviors that make sense:
1. Don't allow multiple local responses and close the connection when we
detect the second local response
2. Allow multiple local responses, but when it happens, only the latest
generated response actually gets sent back to the client.
I'm not yet completely sure that the second option is feasible yet, but
will look into it in more detail. In the meantime the first option seems
somewhat simpler, so I want to see if there are counter-points to such
behaviour.
…On Sat 19 Oct 2024, 18:13 Piotr Sikora, ***@***.***> wrote:
I see, would it be fair to summarize your argument as this -
proxy_send_local_response could be intentionally used to pause processing
in some cases and that's a use case that we want to support?
Yes, proxy_send_local_response is pretty much always used to stop further
processing and to generate an error response describing why (again:
authorization and anti-exfiltration plugins).
The "forever pause" in case of local response being generated twice is not
intentional (and possibly an error in Envoy that's not exclusive to Wasm),
but that's actually not a bad fallback.
Ideally though, instead of pausing processing and leaving a hanging
connection, IMO, better behavior would be to close the connection.
I agree that closing connection might be a better solution.
—
Reply to this email directly, view it on GitHub
<#423 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI7Q47SZHE5NPUSVQZBRDTZ4KHLLAVCNFSM6AAAAABQB7TN76VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUGA4TKNZRG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I think that's would be fine, since both effectively stop processing. Also, note that this is very much an Envoy-specific issue. In principle, it shouldn't make any difference whether response came directly from the upstream or if it was generated locally, either by the plugin itself or another extension.
The second option seems reasonable... other than in case of buggy plugin that rewrites its own response (which, ironically, is what you're trying to address with this PR). Also, it looks that Envoy has a few ways to generate LocalReply nowadays: |
Thank you! I'm closing this PR, and will see if I can make a broader scoped change to the general local reply processing. |
@PiotrSikora I digged through the code and tested a bit with both wasm and native Envoy extensions and wanted to share what I found here. TL;DR: it looks like envoyproxy/envoy#23049 can be basically reverted now and it will not cause use-after-free issue and will at the same time prevent stuck connections left behind by proxy-wasm. I thought you might be interested in this because it would remove the failure case the PR introduced from sendLocalResponse in Envoy. I will spend a bit more time experiemnting, but I'm thinking of preparing a PR that just reverts the change in envoyproxy/envoy#23049 - it would be nice if you could weight in on that change once it's ready. I also wanted to bring up the hypothetical example you shared in #423 (comment):
Reading through the code and testing a bit, I don't think that plugin A will work as expected in this scenario, regardless of how proxy-wasm is implemented in Envoy. The problem is that Envoy does not send locally generated reply throught the filter chain in all cases - sometimes it just sends it directly to the client bypassing the filter chain and in such cases plugin A wouldn't be called at all. I thought it is important to mention in case somebody relies on this. Also, the issue with stuck connections is proxy-wasm specific and native extensions don't have the same issue. To be precise, if we compare two extensions:
Http::FilterHeadersStatus Filter::decodeHeaders(Http::RequestHeaderMap&, bool) {
decoder_callbacks_->sendLocalReply(
Http::Code::BadRequest, "", &modifyHeaders, absl::nullopt, "");
return Http::FilterHeadersStatus::StopAllIterationAndWatermark;
}
Http::FilterHeadersStatus Filter::encodeHeaders(Http::ResponseHeaderMap&, bool) {
decoder_callbacks_->sendLocalReply(
Http::Code::BadRequest, "", &modifyHeaders, absl::nullopt, "");
return Http::FilterHeadersStatus::StopAllIterationAndWatermark;
}
FilterHeadersStatus ExampleContext::onRequestHeaders(uint32_t, bool) {
sendLocalResponse(400, "", "", {});
return FilterHeadersStatus::StopAllIterationAndWatermark;
}
FilterHeadersStatus ExampleContext::onResponseHeaders(uint32_t, bool) {
sendLocalResponse(400, "", "", {});
return FilterHeadersStatus::StopAllIterationAndWatermark;
} They behave differently, even though they look similar. The native extension will end up sending back to the client the response from the encodeHeaders, while the proxy-wasm extension will end up with a stuck connection. The difference here is that native extension actually calls sendLocalReply before stopping iteration, while proxy-wasm extension ends up stopping iteration without calling sendLocalReply from onResponseHeaders (because of the changes introduced in envoyproxy/envoy#23049 to deal with use-after-free issue). Finally, it seems that sendLocalReply in Envoy is already trying to close the connection when local reply cannot be sent to the client, though in all cases I tried it also triggers an assert and kills Envoy, but that's a separate issue. |
@krinkinmu great job, thank you for investigating this! Yeah, if that PR can be safely reverted (retaining the test cases), then that's great. cc @johnlanni @wbpcode @mpwarres @martijneken
There is a lot of deferred actions we do in Wasm to avoid re-entrance from callbacks. Coincidentally, |
@krinkinmu I looked at the latest version of the Envoy code, If this PR is reverted, the issue mentioned here should still exist: envoyproxy/envoy#23047 (comment)
|
@johnlanni compared to envoyproxy/envoy#23049 now when proxy-wasm calls into sendLocalReply during request processing it will end up calling prepareLocalReplyViaFilterChain instead of sendLocalReplyViaFilterChain. The difference between the two is that prepareLocalReplyViaFilterChain just saves the local request and returns without triggering it. Instead it gets triggered later by executeLocalReplyIfPrepared. So with this change in mind, if we return to the example you had in envoyproxy/envoy#23047 (comment), we will end up with the following sequence of events:
To make sure I also did an asan build of Envoy and run a similar example to the one in envoyproxy/envoy#23047 (comment). Here is my Envoy config:
I didn't use Go SDK though and instead created what I think is equivalent C++ SDK filter: class ExampleContext : public Context {
public:
explicit ExampleContext(uint32_t id, RootContext *root) : Context(id, root) {}
FilterDataStatus onRequestBody(size_t headers, bool end_of_stream) override;
FilterHeadersStatus onResponseHeaders(uint32_t headers, bool end_of_stream) override;
};
static RegisterContextFactory register_ExampleContext(CONTEXT_FACTORY(ExampleContext));
FilterDataStatus ExampleContext::onRequestBody(size_t, bool) {
logDebug("onRequestBody");
sendLocalResponse(400, "", "", {});
sendLocalResponse(401, "", "", {});
return FilterDataStatus::StopIterationAndWatermark;
}
FilterHeadersStatus ExampleContext::onResponseHeaders(uint32_t, bool) {
logDebug("onResponseHeaders");
return FilterHeadersStatus::Continue;
} And I tested it like this:
ASAN does not complain and I also stepped through the code under gdb to see that my understanding of the code flow matches what actually happens in practice. Could you share more, why do you think that the issue still exists? Maybe there is a different example that triggers use-after-free that I'm overlooking? |
@krinkinmu Thank you for your detailed analysis, I didn't notice the handling of |
@PiotrSikora I did check and it seems like returning Continue from either proxy-wasm plugin or native plugin does not actually change anything. For proxy-wasm, it's what I expected, because regardless of what plugin returns, proxy-wasm will override the return status when proxy_send_local_response is called. For native plugins I had to read through the code a bit and basically for native plugins something very similar happens:
Which makes me think whether we still need to defer local responses in proxy-wasm anymore? It looks like envoyproxy/envoy#24367 addressed the issue for local replies at least in a generic way that works for all plugins and not just proxy-wasm. |
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Change to Envoy implementation of failStream used by proxy-wasm in case of critical errors 3. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Why do we need to change the implementation of the failStream? failStream gets called when Wasm VM crashes (e.g., null pointer derefernce or abort inside the VM or any other unrecoverable error with the VM). Current implementation just calls sendLocalReply in this case. Let's consider what happens during the HTTP request processing when Wasm VM crashes: 1. Wasm VM crashes 2. proxy-wasm calls failStream which calls sendLocalReply 3. Envoy prepares local reply and eventually sends it through the filter chain 4. proxy-wasm plugin with a crashed VM is called to process the reply proxy-wasm in this case can't really do anything and just stops the iteration. Which is a fine way of dealing with the issue, but we can do slightly better and close the connection in this case instead of just pausing the iteration. And we are not losing anything in this case by replacing sendLocalReply with resetStream, because the local reply doesn't get send anyways. > NOTE: The same issue does not happen if the VM crashes during response > processing, because sendLocalReply in this case will send the response > directly ignoring the filter chain. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Change to Envoy implementation of failStream used by proxy-wasm in case of critical errors 3. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Why do we need to change the implementation of the failStream? failStream gets called when Wasm VM crashes (e.g., null pointer derefernce or abort inside the VM or any other unrecoverable error with the VM). Current implementation just calls sendLocalReply in this case. Let's consider what happens during the HTTP request processing when Wasm VM crashes: 1. Wasm VM crashes 2. proxy-wasm calls failStream which calls sendLocalReply 3. Envoy prepares local reply and eventually sends it through the filter chain 4. proxy-wasm plugin with a crashed VM is called to process the reply proxy-wasm in this case can't really do anything and just stops the iteration. Which is a fine way of dealing with the issue, but we can do slightly better and close the connection in this case instead of just pausing the iteration. And we are not losing anything in this case by replacing sendLocalReply with resetStream, because the local reply doesn't get send anyways. > NOTE: The same issue does not happen if the VM crashes during response > processing, because sendLocalReply in this case will send the response > directly ignoring the filter chain. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
…6809) Commit Message: This change fixes #28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in #28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see #23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of #23049 2. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back #23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in #24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in #23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Additional Description: Risk Level: low Testing: Manually, by reproducing the case reported in #28826. I also added new unit tests and verified that they pass and aren't flaky: ``` bazel test --runs_per_test=1000 //test/extensions/common/wasm:all --config=docker-clang ``` Docs Changes: N/A Release Notes: N/A Platform Specific Features: N/A Fixes #28826 --------- Signed-off-by: Mikhail Krinkin <[email protected]>
For context see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary:
As a result we end up with a stuck connection on Envoy - no local reply and processing is stopped.
I think that proxy wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think there are a few improvements that could be made here on Envoy/proxy-wasm side to handle this situation somewhat better:
Currently Envoy implementation of sendLocalResponse can detect when a second local response is requested and returns an error in this case without actually trying to send a local response.
However, even though Envoy reports an error, send_local_response ignores the result of the host specific sendLocalResponse implementation and stops processing and returns success to the wasm plugin.
With this change, send_local_response will check the result of the host-specific implementation of the sendLocalResponse. In cases when sendLocalResponse fails it will just propagate the error to the caller and do nothing else (including stopping processing).
I think this behavior makes sense in principle because on the one hand we don't ignore the failure from sendLocalResponse and on the other hand, when the failure happens we don't trigger any side-effects expected from the successful proxy_send_local_response call.
NOTE: Even though I do think that this is a more resonable behavior, it's still a change from the previous behavior and it might break existing proxy-wasm plugins. Specifically:
On the bright side of things, the plugins that are affected by this change currently just cause stuck connections in Envoy, so we are changing one undesirable behavior for another, but more explicit.
A couple of additional notes for reviewers: