[BUG] Cancellation Tokens are being emitted for Queue trigger functions when entering Drain Mode #47427

Closed
nzthiago opened this issue Dec 5, 2024 · 7 comments
Assignees
Labels
Client: This issue points to a problem in the data-plane of the library.
needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
Service Attention: Workflow: This issue is responsible by Azure service team.
Storage: Storage Service (Queues, Blobs, Files)

Comments

@nzthiago
Member

nzthiago commented Dec 5, 2024

Library name and version

Microsoft.Azure.Functions.Worker.Extensions.Storage.Queues

Describe the bug

When function apps scale in on the Consumption, Flex Consumption, and Elastic Premium plans, a drain mode request is sent to the host and extensions. Entering drain mode should not emit a cancellation token to executing functions. It should only stop the listeners so that no new function executions start on that instance. See this issue for a few more details: Azure/azure-webjobs-sdk#2795

It appears that the Microsoft.Azure.Functions.Worker.Extensions.Storage.Queues extension is emitting a cancellation token when drain mode is requested, and this should not happen.

Requesting that this fix be prioritized, as the same problem has already been addressed for other triggers. It is affecting customers on Flex Consumption, which scales in more aggressively, and is blocking some of them from going to production. Let me know if any more info is needed.

Expected behavior

The function app scales in gracefully, and no cancellation token is emitted when an instance is marked for scale in (drain mode).

Actual behavior

A cancellation token is emitted when a function app instance is marked for scale in (drain mode), causing function executions to be cancelled much earlier than they should be.

Reproduction Steps

As a repro:

A simple function with a 20-second delay in it. This is the only function in the app:

[Image: function code]

The app is deployed to a Flex Consumption Azure Functions hosting plan. Drain mode appears to trigger the cancellation token (the logs show the token returning true). This causes the message to reappear in the queue after a few minutes, and the cycle then repeats:

[Image: logs]
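As a rough idea of what such a repro function body might look like, here is a minimal sketch. It is not the code from the screenshot. `QueueReproSketch`, `RunAsync`, and the `log` callback are hypothetical names, and the sketch uses only the base class library, so the queue trigger binding is left out. In a real isolated worker app the method would be a queue-triggered function and `token` would be supplied by the host.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class QueueReproSketch
{
    // Hypothetical stand-in for the body of a queue-triggered function.
    // Returns whether cancellation had been requested by the time the work finished.
    public static async Task<bool> RunAsync(
        string message, TimeSpan delay, CancellationToken token, Action<string> log)
    {
        log($"Processing '{message}'. Cancellation requested: {token.IsCancellationRequested}");

        // The delay deliberately ignores the token so the work always runs to completion.
        await Task.Delay(delay);

        log($"Finished '{message}'. Cancellation requested: {token.IsCancellationRequested}");
        return token.IsCancellationRequested;
    }
}
```

With a 20-second delay, a second log line reporting `True` while the instance is only draining would match the behavior described above.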

Environment

No response

github-actions bot added the Client, Functions, and needs-team-attention labels on Dec 5, 2024

github-actions bot commented Dec 5, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire added the Service Attention and Storage labels and removed the Functions label on Dec 5, 2024
@nzthiago
Member Author

nzthiago commented Dec 9, 2024

FYI, this has already been addressed for other triggers, such as Service Bus and Timer. Hoping we can address it in a timely manner for Queue Storage as well.

@Peter-Juhasz

I believe we see the same root cause here: Azure/azure-functions-dotnet-worker#2900

@nzthiago
Member Author

This PR from @amnguye was intended to address this by no longer emitting a cancellation token on scale in (drain mode), starting with version 5.3.0 of the Microsoft.Azure.WebJobs.Extensions.Storage package. Further investigation is needed to determine why the cancellation token is still being emitted.

@FHajHusein

FHajHusein commented Dec 26, 2024

Internal incident: IcM 579547346.

I believe the issue was introduced by PR 40792. The CancellationToken passed to ProcessMessageAsync is now also passed to CompleteProcessingMessageAsync. When QueueListener.StopAsync is called, it calls TaskSeriesTimer.Cancel, which cancels that CancellationToken and therefore also cancels the deletion of the message in CompleteProcessingMessageAsync. The message eventually reappears in the queue. Before this PR, the CancellationToken wasn't passed to CompleteProcessingMessageAsync.
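The mechanism described above can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions, not the extension's code: `DrainModeSketch`, `TryDeleteMessageAsync`, and `ProcessOneAsync` are hypothetical names, and `Task.Delay` stands in for both the function execution and the storage delete call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DrainModeSketch
{
    // Hypothetical stand-in for deleting the queue message after processing.
    // Like a real storage call, it gives up if the supplied token is canceled.
    public static async Task<bool> TryDeleteMessageAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(TimeSpan.FromMilliseconds(50), token);
            return true;  // message deleted
        }
        catch (OperationCanceledException)
        {
            return false; // delete abandoned, so the message becomes visible again later
        }
    }

    // Simulates one message whose listener is stopped while the function is still running.
    // Returns true if the message ends up deleted.
    public static async Task<bool> ProcessOneAsync(bool shareTokenWithCompletion)
    {
        using var listenerCts = new CancellationTokenSource();

        Task functionWork = Task.Delay(100); // the function ignores the token and finishes
        listenerCts.Cancel();                // the listener is stopped mid-execution
        await functionWork;

        // Assumption: sharing the token models the behavior described after the PR;
        // CancellationToken.None models the behavior before it.
        CancellationToken completionToken =
            shareTokenWithCompletion ? listenerCts.Token : CancellationToken.None;
        return await TryDeleteMessageAsync(completionToken);
    }
}
```

In this model, `ProcessOneAsync(true)` returns `false` because the delete is canceled even though the function itself completed, while `ProcessOneAsync(false)` returns `true`.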

It is also worth noting that currently, while the listener is stopping, the execution CancellationToken is canceled if drain mode is enabled, which is the opposite of the expected behavior. I believe the condition should be changed to the following (although this of course wouldn't address the issue at hand):
if (!_drainModeManager?.IsDrainModeEnabled ?? true)
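To make the proposed condition easier to read, the sketch below evaluates the same expression for the three possible states. `DrainModeManagerStub` and `ShouldCancelExecution` are hypothetical names introduced for illustration. Only the `IsDrainModeEnabled` property name comes from the snippet above.

```csharp
using System;

// Hypothetical stand-in exposing only the property used by the condition.
public sealed class DrainModeManagerStub
{
    public bool IsDrainModeEnabled { get; set; }
}

public static class StopConditionSketch
{
    // Same expression as the proposed condition. The null-conditional access yields
    // a bool?, the lifted ! negates it (null stays null), and ?? supplies the default.
    public static bool ShouldCancelExecution(DrainModeManagerStub manager)
    {
        return !manager?.IsDrainModeEnabled ?? true;
    }
}
```

Under this expression, a missing manager or a manager with drain mode disabled yields `true` (cancel the execution token), and a manager with drain mode enabled yields `false` (let executions finish).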

@amnguye
Member

amnguye commented Dec 29, 2024

Looking into the issue.

@seanmcc-msft
Member

This issue was fixed by #47791, and will be included in our next release in February.

7 participants