Bug: Access violation when using winrt::resume_on_signal #1329
Keep in mind that a …
The coroutine code in this case is correct; it's just … The issue is that if the threadpool wait fires immediately, the coroutine resumes and destroys the … (see cppwinrt/strings/base_coroutine_threadpool.h, lines 405 to 409 at commit 297454e).
This seems like it could be a problem with the other awaiters that rely on …
Quoting cppreference for additional context:
They have some sample code right below the quoted text that further illustrates the problem.
This issue is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.
What are the maintainers' opinions on this? Is moving the assignment before the SetThreadpoolXxx call an acceptable solution?
It was simpler when I originally wrote it, and I'm not comfortable messing with it now. Unless @oldnewthing is available, I suggest you write your own …
I'm busy right now, but I'll try to find time to look at this in the next few weeks.
Still a problem.
The easiest way to reproduce this is to insert a sleep between SetThreadpoolX and the atomic access, which simulates the threadpool thread firing before the atomic access runs. This is true of all … The result is that … I believe the solution would be to set the state before arming the wait, something like:

```cpp
state expected = state::idle;
if (m_state.compare_exchange_strong(expected, state::pending, std::memory_order_release))
{
    // The state is updated before the wait is armed, so m_state is not
    // accessed after the callback (and the coroutine) can run.
    WINRT_IMPL_SetThreadpoolWaitEx(m_wait.get(), m_handle, file_time, nullptr);
}
else
{
    // Not idle: fire the callback immediately via a zero timeout.
    int64_t now = 0;
    WINRT_IMPL_SetThreadpoolWaitEx(m_wait.get(), WINRT_IMPL_GetCurrentProcess(), &now, nullptr);
}
```

I can PR it if this sounds like a good solution.
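The ordering suggested in this comment can be modeled without any Windows APIs. In this sketch (a simplified model, not the cppwinrt code), a std::thread that is joined immediately plays a threadpool wait that fires as soon as it is armed, which is the worst case the injected sleep reproduces; state_seen_by_callback is a hypothetical name, and the enum only mirrors the two state names used in the snippet.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical enum mirroring the state names in the snippet.
enum class state { idle, pending };

// Models one "arm the wait / update the state" sequence and returns the state
// the callback observed. `fire` plays the threadpool callback; in cppwinrt that
// callback resumes the coroutine and destroys the awaiter that owns the state,
// so the callback running before the state update is the dangerous case.
state state_seen_by_callback(bool claim_state_first)
{
    std::atomic<state> s{state::idle};
    state seen = state::idle;
    auto fire = [&] { seen = s.load(std::memory_order_acquire); };

    if (claim_state_first)
    {
        // Suggested order: idle -> pending first, then arm the wait.
        state expected = state::idle;
        if (s.compare_exchange_strong(expected, state::pending, std::memory_order_release))
        {
            std::thread{fire}.join(); // the "wait" fires as soon as it is armed
        }
    }
    else
    {
        // Original order: arm first, update the state afterwards. Joining
        // forces the worst case the injected sleep simulates: the callback
        // finishes before the store below runs.
        std::thread{fire}.join();
        // In the real awaiter, this store can touch an already destroyed object.
        s.store(state::pending, std::memory_order_release);
    }
    return seen;
}
```

With the suggested order the callback can only ever observe `pending`; with the original order it can observe `idle`, which in the real awaiter means the state update happens after the awaiter may be gone.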
I agree that this is a problem. Now I have to reverse-engineer how "cancellation v2 (#1246)" works.
I have a solution that I'm about to open a PR for. As far as I know, this problem wasn't introduced by #1246, since the code was the same before; that PR extracted it into its own type, which is why it shows up in the …
Fixes microsoft#1329

This fixes the issue by using m_state before suspending, rather than after. The issue occurs because the callback could fire before our thread of execution resumes, causing the timespan_awaiter/signal_awaiter to be destroyed inside the coroutine frame before m_state is accessed.

As a drive-by improvement: currently, if await_suspend is called with a non-idle state, the threadpool object is closed (cancelling the timer/wait), the existing coroutine handle is simply dropped, and resume on the new handle fires immediately. This causes the existing pending coroutine to hang forever. Instead, do nothing and throw an exception when the awaiter is not idle. This is a very unlikely event (the test does some gymnastics with a reused awaiter to reach this state), but better safe than sorry.
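The "throw when the awaiter is not idle" behavior described in the PR could look roughly like the helper below. This is a hypothetical sketch, not the PR's code: the PR text does not say which exception is thrown, so std::logic_error is an assumption, and claim_awaiter_or_throw is not a cppwinrt function.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

enum class state { idle, pending };

// Hypothetical helper for the start of await_suspend: claim the awaiter by
// moving it from idle to pending. If it is already in use, leave everything
// untouched (no closing of the wait, no dropped handle) and throw instead.
void claim_awaiter_or_throw(std::atomic<state>& s)
{
    state expected = state::idle;
    if (!s.compare_exchange_strong(expected, state::pending, std::memory_order_acq_rel))
    {
        throw std::logic_error("awaiter is already in use");
    }
}
```

If await_suspend exits via an exception, the coroutine is resumed and the exception is rethrown from the co_await expression, so the misuse surfaces at the call site instead of leaving the earlier coroutine hung.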
Opened #1342
I'm going to look at fixing the PR this week.
Still a problem.
If you're able to build cppwinrt yourself, you could try #1342 to make sure it solves your problem.
It seems a shame this has not been fixed, given that resume_on_signal is a very useful function and the bug in the implementation is actually called out on The Old New Thing blog (in a slightly different context). What about a PR that keeps the code as close as possible to the existing code, following the idea @oldnewthing describes in https://devblogs.microsoft.com/oldnewthing/20191225-00/?p=103265? I don't know if it's as good as the solution from @sylveon, but maybe it has a better chance of being accepted. The idea is to prevent the coroutine from completing before await_suspend has finished the work that requires member variables. The only issue I can see would be a performance concern, but I'll let others judge that.
I think there's a better chance of moving this functionality to wil and keeping cppwinrt more focused on the projection only. I've just not had the time to redo my PR against wil.
Version
No response
Summary
We ran into an access violation when using winrt::resume_on_signal. It was caused by a race condition in signal_awaiter::create_threadpool_wait. The method accesses *this after scheduling a "transfer" of the coroutine handle across threads using a threadpool wait. By the time m_state is accessed, a worker thread may have already resumed the coroutine, destroying the signal_awaiter and eventually the coroutine state.
Reproducible example
No response
Expected behavior
No response
Actual behavior
No response
Additional comments
No response