Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix access violation in resume_after and resume_on_signal #1342

Closed

Conversation

sylveon
Copy link
Contributor

@sylveon sylveon commented Aug 18, 2023

Fixes #1329

This fixes the issue by using m_state before suspending, rather than after. The issue occurs because the callback could fire before our thread of execution resumes, causing the timespan_awaiter/signal_awaiter to be destroyed inside the coroutine frame before m_state is accessed.

As a drive-by improvement, currently if await_suspend is called with a non-idle state, the threadpool object is closed (cancelling the timer/wait), the existing coroutine handle is just dropped, and resume on the new handle is fired immediately. This would cause the existing pending coroutine to hang forever. Instead, avoid doing anything and throw an exception when the awaiter is not idle. This is a very unlikely event, the test does some gymnastics (reused awaiter) to achieve this state, but better safe than sorry.

Fixes microsoft#1329

This fixes the issue by using m_state before suspending, rather than after. The issue occurs because the callback could fire before our thread of execution resumes, causing the timespan_awaiter/signal_awaiter to be destroyed inside the coroutine frame before m_state is accessed.

As a drive-by improvement, currently if await_suspend is called with a non-idle state, the threadpool object is closed (cancelling the timer/wait), the existing coroutine handle is just dropped, and resume on the new handle is fired immediately. This would cause the existing pending coroutine to hang forever. Instead, avoid doing anything and throw an exception when the awaiter is not idle. This is a very unlikely event, the test does some gymnastics (reused awaiter) to achieve this state, but better safe than sorry.
@kennykerr
Copy link
Collaborator

CI build failing with:

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.35.32215\include\atomic(702) : Assertion failed: Invalid memory order

@sylveon
Copy link
Contributor Author

sylveon commented Sep 13, 2023

Yep, taking a look

@sylveon
Copy link
Contributor Author

sylveon commented Sep 14, 2023

A driver update hosed my system install (I love Windows), will take a bit more time than expected

@sylveon
Copy link
Contributor Author

sylveon commented Sep 21, 2023

I stopped trying to be clever and just used a mutex, that way we can be sure that await_suspend completes before the threadpool callback resumes the coroutine.

I also took the opportunity to refactor the code to use CRTP and share implementation between timers and waits, as well as insert some useful short circuits in case of cancellation.

{
}

#if defined(__GNUC__) && !defined(__clang__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this because the upstream bug has been fixed since GCC 12

@sylveon
Copy link
Contributor Author

sylveon commented Sep 21, 2023

Everything should work now :)

@kennykerr
Copy link
Collaborator

@oldnewthing are you able to review this?

@oldnewthing
Copy link
Member

@kennykerr Will try to get to it, but kind of busy with other API stuff.

@github-actions
Copy link

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@sylveon
Copy link
Contributor Author

sylveon commented Oct 13, 2023

Bump

@github-actions
Copy link

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@sylveon
Copy link
Contributor Author

sylveon commented Oct 29, 2023

Can someone reopen?

@sylveon
Copy link
Contributor Author

sylveon commented Feb 8, 2024

Bump

Copy link

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Copy link

github-actions bot commented Mar 1, 2024

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@sylveon
Copy link
Contributor Author

sylveon commented Mar 1, 2024

Is there anything preventing this from being merged?

@kennykerr
Copy link
Collaborator

I'm not able to review this code #1329 (comment). My preference is to strip coroutine support in C++/WinRT down to the basics, removing all of the helpers, cancellation, and progress support as it all appears somewhat unreliable and buggy and is too difficult to reason about.

@sylveon
Copy link
Contributor Author

sylveon commented Mar 2, 2024

Migrating all of that to WIL seems preferred (I'm not opposed to that), but then it would be a breaking change that requires C++/WinRT 3. Maybe @oldnewthing can review this until then

@oldnewthing
Copy link
Member

The PR description says "This fixes the issue by using m_state before suspending, rather than after. " But the change is much bigger than this sentence suggests.

Can we untangle the essential fix from the drive-by fix?

@sylveon
Copy link
Contributor Author

sylveon commented Mar 3, 2024

Making the fix separately in both classes would not lead to a much better PR diff unfortunately, so I merged them together into a single base class to reduce code duplication (and therefore less effort to review).

Copy link

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@sylveon
Copy link
Contributor Author

sylveon commented Mar 17, 2024

In short, I don't think untangling the drive-by would make it much easier to read.

Copy link

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Copy link

github-actions bot commented Apr 9, 2024

This pull request is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@kennykerr
Copy link
Collaborator

Closing this PR as there's been no real activity in nearly 8 months. If a project maintainer is available to review, they can always reopen this PR as needed.

@kennykerr kennykerr closed this Apr 16, 2024
@JaiganeshKumaran
Copy link
Contributor

Why hasn't Microsoft got anyone to look after C++/WinRT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Access violation when using winrt::resume_on_signal
6 participants