Fix synchronization mistake in CUDAScopedContext #327
Merged
PR description:
This PR fixes a synchronization mistake in CUDAScopedContext
reported by @VinInn e.g. here #318 (comment). The problem was restricted to cases where the producerExternalWork
CUDAScopedContext::emplace()
(all of these are fulfilled by the BeamSpot PR #318)
The problem was that CUDAScopedContext checked whether the CUDA stream was idle before calling the CUDA product constructor (for the CUDA event optimization of #292). If the stream is idle, the CUDAProduct is immediately marked as available, which avoids inspecting the state of the CUDA event (and even creating the event). But if the constructor queues more work, the state of the CUDAProduct is incorrect.
The proposed fix is to check the CUDA stream status after calling the constructor.
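The ordering issue can be illustrated with a small CPU-only sketch. This is not the code changed by this PR: FakeStream, Payload, Wrapped, emplaceCheckBefore and emplaceCheckAfter are hypothetical names invented for the illustration, and the idle() check only stands in for a real stream query such as cudaStreamQuery(), which returns cudaSuccess once all work queued on the stream has completed.

```cpp
#include <utility>

// Hypothetical stand-in for a CUDA stream: counts queued, unfinished operations.
struct FakeStream {
  int pending = 0;
  // Real code would query the stream instead, e.g. with cudaStreamQuery().
  bool idle() const { return pending == 0; }
  void enqueue() { ++pending; }
};

// Hypothetical product whose constructor queues asynchronous work on the stream.
struct Payload {
  explicit Payload(FakeStream& stream) { stream.enqueue(); }
};

// Hypothetical wrapper pairing the payload with an "already available" flag,
// playing the role that the product wrapper plays in the description above.
template <typename T>
struct Wrapped {
  T data;
  bool markedAvailable;
};

// Old ordering: the stream is queried before the constructor runs, so work
// queued by the constructor is not reflected in the flag.
template <typename T>
Wrapped<T> emplaceCheckBefore(FakeStream& stream) {
  bool const wasIdle = stream.idle();
  T data(stream);
  return Wrapped<T>{std::move(data), wasIdle};
}

// Fixed ordering: construct first, then query the stream, so the flag
// accounts for anything the constructor queued.
template <typename T>
Wrapped<T> emplaceCheckAfter(FakeStream& stream) {
  T data(stream);
  bool const isIdle = stream.idle();
  return Wrapped<T>{std::move(data), isIdle};
}
```

Starting from an idle FakeStream, emplaceCheckBefore<Payload> marks the product as available even though the stream now has pending work, while emplaceCheckAfter<Payload> leaves it unmarked, so a consumer would still have to synchronize.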
PR validation:
Profiling workflow runs, unit tests run. Verified with printouts that the product status information in the consumer of BeamSpot is now consistent with the stream status information.