Fix a race for CUDA event in CachingHostAllocator #237
Conversation
I believe the device allocator has a similar race condition, though it is much harder to detect. My PR to the upstream is at https://github.com/NVlabs/cub/pull/156.
I have repeated the test pointed out by @VinInn (32 threads, 16 streams, 4 GPUs) with these changes, and observed no crashes after 60+ trials.
Validation summary: reference release CMSSW_10_4_0_pre4 at d74dd18.
@makortel, do you think we need to take any action to handle the possible race in the device allocator while waiting for it to be addressed upstream?
No changes in physics performance, as expected (see the summaries above).
@fwyzard I think it would be good to patch the device allocator ourselves. I don't have a clear opinion between the two options, though. I suspect we'll eventually want to craft our own caching (device) allocator, so maybe that gives a slight preference to keeping our own copy (even if forking our own distribution of CUB and patching that would be cleaner).
I prefer making a local copy as well; it also makes it easier to add more debug statements if needed...
Looks like we have a decision then. Do you want to do it or shall I?
I'll have time for it tomorrow, if you don't do it sooner.
Ok, I'll do it today.
Done in #240.
Currently the mutex in the allocator is unlocked after the memory block is inserted in the free list (cached_bytes), but before calling cudaEventRecord(). This means that there is a short period of time during which the memory block is in the free list while the CUDA event status is not cudaErrorNotReady, and thus a HostAllocate() called in another thread may consider the memory block to be free (cmssw/HeterogeneousCore/CUDAServices/src/CachingHostAllocator.h, line 393 in 6f55d70).

If the memory block was previously associated with a CUDA stream on a different device, the CUDA event gets destroyed and re-created on the current device. The cudaEventRecord() in the original (freeing) thread (cmssw/HeterogeneousCore/CUDAServices/src/CachingHostAllocator.h, line 551 in 6f55d70) then operates on a CUDA event that is suddenly on a different device, leading to the segfaults reported in #197 and #216.
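To make the problematic interleaving concrete, here is a minimal sketch of the ordering before this fix. It is not the actual CachingHostAllocator code: the BlockDescriptor fields, the mutex, the free list, and the insert_into_free_list() helper are simplified assumptions for illustration.

```cpp
// Minimal sketch of the racy ordering (simplified, not the real allocator).
#include <list>
#include <mutex>
#include <cuda_runtime.h>

struct BlockDescriptor {
  void*        d_ptr;        // pinned host memory
  cudaEvent_t  ready_event;  // signals that pending work on the block is done
  cudaStream_t stream;       // stream the block was last used on
  int          device;       // device the event currently lives on
};

std::mutex mutex;                      // guards the free list
std::list<BlockDescriptor> free_list;  // stand-in for the cached blocks

// Hypothetical helper: once this returns, the block is visible to other threads.
void insert_into_free_list(const BlockDescriptor& block) {
  free_list.push_back(block);
}

// Ordering before this PR:
void HostFree(BlockDescriptor& block) {
  {
    std::lock_guard<std::mutex> lock(mutex);
    insert_into_free_list(block);
  }  // mutex released here -- too early
  // Window: another thread's HostAllocate() can now find the block, see that
  // cudaEventQuery(block.ready_event) does not return cudaErrorNotReady, and,
  // if the block came from a stream on a different device, destroy the event
  // and re-create it on the current device.
  cudaEventRecord(block.ready_event, block.stream);  // the event may now live on
                                                     // another device -> segfault
}
```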
This PR fixes the race condition by unlocking the mutex only after the cudaEventRecord() is called in HostFree().

@fwyzard @VinInn
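In the same simplified sketch as above, the fix amounts to recording the event while still holding the mutex, so the block never becomes observable without a freshly recorded event on the correct device:

```cpp
// Ordering after this PR (same simplified sketch as above):
void HostFree(BlockDescriptor& block) {
  std::lock_guard<std::mutex> lock(mutex);
  insert_into_free_list(block);                      // block becomes visible...
  cudaEventRecord(block.ready_event, block.stream);  // ...together with a freshly
                                                     // recorded event
}  // mutex released only after the record
```

Holding the mutex across cudaEventRecord() should be cheap: the call only enqueues a marker on the stream and returns without synchronizing, so the critical section grows by little more than one runtime API call.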