Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a local copy of CUB's CachingDeviceAllocator and patch it for the race condition #240

Merged

Conversation

makortel
Copy link

@makortel makortel commented Jan 7, 2019

As discussed in #237.

This commit fixes a race for the CUDA event between DeviceFree() and
DeviceAllocate(). If the mutex is unlocked before the
cudaEventRecord(), there is a short period of time when
- the memory block is already in the free list (cached_blocks), and
- the CUDA event status is not yet cudaErrorNotReady
and the DeviceAllocate() may consider that memory block to be free to
be used for another CUDA stream.
@fwyzard
Copy link

fwyzard commented Jan 7, 2019

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at eae2fbc
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/c16293b4c00e07ac2c9ac896fc82272edb494bf7/log .

@fwyzard
Copy link

fwyzard commented Jan 8, 2019

No changes in physics performance (as expected):

RelValTTbar_13 reference-10824.5 development-10824.5 development-10824.8 testing-10824.8
Efficiency 0.4818 0.4824 0.4842 0.4842
Number of TrackingParticles (after cuts) 5556 5556 5556 5556
Number of matched TrackingParticles 2677 2680 2690 2690
Fake rate 0.0519 0.0517 0.0337 0.0337
Duplicate rate 0.0168 0.0175 0.0176 0.0176
Number of tracks 32452 32480 31796 31796
Number of true tracks 30769 30801 30723 30724
Number of fake tracks 1683 1679 1073 1072
Number of pileup tracks 27093 27118 26998 26998
Number of duplicate tracks 546 567 559 559

@fwyzard
Copy link

fwyzard commented Jan 8, 2019

No changes in throughput:

before

Warming up
Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1730.3 ±   1.8 ev/s (4000 events)
  1726.9 ±   2.4 ev/s (4000 events)
  1731.3 ±   1.4 ev/s (4000 events)
  1738.8 ±   1.3 ev/s (4000 events)

after

Warming up
Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1726.4 ±   2.6 ev/s (4000 events)
  1734.6 ±   1.8 ev/s (4000 events)
  1725.1 ±   1.8 ev/s (4000 events)
  1729.6 ±   2.0 ev/s (4000 events)

@fwyzard fwyzard merged commit 5579cac into cms-patatrack:CMSSW_10_4_X_Patatrack Jan 8, 2019
fwyzard pushed a commit that referenced this pull request Jan 8, 2019
This commit fixes a race for the CUDA event between DeviceFree() and
DeviceAllocate(). If the mutex is unlocked before the
cudaEventRecord(), there is a short period of time when
- the memory block is already in the free list (cached_blocks), and
- the CUDA event status is not yet cudaErrorNotReady
and the DeviceAllocate() may consider that memory block to be free to
be used for another CUDA stream.
@fwyzard
Copy link

fwyzard commented Jan 8, 2019

Given the simple commit history, I have tried merging this PR using the "Rebase and merge" approach instead of the usual "Squash and merge".

The positive aspect is that it keeps the full git log history, i.e. it looked like this:

The negative aspect is that there is no trace of the PR number in the git log.
As part of the test, I have fixed that locally and force-pushed the update, so the history now looks like this:

What do people like best ?

  • squash into a single commit (I would still do this for long commit histories)
  • rebase
  • rebase and reword to add the PR number
  • merge (as done in the upstream CMSSW)

@rovere
Copy link

rovere commented Jan 8, 2019

Ciao @fwyzard
I'm not sure I understand the difference between the rebase+reword and the usual merge approach (maybe the linearity of the history w/o the branching??).
Having the PR is, IMHO, very useful. I'd actually go one step further and add the username of the requester together with the PR number (also for our upstream CMSSW repository).

@makortel
Copy link
Author

makortel commented Jan 8, 2019

I'd prefer squash or rebase over the merge (since we're not the upstream repo, and "random" merges tend to make the history, and later comparisons/ports to upstream more difficult). I also think the PR number in the commit message(s) do(es) have some value (especially since we have those already in the history). So I suppose my preference would be

  • rebase and reword when "it makes sense" to preserve the commit history of a PR
  • otherwise squash

@fwyzard
Copy link

fwyzard commented Jan 8, 2019

@makortel thanks for your comments, I think I agree with you.
Rebasing + rewording takes a bit of manual effort, so I'd reserve it for the cases where I think it makes most sense.

@fwyzard fwyzard added this to the CMSSW_10_4_X_Patatrack milestone Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants