Full workflow on GPU #197
Validation summary: reference release CMSSW_10_4_0_pre4 at d74dd18
this is not freed
"leak" fixed in #216 p.s. cannot access the logs plots etc |
they should appear shortly |
backport fix from cms-sw#216
Validation summary: reference release CMSSW_10_4_0_pre4 at d74dd18
```cpp
fitter.launchKernels(hh, hh.nHits, CAConstants::maxNumberOfQuadruplets(), cudaStream);
kernels.classifyTuples(hh, gpu_, cudaStream);
}
if (transferToCPU) {
```
one of these three (or all of them) seems to produce an
Uninitialized access at 0x2aab2b009dc0 on access by cudaMemcopy source
what does it mean?
that we are transferring uninitialized memory?
yes, the memory is not zeroed (why should it be?) and we fill far fewer tracks than CAConstants::maxNumberOfQuadruplets()
initializing the memory for each event is out of the question.
If silencing a false positive is a requirement, we can memset these storage areas after allocation
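(Schematically, the situation being described; a minimal sketch where all names and sizes are illustrative, not the actual CMSSW code:)

```cpp
#include <cuda_runtime.h>

// Illustrative sketch of the pattern the tool complains about:
// the buffer is sized for the worst case, the kernels only ever
// write the first nFilled elements, yet the whole capacity is
// copied back, so the tail is read while still uninitialized.
// (capacity, nFilled and the buffer names are made up here.)
int main() {
  const size_t capacity = 1024;  // stands in for maxNumberOfQuadruplets()
  const size_t nFilled = 16;     // what the event actually produced

  float *d_buf, *h_buf;
  cudaMalloc(&d_buf, capacity * sizeof(float));
  cudaMallocHost(&h_buf, capacity * sizeof(float));

  // stand-in for the kernels, which write only the head of the buffer
  cudaMemset(d_buf, 0x00, nFilled * sizeof(float));

  // copying the full capacity reads the uninitialized tail
  cudaMemcpy(h_buf, d_buf, capacity * sizeof(float), cudaMemcpyDeviceToHost);

  cudaFreeHost(h_buf);
  cudaFree(d_buf);
  return 0;
}
```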
> that we are transferring uninitialized memory?

I think so, yes.

> initializing the memory for each event is out of the question.

Agreed.

> If silencing a false positive is a requirement ...

Making the tool happy is not a requirement, but I would say that silencing the error report is one, if only to be able to spot any real problems in the future.

> ... we can memset these storage areas after allocation.

Sounds good.
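(A minimal sketch of that approach, assuming the storage is allocated once per job rather than per event; the type and variable names are illustrative:)

```cpp
#include <cuda_runtime.h>

// Zero the tuple storage once, right after the one-time allocation.
// This happens per job, not per event, so the cost is negligible,
// and the tool no longer flags the unused tail of the buffer when
// it is copied back to the host.
// (TupleSoA and d_tuples are illustrative names.)
struct TupleSoA { float data[4]; };

int main() {
  const size_t capacity = 1024;  // stands in for CAConstants::maxNumberOfQuadruplets()
  TupleSoA *d_tuples;
  cudaMalloc(&d_tuples, capacity * sizeof(TupleSoA));
  cudaMemset(d_tuples, 0x00, capacity * sizeof(TupleSoA));
  // ... per-event processing reuses d_tuples without re-initializing ...
  cudaFree(d_tuples);
  return 0;
}
```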
a D2H asyncMemcpy cannot crash a performance wf, because it is not executed.
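(For context, the copy in question sits behind the transferToCPU guard shown in the snippet above, so in the performance workflow it is never enqueued; a minimal sketch, with all names as placeholders:)

```cpp
#include <cuda_runtime.h>

// Sketch of the guard discussed here: in the performance workflow
// transferToCPU is false, so the asynchronous D2H copy is never
// enqueued and cannot fail there. (All names are placeholders,
// mirroring the reviewed snippet only loosely.)
void copyResultsIfRequested(bool transferToCPU, void *h_tuples,
                            const void *d_tuples, size_t nBytes,
                            cudaStream_t stream) {
  if (transferToCPU) {
    cudaMemcpyAsync(h_tuples, d_tuples, nBytes,
                    cudaMemcpyDeviceToHost, stream);
  }
}
```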
My understanding was that our conclusion about this report, in particular on the V100, was that it is a false positive.
I don't think we can claim it is always a false positive, but I agree (and I opened a bug report with NVIDIA a few weeks ago, though it hasn't had much activity that I can see). One possibility could be to run the tests on a P100 instead, and if that does not report any problems, assume the behaviour to be correct on the V100 as well. If @felicepantaleo or @makortel have any suggestions, they are welcome.
I ran the performance wf on workergpu13 with 32 threads and 16 EDM streams (on 4 GPUs),
and ttbar PU50 as well (500 events only)
I got a crash running the perf wf (32 threads, 16 streams) on zmumu from a virgin CMSSW_10_4_0_pre4_Patatrack area, and also one without a CUDA error,
in refzmm_3.log
I finally managed to get a crash on cmg1080 with proper debug prints, and I think I have an idea what is going on (there is indeed a race for the CUDA event between ...).
The fix is in #237.
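(Schematically, the kind of hazard being described, not the actual CMSSW code; all names are illustrative. With a single cudaEvent_t shared by concurrent host threads, one thread's cudaEventRecord can slip in between another thread's record and its matching cudaStreamWaitEvent, so the waiter synchronizes against the wrong occurrence of the event:)

```cpp
#include <cuda_runtime.h>

// One event object shared by all in-flight work: the hazard.
cudaEvent_t done;  // created once with cudaEventCreate(&done)

// Executed concurrently by several host threads (think EDM streams),
// each with its own pair of CUDA streams.
void processEvent(cudaStream_t producer, cudaStream_t consumer) {
  // ... enqueue kernels on 'producer' ...
  cudaEventRecord(done, producer);
  // If another thread re-records 'done' right here, this wait is
  // matched against that thread's record, not ours.
  cudaStreamWaitEvent(consumer, done, 0);
  // ... enqueue dependent work on 'consumer' ...
}

// Giving each in-flight piece of work its own cudaEvent_t (or
// otherwise serializing the record/wait pairs) removes the race.
```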
Validation summary: reference release CMSSW_10_4_0_pre4 at d74dd18
Closed in favour of #216 to recover the higher throughput.
With this PR the full workflow raw->vertices can be performed on GPU only...
more work is clearly needed: besides fixing conflicts...
this PR corresponds to the orange curve in
http://innocent.home.cern.ch/innocent/RelVal/ttbarPU50_Ideal_tkgpuFDR5/plots_pixel_pixel/effandfakePtEtaPhi.pdf
efficiency vs fakes can be tuned at will