
Full workflow on GPU #197

Closed

Conversation


@VinInn commented Nov 18, 2018

With this PR the full workflow (raw -> vertices) can be performed on the GPU only....

More work is clearly needed: besides fixing conflicts...
This PR corresponds to the orange curve in
http://innocent.home.cern.ch/innocent/RelVal/ttbarPU50_Ideal_tkgpuFDR5/plots_pixel_pixel/effandfakePtEtaPhi.pdf
Efficiency vs. fake rate can be tuned at will.


fwyzard commented Dec 14, 2018

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at 6f55d70
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/efc7ea0614f89ca603723c9f52ad6fb033b8049d/log .


VinInn commented Dec 14, 2018

This is not freed:

cudaCheck(cudaMalloc((void **) & d_phase1TopologyLayer_, phase1PixelTopology::layer.size() * sizeof(uint8_t)));
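For illustration, a minimal sketch of pairing that allocation with a matching release; whether this is how #216 actually fixes it is not shown here, and where the cleanup lives is an assumption:

// allocation, as above
cudaCheck(cudaMalloc((void **) &d_phase1TopologyLayer_, phase1PixelTopology::layer.size() * sizeof(uint8_t)));

// matching release, e.g. in the destructor of whatever object owns d_phase1TopologyLayer_
cudaCheck(cudaFree(d_phase1TopologyLayer_));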


VinInn commented Dec 14, 2018

"leak" fixed in #216

P.S. I cannot access the logs, plots, etc.


fwyzard commented Dec 14, 2018

P.S. I cannot access the logs, plots, etc.

They should appear shortly.

@fwyzard added the bug label Dec 18, 2018
backport fix from cms-sw#216

fwyzard commented Dec 18, 2018

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at 6f55d70
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/4370cd08b17086ff421b3c9ea4e3ef1e4de6b18d/log .

fitter.launchKernels(hh, hh.nHits, CAConstants::maxNumberOfQuadruplets(), cudaStream);
kernels.classifyTuples(hh, gpu_, cudaStream);
}
if (transferToCPU) {
@VinInn commented Dec 19, 2018 on the snippet above


One of these three (or all of them) seems to produce a

Uninitialized access at 0x2aab2b009dc0 on access by cudaMemcopy source

What does it mean?
That we are transferring uninitialized memory?
Yes: the memory is not zeroed (why should it be?) and we fill far fewer tracks than CAConstants::maxNumberOfQuadruplets().
Initializing the memory for each event is out of the question.
If silencing a false positive is a requirement, we can memset these storage areas after allocation.

A reviewer replied:

that we are transferring uninitialized memory?

I think so, yes.

Initializing the memory for each event is out of the question.

Agreed.

If silencing a false positive is a requirement ...

Making the tool happy is not a requirement, but I would say that silencing the error report is one - if only to be able to find real problems in the future.

... we can memset these storage areas after allocation.

Sounds good.
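A minimal sketch of that option, assuming the tuple storage is allocated once at construction rather than per event; d_tuples_ and the Quadruplet element type are placeholder names, and whether this matches the actual change referenced below is not shown here:

// allocate once, at construction time
size_t nBytes = CAConstants::maxNumberOfQuadruplets() * sizeof(Quadruplet);
cudaCheck(cudaMalloc(&d_tuples_, nBytes));
// zero it once here, not per event, so that the later cudaMemcpyAsync of the
// mostly unfilled buffer no longer trips the initcheck tool
cudaCheck(cudaMemset(d_tuples_, 0, nBytes));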

@VinInn (author) replied:

initcheck silenced here: c46e716
(on top of #216 and #236)


VinInn commented Dec 19, 2018

A D2H async memcpy cannot crash a performance workflow, because there it is not executed.
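In other words (a sketch around the if (transferToCPU) guard shown in the review snippet above; the buffer names and sizes are placeholders): the copy is only issued when the transfer to the CPU is requested, and the performance workflow never requests it.

if (transferToCPU) {
  // D2H copy of the tuple storage; skipped entirely in the performance workflow
  cudaCheck(cudaMemcpyAsync(h_tuples_, d_tuples_, nBytes, cudaMemcpyDeviceToHost, cudaStream));
}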


VinInn commented Dec 19, 2018

My understanding was that our conclusion about

========= Barrier error detected. Divergent thread(s) in warp

, in particular on the V100, was that it is a false positive.


fwyzard commented Dec 19, 2018

My understanding was that our conclusion about

========= Barrier error detected. Divergent thread(s) in warp

, in particular on the V100, was that it is a false positive.

I don't think we can claim it is always a false positive, but I agree (and I opened a bug report with NVIDIA a few weeks ago, though it hasn't had much activity that I can see).

One possibility could be to run the tests on a P100 instead, and if that does not report any problems, assume the behaviour to be correct on the V100 as well. If @felicepantaleo or @makortel have any suggestions, they are welcome.
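For reference, this is the kind of pattern that synccheck reports as a divergent barrier: a __syncthreads() reached by only part of a warp. The kernel below is a hypothetical illustration, not code from this PR, and whether a given report corresponds to a real hazard is exactly the question discussed above.

__global__ void divergentBarrier(int *out) {
  // only even-numbered threads reach the barrier, so threads within the same
  // warp diverge around __syncthreads(); synccheck reports this as
  // "Barrier error detected. Divergent thread(s) in warp"
  if (threadIdx.x % 2 == 0) {
    out[threadIdx.x] = threadIdx.x;
    __syncthreads();
  }
}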


VinInn commented Dec 19, 2018

I ran the performance workflow on workergpu13 with 32 threads and 16 EDM streams (on 4 GPUs),
and ZMuMu crashes (50% of the time), not necessarily at the first event, always in

CUDA error 33 [/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/tmp/BUILDROOT/344a52fa455e34923c6647f1154d765f/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0_pre3_Patatrack/src/HeterogeneousCore/CUDAServices/src/CachingHostAllocator.h, 551]: invalid resource handle
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  invalid resource handle

TTbar PU50 crashes as well (500 events only),
but without the CUDA error message.
Log files are in /mnt/home/innocent/mc:
fromtmp*.log, pu50*.log


VinInn commented Dec 19, 2018

I got

Begin processing the 1301st record. Run 1, Event 12711, LumiSection 128 on stream 4 at 19-Dec-2018 14:04:48.670 EST
CUDA error 33 [/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/tmp/BUILDROOT/97ed725b1553863373e93f508f3e9e35/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_4_0_pre4_Patatrack/src/HeterogeneousCore/CUDAServices/src/CachingHostAllocator.h, 551]: invalid resource handle
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  invalid resource handle

running the performance workflow (32 threads, 16 streams) on ZMuMu from a pristine CMSSW_10_4_0_pre4_Patatrack area.
Log in /mnt/home/innocent/mc/refzmm.log

Also one without the CUDA error:

Begin processing the 401st record. Run 1, Event 10837, LumiSection 109 on stream 3 at 19-Dec-2018 14:15:51.164 EST


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Wed Dec 19 14:15:53 EST 2018

in refzmm_3.log

@makortel

I finally managed to get a crash on cmg1080 with proper debug prints, and I think I have an idea what is going on (there is indeed a race for the CUDA event between HostFree() and HostAllocate()). I believe the CachingDeviceAllocator suffers from the same problem as well. Fixes follow shortly.
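A simplified sketch of the pattern involved; this is a hypothetical pinned-memory pool, not the actual CachingHostAllocator nor the fix in #237. The free side records an event on the stream and the allocate side checks that event before reusing the block; if the two sides do not coordinate, the event can be queried or destroyed while the other side is still using it, which would be consistent with the 'invalid resource handle' errors above.

#include <cuda_runtime.h>
#include <mutex>
#include <vector>

struct CachedBlock {
  void *ptr;
  size_t size;
  cudaEvent_t ready;  // signalled once all work queued before the free has completed
};

static std::mutex cacheMutex;
static std::vector<CachedBlock> cache;

void hostFree(void *ptr, size_t size, cudaStream_t stream) {
  CachedBlock block{ptr, size, nullptr};
  cudaEventCreateWithFlags(&block.ready, cudaEventDisableTiming);
  cudaEventRecord(block.ready, stream);  // block is reusable only after this event fires
  std::lock_guard<std::mutex> lock(cacheMutex);
  cache.push_back(block);
}

void *hostAllocate(size_t size) {
  std::lock_guard<std::mutex> lock(cacheMutex);
  for (auto it = cache.begin(); it != cache.end(); ++it) {
    // reuse only blocks whose event has completed; touching the event outside
    // the lock, or while hostFree is still setting it up, is the kind of race
    // that ends in errors like "invalid resource handle"
    if (it->size >= size && cudaEventQuery(it->ready) == cudaSuccess) {
      void *ptr = it->ptr;
      cudaEventDestroy(it->ready);
      cache.erase(it);
      return ptr;
    }
  }
  void *ptr = nullptr;
  cudaMallocHost(&ptr, size);  // nothing suitable cached: allocate fresh pinned memory
  return ptr;
}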

@makortel

The fix is in #237.


fwyzard commented Jan 8, 2019

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at 68f320f
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/f2f0ba95b164f8f65ac3a363fd6e574f8f562fa1/log .


fwyzard commented Jan 8, 2019

Closed in favour of #216 to recover the higher throughput.
