Add infrastructure around cub CachingDeviceAllocator, and use it in SiPixelRawToCluster #172

makortel · 2018-09-24T20:25:15Z

This PR experiments with the cub CachingDeviceAllocator (following the discussion in #138):

- The CachingDeviceAllocator gets called via CUDAService, and the interface returns a unique_ptr
  - The allocator parameters can be tuned via CUDAService configuration parameters
- As an experiment, I tested the approach in Raw2Cluster for both the temporary working space and for event data
  - For the event data I took a first step for better placement of the CUDA data formats (under CUDADataFormats/<same sub-package as in DataFormats>) by moving the digi and cluster "products" there (they are still aggregated to a single "GPUProduct" though)
- As a further experiment, a CachingHostAllocator is added (based on CachingDeviceAllocator) for pinned host memory, and it is used in Raw2Cluster
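
To make the first point concrete, here is a minimal sketch of a caching allocator whose interface returns a `unique_ptr`. It is not the code in this PR: `ToyCachingAllocator`, `CacheDeleter` and `make_cached_unique` are hypothetical names, `std::malloc`/`std::free` stand in for the CUDA allocation calls so that the sketch runs without a GPU, and blocks are cached by exact size, whereas cub rounds requests up to size bins.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <mutex>
#include <type_traits>

// Hypothetical stand-in for a caching allocator. std::malloc/std::free take
// the place of the CUDA calls; error handling is omitted for brevity.
class ToyCachingAllocator {
public:
  ToyCachingAllocator() = default;
  ToyCachingAllocator(const ToyCachingAllocator&) = delete;
  ToyCachingAllocator& operator=(const ToyCachingAllocator&) = delete;

  // Really release everything that is still sitting in the cache.
  ~ToyCachingAllocator() {
    for (auto& entry : cache_) std::free(entry.second);
  }

  // Reuse a cached block of exactly this size if there is one, else allocate.
  void* allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(bytes);
    if (it != cache_.end()) {
      void* ptr = it->second;
      cache_.erase(it);
      return ptr;
    }
    ++backendAllocations_;
    return std::malloc(bytes);
  }

  // Keep the block for later reuse instead of freeing it.
  void deallocate(void* ptr, std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    cache_.emplace(bytes, ptr);
  }

  // Number of times the backend (here std::malloc) was actually called.
  int backendAllocations() const { return backendAllocations_; }

private:
  std::mutex mutex_;
  std::multimap<std::size_t, void*> cache_;
  int backendAllocations_ = 0;
};

// Deleter that hands the memory back to the allocator it came from.
struct CacheDeleter {
  ToyCachingAllocator* allocator;
  std::size_t bytes;
  void operator()(void* ptr) const { allocator->deallocate(ptr, bytes); }
};

// Hypothetical helper: the memory is handed out uninitialised, so only types
// that need no constructor call are accepted.
template <typename T>
std::unique_ptr<T[], CacheDeleter> make_cached_unique(ToyCachingAllocator& allocator, std::size_t n) {
  static_assert(std::is_trivially_constructible<T>::value,
                "T must be trivially constructible");
  const std::size_t bytes = n * sizeof(T);
  void* raw = allocator.allocate(bytes);
  return std::unique_ptr<T[], CacheDeleter>(static_cast<T*>(raw), CacheDeleter{&allocator, bytes});
}
```

Releasing the `unique_ptr` returns the block to the cache rather than to the system, so a later request of the same size costs no new backend allocation. In this sketch the cache only gives memory back when the allocator itself is destroyed, so every `unique_ptr` has to go out of scope before the allocator does.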

There are many details that can (and maybe should) be discussed.

No changes expected.

@VinInn @fwyzard @felicepantaleo @rovere

cmsbot · 2018-09-24T20:25:31Z

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

CUDADataFormats/Common
CUDADataFormats/SiPixelCluster
CUDADataFormats/SiPixelDigi
HeterogeneousCore/CUDAServices
RecoLocalTracker/SiPixelClusterizer
RecoLocalTracker/SiPixelRecHits
SimTracker/TrackerHitAssociation

The following packages do not have a category, yet:

CUDADataFormats/Common
CUDADataFormats/SiPixelCluster
CUDADataFormats/SiPixelDigi
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

makortel · 2018-09-24T21:13:57Z

Then some random thoughts based on the prototype

- For the event data I feel that a caching allocator for pinned host memory (cudaMallocHost/cudaHostAlloc) could be useful as well
  - Otherwise (given the experience of Speed up CPU side of GPU rechits #125) the pinned memory has to be owned by the EDModule
  - AFAICT cub does not have one (although it is trivial to copy-paste the device allocator)
- A good SoA abstraction would be useful to reduce copy-paste (eventually for both device and pinned host memory)

CUDADataFormats/SiPixelCluster/interface/SiPixelClustersCUDA.h

makortel · 2018-09-25T20:21:34Z

The last commit should fix the leaks (by really releasing the cached memory).

CUB's tendency to "ignore" CUDA errors (or rather, to break out of a loop without saying anything unless recompiled with -DCUB_STDERR) didn't really help with debugging...

fwyzard · 2018-09-28T15:30:06Z

Tested with various configurations, running over 4000 real data events.
No changes in performance observed.

fwyzard · 2018-09-30T19:56:19Z

Validation summary

Reference release CMSSW_10_2_5 at a8a031d
Development branch CMSSW_10_2_X_Patatrack at 58a5ecb
Testing PRs:

Add infrastructure around cub CachingDeviceAllocator, and use it in SiPixelRawToCluster #172 at 248806b

`makeTrackValidationPlots.py` plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.8
tracking validation plots and summary for workflow 10824.7
tracking validation plots and summary for workflow 10824.9

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.8
tracking validation plots and summary for workflow 10824.7
tracking validation plots and summary for workflow 10824.9

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

reference DQM plots for reference release, workflow 10824.5
DQM plots for development release, workflow 10824.5
DQM plots for development release, workflow 10824.8
DQM plots for development release, workflow 10824.7
DQM plots for development release, workflow 10824.9
DQM plots for testing release, workflow 10824.5
DQM plots for testing release, workflow 10824.8
DQM plots for testing release, workflow 10824.7
DQM plots for testing release, workflow 10824.9
DQM comparison for reference workflow 10824.5
DQM comparison for workflow 10824.8
DQM comparison for workflow 10824.7
DQM comparison for workflow 10824.9

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

reference DQM plots for reference release, workflow 10824.5
DQM plots for development release, workflow 10824.5
DQM plots for development release, workflow 10824.8
DQM plots for development release, workflow 10824.7
DQM plots for development release, workflow 10824.9
DQM plots for testing release, workflow 10824.5
DQM plots for testing release, workflow 10824.8
DQM plots for testing release, workflow 10824.7
DQM plots for testing release, workflow 10824.9
DQM comparison for reference workflow 10824.5
DQM comparison for workflow 10824.8
DQM comparison for workflow 10824.7
DQM comparison for workflow 10824.9

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.8
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 10824.7
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.9
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
testing release, workflow 10824.8
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 10824.7
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
testing release, workflow 10824.9
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.8
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 10824.7
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
development release, workflow 10824.9
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 10824.5
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
testing release, workflow 10824.8
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 10824.7
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
testing release, workflow 10824.9
- step3.py: log, visual profile and summary
- profile.py: log, visual profile and summary
- ✔️ cuda-memcheck --tool initcheck --track-unused-memory no (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/d3aa4432e3a66fac98d59095492a81fe27dcd608/log .

fwyzard · 2018-09-30T20:51:19Z

From the validation point of view, the PR is ready to go in.
@VinInn @felicepantaleo @rovere @makortel how do we want to proceed?

fwyzard · 2018-12-03T17:19:44Z

Unfortunately, it looks like this PR introduced a large tracking inefficiency:

| | 10.4.0-pre2 running 10824.5 | #201 running 10824.8 | #172 running 10824.8 |
|---|---|---|---|
| Efficiency | 0.4852 | 0.4841 | 0.2187 |
| Number of TrackingParticles (after cuts) | 5666 | 5666 | 5666 |
| Number of matched TrackingParticles | 2749 | 2743 | 1239 |
| Fake rate | 0.0537 | 0.0359 | 0.0347 |
| Duplicate rate | 0.0151 | 0.0153 | 0.0130 |
| Number of tracks | 32390 | 31928 | 14567 |
| Number of true tracks | 30652 | 30782 | 14061 |
| Number of fake tracks | 1738 | 1146 | 506 |
| Number of pileup tracks | 26878 | 26985 | 12377 |
| Number of duplicate tracks | 488 | 490 | 189 |
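
As a sanity check on the numbers above, the tabulated rates are numerically consistent with simple ratios of the counts: matched TrackingParticles over TrackingParticles for the efficiency, and fake or duplicate tracks over all tracks for the two rates. The thread does not spell out the definitions used by the validation tools, so the formulas below are an assumption that merely reproduces the table; the struct and function names are hypothetical.

```cpp
#include <cmath>

// Counts copied from one column of the table.
struct TrackingCounts {
  double trackingParticles;         // after cuts
  double matchedTrackingParticles;
  double tracks;
  double fakeTracks;
  double duplicateTracks;
};

// Assumed definitions (not stated in the thread, but they match the table).
inline double efficiency(const TrackingCounts& c) {
  return c.matchedTrackingParticles / c.trackingParticles;
}
inline double fakeRate(const TrackingCounts& c) {
  return c.fakeTracks / c.tracks;
}
inline double duplicateRate(const TrackingCounts& c) {
  return c.duplicateTracks / c.tracks;
}

// Round to the four decimals shown in the table.
inline double round4(double x) { return std::round(x * 1e4) / 1e4; }

// The three columns: 10.4.0-pre2, #201 and #172.
const TrackingCounts reference{5666, 2749, 32390, 1738, 488};
const TrackingCounts pr201{5666, 2743, 31928, 1146, 490};
const TrackingCounts pr172{5666, 1239, 14567, 506, 189};
```

With these definitions the #172 column gives 1239 / 5666 ≈ 0.2187, less than half of the 0.4841 obtained from the #201 column, while the fake and duplicate rates stay at a similar level because the total number of tracks drops along with the fake and duplicate counts.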

Lesson learned: never merge without re-running the validation on the latest commits...

@makortel, do you have some suggestions where to look?

fwyzard · 2018-12-03T17:23:28Z

makortel · 2018-12-03T17:23:39Z

> do you have some suggestions where to look?

Not really, I'll take a look (as well).

fwyzard · 2018-12-03T17:43:47Z

Mhm, here is the result of zeroing all memory in the allocator before returning it to the requestors:

| | reference | pre-#172 | #172 | #172 with zeroing |
|---|---|---|---|---|
| Efficiency | 0.4852 | 0.4841 | 0.2187 | 0.4841 |
| Number of TrackingParticles (after cuts) | 5666 | 5666 | 5666 | 5666 |
| Number of matched TrackingParticles | 2749 | 2743 | 1239 | 2743 |
| Fake rate | 0.0537 | 0.0359 | 0.0347 | 0.0358 |
| Duplicate rate | 0.0151 | 0.0153 | 0.0130 | 0.0155 |
| Number of tracks | 32390 | 31928 | 14567 | 31928 |
| Number of true tracks | 30652 | 30782 | 14061 | 30786 |
| Number of fake tracks | 1738 | 1146 | 506 | 1142 |
| Number of pileup tracks | 26878 | 26985 | 12377 | 26988 |
| Number of duplicate tracks | 488 | 490 | 189 | 495 |

Looks like some kernel is not properly initialising its memory?
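
Why zeroing in the allocator can hide this kind of problem is easy to show with a toy model. The sketch below is not the actual bug and not the allocator from this PR: `OneBlockPool` and `processEvent` are hypothetical, and the pool simply hands out the same host block on every request. A producer that writes only part of its buffer then sees whatever the previous user left behind, unless the pool zeroes the block first.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical pool that keeps one block alive and reuses it for every request.
class OneBlockPool {
public:
  explicit OneBlockPool(bool zeroOnAllocate) : zeroOnAllocate_(zeroOnAllocate) {}

  // Returns storage for n ints. Unless zeroOnAllocate is set, the contents
  // are whatever the previous user of the block wrote there.
  int* allocate(std::size_t n) {
    if (block_.size() < n) block_.resize(n);
    if (zeroOnAllocate_) std::fill_n(block_.data(), n, 0);
    return block_.data();
  }

private:
  bool zeroOnAllocate_;
  std::vector<int> block_;
};

// Toy "event": the producer writes only the first `filled` elements, while
// the consumer sums all n of them. If filled < n the producer is buggy.
int processEvent(OneBlockPool& pool, std::size_t n, std::size_t filled, int value) {
  int* data = pool.allocate(n);
  for (std::size_t i = 0; i < filled; ++i) data[i] = value;
  return std::accumulate(data, data + n, 0);
}
```

Running a correct event followed by a buggy one on a pool without zeroing makes the second result depend on the first event's data; with zeroing the buggy producer happens to give the expected sum. The zeroing only masks the missing initialisation, which is why it is useful as a diagnostic rather than as a fix.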

makortel · 2018-12-03T18:21:12Z

The cause lies in commit 15c15ab (which had somewhat mysterious behaviour earlier as well). Running a test before that commit gives ~30k tracks; with it, ~15k.

makortel · 2018-12-03T18:33:21Z

Fix is here #208.

fwyzard · 2018-12-04T13:50:02Z

Alternative fix is #209.

fwyzard · 2018-12-04T14:00:46Z

The fallback solution is reverting #172.

…iPixelRawToCluster (#172)

Add infrastructure around cub CachingDeviceAllocator for device memory allocations, and CachingHostAllocator for pinned (or managed) host memory. CUDAService uses the CachingHostAllocator to allocate requested GPU->CPU/CPU->GPU buffers and data products.

Configuration options can be used to request:
- to print all memory (re)allocations and frees;
- to preallocate device and host buffers.

SiPixelRawToCluster uses the CachingDeviceAllocator for temporary buffers and data products.

Fix a memory problem with SiPixelFedCablingMapGPUWrapper::ModulesToUnpack.

cmsbot added comparison-pending labels Sep 24, 2018

makortel mentioned this pull request Sep 24, 2018

implement simple memory "working space" #138

Open

fwyzard removed alca-pending labels Sep 25, 2018

fwyzard added the enhancement label Sep 26, 2018

makortel force-pushed the cubAllocator branch from 97e51f7 to 248806b Compare September 26, 2018 17:10

This was referenced Nov 28, 2018

Require allocated type to have only a trivial constructor for make_device_unique and make_host_unique #204

Merged

Add a flag to disable the caching for the allocators #205

Merged

makortel mentioned this pull request Dec 3, 2018

Fix modulesToUnpack in raw2digi #208

Merged

fwyzard mentioned this pull request Dec 4, 2018

Cache the SiPixelFedCablingMapGPU across events #209

Closed

fwyzard modified the milestones: CMSSW_10_4_X_Patatrack, CMSSW_10_4_0_pre4_Patatrack, CMSSW_10_4_0_pre3_Patatrack Jan 8, 2019

makortel mentioned this pull request Mar 14, 2019

Use only CUDA devices supported by the SCRAM toolfile #286

Merged

fwyzard mentioned this pull request Oct 8, 2020

Patatrack integration - Pixel local reconstruction (9/N) cms-sw/cmssw#31721

Merged
