Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement RecHit SOA and move to new framework #322

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
224 commits
Select commit Hold shift + click to select a range
4973ac7
Add back standalone GPU fit test
rovere Sep 3, 2018
5e21c6b
First Broken Line import
GimmyTomas Sep 3, 2018
56ab682
Merged PatatrackHackathon4 from repository rovere with cms-merge-topic
GimmyTomas Sep 3, 2018
d447fbb
BrokenLineGPU: work in progress
GimmyTomas Sep 4, 2018
55a86bc
Full implementation of the broken line fit. For the moment it needs m…
GimmyTomas Sep 19, 2018
d864079
Full implementation of the broken line fit. For the moment it needs t…
GimmyTomas Sep 19, 2018
e29cb32
Fixed the requested changes:
GimmyTomas Sep 20, 2018
7557d8d
Just forgot to add the files for the broken line modifier.
GimmyTomas Sep 20, 2018
9b81f01
DO NOT MERGE - reuse the 10824.7 and .9 workflows to test the broken …
fwyzard Sep 20, 2018
174a6cc
Replaced by the autogenerated cfi file
fwyzard Sep 20, 2018
ce31f94
Merge branch 'CMSSW_10_2_X_Patatrack' into broken_line_fit
fwyzard Sep 25, 2018
e8faf5d
Merge branch 'CMSSW_10_2_X_Patatrack' into broken_line_fit
fwyzard Nov 9, 2018
a25628f
fix conflicts
VinInn Jan 29, 2019
fefb9cb
Merged broken_line_fit from repository VinInn with cms-merge-topic
VinInn Jan 29, 2019
7f731ad
fix matrix init
VinInn Jan 29, 2019
e8207e6
make it working
VinInn Jan 30, 2019
a3950ea
make it working
VinInn Jan 30, 2019
8578b61
tests runs
VinInn Jan 30, 2019
36550f2
some previous clenup
VinInn Jan 30, 2019
790ccfa
change name of fit driver
VinInn Jan 30, 2019
11482bf
BL driver ok
VinInn Jan 31, 2019
91eeafd
trivial if
VinInn Jan 31, 2019
7f0f80c
make BL default
VinInn Jan 31, 2019
92b33eb
Merged broken_line_fit from repository VinInn with cms-merge-topic
VinInn Jan 31, 2019
2013a29
allow dump of hits
VinInn Feb 3, 2019
cbc4064
Merged OptFitting105 from repository VinInn with cms-merge-topic
VinInn Feb 3, 2019
cc68930
inversion by Cholesky Decomposition
VinInn Feb 3, 2019
9dbd97c
use cholesky invert, enable 3,4,5 hit fit
VinInn Feb 4, 2019
ce8f8e1
Merged OptFitting105 from repository VinInn with cms-merge-topic
VinInn Feb 4, 2019
ad060c9
remove sqrt from choleshy
VinInn Feb 5, 2019
fce5dab
Merged OptFitting105 from repository VinInn with cms-merge-topic
VinInn Feb 5, 2019
e7fc479
cleanup cpu workflows with Rimann or BrokenLine
VinInn Feb 5, 2019
7b7dd7b
cleanup cpu workflows with Rimann or BrokenLine
VinInn Feb 5, 2019
5b9e9e6
change name to workflows as well
VinInn Feb 5, 2019
e0ebdfe
apply Matti's suggestions
VinInn Feb 6, 2019
ef1387a
try to use triplets in hole
VinInn Feb 6, 2019
bf9e9ee
pass triplets to verfication
VinInn Feb 6, 2019
c1f7483
get nutple multiplicity
VinInn Feb 7, 2019
d0f0ff2
fits3,4 and 5
VinInn Feb 7, 2019
acef030
fit 5 as 4
VinInn Feb 7, 2019
28e3e65
also rienmann 3,4&5
VinInn Feb 7, 2019
05c3527
riemann 5 as 4 as well
VinInn Feb 7, 2019
b8c7529
add configurables
VinInn Feb 8, 2019
c40436c
fix printout
VinInn Feb 8, 2019
476cb5a
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 9, 2019
cb2ffbd
remove debug
VinInn Feb 9, 2019
447ee59
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 10, 2019
7ea5f28
add ability to count locally
VinInn Feb 10, 2019
294a447
use local counters
VinInn Feb 10, 2019
0048368
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 10, 2019
e9dee7b
update local for real
VinInn Feb 10, 2019
8b8a4c8
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 10, 2019
7274328
avoid early return
VinInn Feb 10, 2019
9360613
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 10, 2019
5dd17cc
back to global
VinInn Feb 11, 2019
d80fe12
Merged TripletsInHole from repository VinInn with cms-merge-topic
VinInn Feb 11, 2019
9aafe7c
on gpu works
VinInn Feb 14, 2019
61361ec
compiles
VinInn Feb 14, 2019
10253aa
compiles
VinInn Feb 14, 2019
24065d7
fix thread local init
VinInn Feb 14, 2019
b2f779a
san tommaso
VinInn Feb 14, 2019
583f1e6
run on gpu & cpu
VinInn Feb 14, 2019
8e5191d
Merged cudaCompat from repository VinInn with cms-merge-topic
VinInn Feb 15, 2019
c0a3d22
seems working
VinInn Feb 15, 2019
dafcac8
cpu and gpu same
VinInn Feb 15, 2019
fb702cc
cpu and gpu same
VinInn Feb 15, 2019
d1e8e80
add two alternative (faster) track clusterizers
VinInn Feb 15, 2019
ee73282
make it working on cpu as well
VinInn Feb 15, 2019
4097c6d
spit workspace
VinInn Feb 17, 2019
be30ac4
Merged cudaCompat from repository VinInn with cms-merge-topic
VinInn Feb 18, 2019
1b3c093
implement Matti's suggestions
VinInn Feb 19, 2019
36834df
assert cuda emulation
VinInn Feb 19, 2019
ddb6419
fix missing free and add _
VinInn Feb 19, 2019
36711a7
removing two fragments from override
prebello Feb 15, 2019
b13d117
removing three fragments from override
prebello Feb 15, 2019
49cdcfe
fix overrideFragments
prebello Feb 15, 2019
d4d8624
fix overrideFragments
prebello Feb 15, 2019
9f06db8
add clus size
VinInn Feb 25, 2019
f6452bd
apply zsize cut at doublets
VinInn Feb 25, 2019
b1a4522
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 25, 2019
a12d3ae
mitigate broken clus
VinInn Feb 25, 2019
b753073
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 25, 2019
b345b9d
try larger cut on b1
VinInn Feb 25, 2019
8fd3c6b
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 25, 2019
97b6a46
try larger cut on b1
VinInn Feb 25, 2019
9a713d1
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 25, 2019
c3458c5
Merge pull request #25997 from fabiocos/fc-bp25955
cmsbuild Feb 25, 2019
9cddb3b
introduce z cut
VinInn Feb 26, 2019
d49dafc
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 26, 2019
e4aa589
debugging tp assoc on gpu
VinInn Feb 26, 2019
c06d565
fix parentesis after not
VinInn Feb 27, 2019
aebdf9a
enable clus size cut
VinInn Feb 27, 2019
b8168da
add conters
VinInn Feb 27, 2019
07d8001
add clus size cut on B1
VinInn Feb 27, 2019
4122a8d
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 27, 2019
43110c7
add good track counter
VinInn Feb 27, 2019
d9ec193
Merged gpuClusSize from repository VinInn with cms-merge-topic
VinInn Feb 27, 2019
db23080
add possibility to test full doublet combinatorics
VinInn Feb 28, 2019
ed02737
new default
VinInn Feb 28, 2019
cb7496a
soft b1-only-Outer
VinInn Mar 1, 2019
429b434
no size cut in B1inner at all
VinInn Mar 1, 2019
631b6b1
no size cut in B1inner at all
VinInn Mar 1, 2019
c0a44bb
cut on B2 clean
VinInn Mar 4, 2019
411a1a3
update heavy Ion data promptlike GT; add GEM emap for MC GTs; use aut…
tocheng Feb 28, 2019
841ba37
update 2016/2018 ECAL MC conditions for UL
tocheng Mar 1, 2019
b48e379
Fix typo in GT name
tocheng Mar 1, 2019
1222eb5
Add new MC GEM emap to 2017 cosmic deco mode GT
tocheng Mar 1, 2019
0617593
update PF calibration in offline and MC GTs
tocheng Mar 3, 2019
469d5e1
Revert "update PF calibration in offline and MC GTs"
tocheng Mar 5, 2019
b7731d0
Merge pull request #26074 from tocheng/105X_autoCondGT_backport
cmsbuild Mar 8, 2019
f3cdceb
ideal setting
VinInn Mar 9, 2019
29b3e94
allow all triplets, protect vertex from triplets
VinInn Mar 9, 2019
73e1644
separate zcut form clsize
VinInn Mar 10, 2019
df9ab83
cleanup
VinInn Mar 10, 2019
9c4e855
make cuts configurables
VinInn Mar 10, 2019
4adf830
port to pre2
VinInn Mar 10, 2019
1c4654c
Merged gpuClusSizeReloaded from repository VinInn with cms-merge-topic
VinInn Mar 10, 2019
33f4e12
make default with no ifdef, new notebook
VinInn Mar 10, 2019
de8a395
for release remove counter output
VinInn Mar 10, 2019
e1f6196
add dump of zo,ro
VinInn Mar 12, 2019
27f8bb4
add dump of zo,ro
VinInn Mar 12, 2019
cacd3a5
new notebook with z0 resolution
VinInn Mar 12, 2019
4ed7314
no cluster shape cut
VinInn Mar 13, 2019
dd7853a
fix long standing issue with tracks not on cpu
VinInn Mar 13, 2019
578089a
back clus size cut
VinInn Mar 13, 2019
b8ab753
Merge branch 'CMSSW_10_5_X_Patatrack' into gpuClusSizeReloaded
VinInn Mar 13, 2019
204088f
Merge branch 'CMSSW_10_5_X_Patatrack' into gpuClusSizeReloaded
felicepantaleo Mar 15, 2019
c9bffce
Merged gpuClusSizeReloaded from repository VinInn with cms-merge-topic
VinInn Mar 21, 2019
ce28b57
limit pt in MS and remove a sqrt
VinInn Mar 18, 2019
0fb8530
fix chi2 in riemann
VinInn Mar 20, 2019
29bdf79
remove comment
VinInn Mar 21, 2019
d02b6b9
apply chi2 cut as function of pt
VinInn Mar 19, 2019
46523ba
move tip cut to 5mm
VinInn Mar 21, 2019
0ce22da
tune AlignedRZ in barrel
VinInn Mar 19, 2019
8ca8b5d
make theta cut in the barrel stricter
VinInn Mar 21, 2019
f1ca524
change default vertex to density clustering
VinInn Mar 22, 2019
d6924d7
sort using only 16 bits
VinInn Mar 22, 2019
f7373de
count hits in more than one track
VinInn Mar 25, 2019
eaaa7ee
triplet cleaner
VinInn Mar 26, 2019
488069f
Synchronise with CMSSW_10_5_0
fwyzard Mar 26, 2019
a137c55
cleanup
VinInn Mar 27, 2019
2748143
use tip as score
VinInn Mar 27, 2019
5551655
remove obsolete asserts
VinInn Mar 27, 2019
4612993
clean, change names, introduce flag
VinInn Mar 27, 2019
ad94b9a
Merged gpuDupHits from repository VinInn with cms-merge-topic
VinInn Mar 28, 2019
3146e68
reduce number of kernels in fit
VinInn Mar 28, 2019
bae12a0
Merged gpuDupHits from repository VinInn with cms-merge-topic
VinInn Mar 29, 2019
5007a99
make them cpu compatible
VinInn Mar 15, 2019
d5cc56d
make calib available on cpu as well
VinInn Mar 16, 2019
0dc9a60
reduce number of memset
VinInn Mar 29, 2019
1119db0
correct size of cub ws
VinInn Mar 29, 2019
8eaa29b
cleaned hits
VinInn Mar 29, 2019
ea87882
protect from empty events
VinInn Mar 29, 2019
fba58bb
fix autocond
VinInn Mar 29, 2019
bf50dda
Merged CleanRaw2DigiFor106 from repository VinInn with cms-merge-topic
VinInn Mar 29, 2019
b28588c
protect for empty events
VinInn Mar 29, 2019
74e22ac
handle hit overflow
VinInn Mar 30, 2019
c1f81f7
fix loop
VinInn Mar 30, 2019
b36fb06
simplify setHitsLayerStart
VinInn Mar 30, 2019
8bf1a1e
coalesce filling module start in a single kernel, faster
VinInn Mar 31, 2019
f37b5f4
add a single kernel prefix scan
VinInn Apr 1, 2019
6365239
synchronize as advertized in cuda doc
VinInn Apr 1, 2019
c5951f1
make possible to book less memory
VinInn Apr 1, 2019
503f703
Merged CleanRaw2DigiFor106 from repository VinInn with cms-merge-topic
VinInn Apr 1, 2019
579603b
allocate memory per event
VinInn Apr 1, 2019
79b95b3
cout empty cells
VinInn Apr 2, 2019
bf16b17
make isOuterHitOfCell dynamically allocated
VinInn Apr 2, 2019
257c681
hide implementation
VinInn Apr 2, 2019
b6e4705
smart allocation of connections
VinInn Apr 2, 2019
13adff1
fix stupid initialization bug
VinInn Apr 2, 2019
25c02bd
fix stupid initialization bug
VinInn Apr 2, 2019
6ec0bc7
implemented thread safe Smart Dynamic cache
VinInn Apr 3, 2019
c86dbe2
avoid allocation if not used
VinInn Apr 3, 2019
3de29aa
avoid race
VinInn Apr 3, 2019
34a81a4
avoid nasty race at edges
VinInn Apr 3, 2019
f77bd56
use local cache to handle fit workspace
VinInn Apr 4, 2019
4cb4747
Add perfect forwarding overload for CUDAProduct constructor
makortel Apr 9, 2019
1979a01
Move BeamSpot transfer to GPU to its own producer
makortel Apr 5, 2019
a5c4273
add
VinInn Apr 10, 2019
8392e93
triplets in gaps
VinInn Apr 10, 2019
7dca1cd
Add non-cached make_host_unique with the possibility to pass flags to…
makortel Jan 15, 2019
e55192a
Use beginStream()-allocated write-combined memory for the BeamSpot tr…
makortel Apr 10, 2019
3ac4c7c
Move raw data pinned host buffers to be allocated in constructor with…
makortel Jan 15, 2019
c9fb799
Merged beamspotToCUDA from repository makortel with cms-merge-topic
VinInn Apr 11, 2019
e7659d2
Add perfect forwarding overload for CUDAProduct constructor
makortel Apr 9, 2019
fc23822
Move BeamSpot transfer to GPU to its own producer
makortel Apr 5, 2019
6deb682
Add non-cached make_host_unique with the possibility to pass flags to…
makortel Jan 15, 2019
2864b42
Use beginStream()-allocated write-combined memory for the BeamSpot tr…
makortel Apr 10, 2019
0135dbf
Move raw data pinned host buffers to be allocated in constructor with…
makortel Jan 15, 2019
c31e4cb
Fix the CUDA product availability logic in rechit
makortel Apr 11, 2019
218eb6b
merge new bs
VinInn Apr 11, 2019
b2eb44e
Merged gpuNewRecHits from repository VinInn with cms-merge-topic
VinInn Apr 11, 2019
f749b26
Merged beamspotToCUDA from repository makortel with cms-merge-topic
VinInn Apr 11, 2019
c0195e8
compiles
VinInn Apr 12, 2019
95115df
compiles
VinInn Apr 12, 2019
0c7201f
a simple test
VinInn Apr 12, 2019
30cfaca
compiles
VinInn Apr 12, 2019
1267f2c
problems with constant memory.....
VinInn Apr 12, 2019
cc72af6
problems with constant memory.....
VinInn Apr 12, 2019
5dc7c17
compiles!
VinInn Apr 14, 2019
c991b1d
gpu wf seems working
VinInn Apr 14, 2019
dbc2951
need some helper
VinInn Apr 14, 2019
3552845
compiles
VinInn Apr 15, 2019
9b9fdbf
try to fix cff
VinInn Apr 15, 2019
5125a40
gpu wf works
VinInn Apr 15, 2019
b4d2ee2
works also gpu2cpu
VinInn Apr 15, 2019
efab658
TP migrated as well
VinInn Apr 15, 2019
0abd9ae
move from local to constant memory, faster
VinInn Apr 15, 2019
1c42196
factorize algo from wrapper
VinInn Apr 16, 2019
03865a8
addressing Matti's comments
VinInn Apr 16, 2019
b235519
speed it up by 25% (regression)
VinInn Apr 16, 2019
5485462
remove cout
VinInn Apr 16, 2019
a7cd789
add eigen SOA
VinInn Apr 16, 2019
9c4223a
fix includes
VinInn Apr 16, 2019
cda7791
fix cub workspace mess
VinInn Apr 19, 2019
3a7443b
fix hits in module overflow
VinInn Apr 19, 2019
92208ed
more debug, has verbose assert
VinInn Apr 20, 2019
7f10ed0
remove debug from danek code
VinInn Apr 20, 2019
2480178
not a solution
VinInn Apr 20, 2019
7ce9f2f
sync stream, not a solution
VinInn Apr 21, 2019
e090dba
clean
VinInn Apr 21, 2019
9359f0b
not a solution, crashes multijob
VinInn Apr 22, 2019
aa7bf0e
as for Matti's comment
VinInn Apr 22, 2019
eb74217
clean
VinInn Apr 22, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions CUDADataFormats/BeamSpot/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<use name="FWCore/ServiceRegistry"/>
<use name="HeterogeneousCore/CUDAServices"/>
<use name="cuda-api-wrappers"/>
<use name="rootcore"/>

<export>
<lib name="1"/>
</export>
32 changes: 32 additions & 0 deletions CUDADataFormats/BeamSpot/interface/BeamSpotCUDA.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#ifndef CUDADataFormats_BeamSpot_interface_BeamSpotCUDA_h
#define CUDADataFormats_BeamSpot_interface_BeamSpotCUDA_h

#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"

#include <cuda/api_wrappers.h>

class BeamSpotCUDA {
public:
// alignas(128) doesn't really make sense as there is only one
// beamspot per event?
struct alignas(128) Data {
float x,y,z; // position
// TODO: add covariance matrix

float sigmaZ;
float beamWidthX, beamWidthY;
float dxdz, dydz;
float emittanceX, emittanceY;
float betaStar;
};

BeamSpotCUDA() = default;
BeamSpotCUDA(Data const* data_h, cuda::stream_t<>& stream);

Data const* data() const { return data_d_.get(); }

private:
cudautils::device::unique_ptr<Data> data_d_;
};

#endif
13 changes: 13 additions & 0 deletions CUDADataFormats/BeamSpot/src/BeamSpotCUDA.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#include "CUDADataFormats/BeamSpot/interface/BeamSpotCUDA.h"

#include "FWCore/ServiceRegistry/interface/Service.h"
#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"

BeamSpotCUDA::BeamSpotCUDA(Data const* data_h, cuda::stream_t<>& stream) {
edm::Service<CUDAService> cs;

assert(std::abs(data_h->x)<1.f);
data_d_ = cs->make_device_unique<Data>(stream);
cuda::memory::async::copy(data_d_.get(), data_h, sizeof(Data), stream.id());
// cudaStreamSynchronize(stream.id());
}
8 changes: 8 additions & 0 deletions CUDADataFormats/BeamSpot/src/classes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#ifndef CUDADataFormats_BeamSpot_classes_h
#define CUDADataFormats_BeamSpot_classes_h

#include "CUDADataFormats/Common/interface/CUDAProduct.h"
#include "CUDADataFormats/BeamSpot/interface/BeamSpotCUDA.h"
#include "DataFormats/Common/interface/Wrapper.h"

#endif
4 changes: 4 additions & 0 deletions CUDADataFormats/BeamSpot/src/classes_def.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<lcgdict>
<class name="CUDAProduct<BeamSpotCUDA>" persistent="false"/>
<class name="edm::Wrapper<CUDAProduct<BeamSpotCUDA>>" persistent="false"/>
</lcgdict>
6 changes: 6 additions & 0 deletions CUDADataFormats/Common/interface/CUDAProduct.h
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ class CUDAProduct: public CUDAProductBase {
data_(std::move(data))
{}

template <typename... Args>
explicit CUDAProduct(int device, std::shared_ptr<cuda::stream_t<>> stream, std::shared_ptr<cuda::event_t> event, Args&&... args):
CUDAProductBase(device, std::move(stream), std::move(event)),
data_(std::forward<Args>(args)...)
{}

T data_; //!
};

Expand Down
26 changes: 26 additions & 0 deletions CUDADataFormats/SiPixelCluster/interface/gpuClusteringConstants.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#ifndef CUDADataFormats_SiPixelCluster_gpuClusteringConstants_H
#define CUDADataFormats_SiPixelCluster_gpuClusteringConstants_H

#include <cstdint>

namespace pixelGPUConstants {
#ifdef GPU_SMALL_EVENTS
constexpr uint32_t maxNumberOfHits = 24*1024;
#else
constexpr uint32_t maxNumberOfHits = 48*1024; // data at pileup 50 has 18300 +/- 3500 hits; 40000 is around 6 sigma away
#endif
}

namespace gpuClustering {
constexpr uint32_t maxHitsInModule() { return 256;}

constexpr uint32_t MaxNumModules = 2000;
constexpr uint32_t MaxNumPixels = 256 * 2000; // this does not mean maxPixelPerModule == 256!
constexpr uint32_t MaxNumClustersPerModules = 1024;
constexpr uint32_t MaxHitsInModule = maxHitsInModule();
constexpr uint32_t MaxNumClusters = pixelGPUConstants::maxNumberOfHits;
constexpr uint16_t InvId = 9999; // must be > MaxNumModules

}

#endif
8 changes: 8 additions & 0 deletions CUDADataFormats/TrackingRecHit/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<use name="FWCore/ServiceRegistry"/>
<use name="HeterogeneousCore/CUDAUtilities"/>
<use name="HeterogeneousCore/CUDAServices"/>
<use name="cuda-api-wrappers"/>
<use name="rootcore"/>
<export>
<lib name="1"/>
</export>
162 changes: 162 additions & 0 deletions CUDADataFormats/TrackingRecHit/interface/TrackingRecHit2DCUDA.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
#ifndef CUDADataFormats_TrackingRecHit_TrackingRecHit2DCUDA_H
#define CUDADataFormats_TrackingRecHit_TrackingRecHit2DCUDA_H


#include "CUDADataFormats/SiPixelCluster/interface/gpuClusteringConstants.h"
#include "HeterogeneousCore/CUDAUtilities/interface/HistoContainer.h"

#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"

#include <cuda/api_wrappers.h>

namespace pixelCPEforGPU {
struct ParamsOnGPU;
}

class TrackingRecHit2DSOAView {
public:

static constexpr uint32_t maxHits() { return gpuClustering::MaxNumClusters;}
using hindex_type = uint16_t; // if above is <=2^16

using Hist = HistoContainer<int16_t,128,gpuClustering::MaxNumClusters,8*sizeof(int16_t),uint16_t,10>;

friend class TrackingRecHit2DCUDA;

__device__ __forceinline__ uint32_t nHits() const { return m_nHits; }

__device__ __forceinline__ float & xLocal(int i) { return m_xl[i];}
__device__ __forceinline__ float xLocal(int i) const { return __ldg(m_xl+i);}
__device__ __forceinline__ float & yLocal(int i) { return m_yl[i];}
__device__ __forceinline__ float yLocal(int i) const { return __ldg(m_yl+i);}

__device__ __forceinline__ float & xerrLocal(int i) { return m_xerr[i];}
__device__ __forceinline__ float xerrLocal(int i) const { return __ldg(m_xerr+i);}
__device__ __forceinline__ float & yerrLocal(int i) { return m_yerr[i];}
__device__ __forceinline__ float yerrLocal(int i) const { return __ldg(m_yerr+i);}

__device__ __forceinline__ float & xGlobal(int i) { return m_xg[i];}
__device__ __forceinline__ float xGlobal(int i) const { return __ldg(m_xg+i);}
__device__ __forceinline__ float & yGlobal(int i) { return m_yg[i];}
__device__ __forceinline__ float yGlobal(int i) const { return __ldg(m_yg+i);}
__device__ __forceinline__ float & zGlobal(int i) { return m_zg[i];}
__device__ __forceinline__ float zGlobal(int i) const { return __ldg(m_zg+i);}
__device__ __forceinline__ float & rGlobal(int i) { return m_rg[i];}
__device__ __forceinline__ float rGlobal(int i) const { return __ldg(m_rg+i);}

__device__ __forceinline__ int16_t & iphi(int i) { return m_iphi[i];}
__device__ __forceinline__ int16_t iphi(int i) const { return __ldg(m_iphi+i);}

__device__ __forceinline__ int32_t & charge(int i) { return m_charge[i];}
__device__ __forceinline__ int32_t charge(int i) const { return __ldg(m_charge+i);}
__device__ __forceinline__ int16_t & clusterSizeX(int i) { return m_xsize[i];}
__device__ __forceinline__ int16_t clusterSizeX(int i) const { return __ldg(m_xsize+i);}
__device__ __forceinline__ int16_t & clusterSizeY(int i) { return m_ysize[i];}
__device__ __forceinline__ int16_t clusterSizeY(int i) const { return __ldg(m_ysize+i);}
__device__ __forceinline__ uint16_t & detectorIndex(int i) { return m_detInd[i];}
__device__ __forceinline__ uint16_t detectorIndex(int i) const { return __ldg(m_detInd+i);}


__device__ __forceinline__ pixelCPEforGPU::ParamsOnGPU const & cpeParams() const { return *m_cpeParams; }

__device__ __forceinline__ uint32_t hitsModuleStart(int i) const { return __ldg(m_hitsModuleStart+i);}

__device__ __forceinline__ uint32_t * hitsLayerStart() { return m_hitsLayerStart; }
__device__ __forceinline__ uint32_t const * hitsLayerStart() const { return m_hitsLayerStart; }



__device__ __forceinline__ Hist & phiBinner() { return *m_hist; }
__device__ __forceinline__ Hist const & phiBinner() const { return *m_hist; }
private:

// local coord
float *m_xl, *m_yl;
float *m_xerr, *m_yerr;

// global coord
float *m_xg, *m_yg, *m_zg, *m_rg;
int16_t * m_iphi;

// cluster properties
int32_t * m_charge;
int16_t * m_xsize;
int16_t * m_ysize;
uint16_t * m_detInd;


// supporting objects
pixelCPEforGPU::ParamsOnGPU const * m_cpeParams; // forwarded from setup, NOT owned
uint32_t const * m_hitsModuleStart; // forwarded from clusters

uint32_t * m_hitsLayerStart;

Hist * m_hist;

uint32_t m_nHits;
};


class TrackingRecHit2DCUDA {
public:

using Hist = TrackingRecHit2DSOAView::Hist;

TrackingRecHit2DCUDA() = default;

explicit
TrackingRecHit2DCUDA(uint32_t nHits,
pixelCPEforGPU::ParamsOnGPU const * cpeParams,
uint32_t const * hitsModuleStart,
cuda::stream_t<>& stream);
~TrackingRecHit2DCUDA() = default;

TrackingRecHit2DCUDA(const TrackingRecHit2DCUDA&) = delete;
TrackingRecHit2DCUDA& operator=(const TrackingRecHit2DCUDA&) = delete;
TrackingRecHit2DCUDA(TrackingRecHit2DCUDA&&) = default;
TrackingRecHit2DCUDA& operator=(TrackingRecHit2DCUDA&&) = default;


TrackingRecHit2DSOAView * view() { return m_view.get(); }
TrackingRecHit2DSOAView const * view() const { return m_view.get(); }

auto nHits() const { return m_nHits; }

auto hitsModuleStart() const { return m_hitsModuleStart;}
auto hitsLayerStart() { return m_hitsLayerStart; }
auto phiBinner() { return m_hist; }
auto iphi() { return m_iphi;}


// only the local coord and detector index
cudautils::host::unique_ptr<float[]> localCoordToHostAsync(cuda::stream_t<>& stream) const;
cudautils::host::unique_ptr<uint16_t[]> detIndexToHostAsync(cuda::stream_t<>& stream) const;
cudautils::host::unique_ptr<uint32_t[]> hitsModuleStartToHostAsync(cuda::stream_t<>& stream) const;

private:

static constexpr uint32_t n16 = 4;
static constexpr uint32_t n32 = 9;
static_assert(sizeof(uint32_t) == sizeof(float)); // just stating the obvious

cudautils::device::unique_ptr<uint16_t[]> m_store16;
cudautils::device::unique_ptr<float[]> m_store32;

cudautils::device::unique_ptr<TrackingRecHit2DSOAView::Hist> m_HistStore;

cudautils::device::unique_ptr<TrackingRecHit2DSOAView> m_view;

uint32_t m_nHits;

uint32_t const * m_hitsModuleStart; // needed for legacy this is on GPU!

// needed as kernel params...
Hist * m_hist;
uint32_t * m_hitsLayerStart;
int16_t * m_iphi;

};


#endif
87 changes: 87 additions & 0 deletions CUDADataFormats/TrackingRecHit/src/TrackingRecHit2DCUDA.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
#include "CUDADataFormats/TrackingRecHit/interface/TrackingRecHit2DCUDA.h"

#include "FWCore/ServiceRegistry/interface/Service.h"
#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"
#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
// #include "HeterogeneousCore/CUDAUtilities/interface/cudaCheck.h"


TrackingRecHit2DCUDA::TrackingRecHit2DCUDA(
uint32_t nHits,
pixelCPEforGPU::ParamsOnGPU const * cpeParams,
uint32_t const * hitsModuleStart,
cuda::stream_t<>& stream) : m_nHits(nHits), m_hitsModuleStart(hitsModuleStart){

edm::Service<CUDAService> cs;

auto view = cs->make_host_unique<TrackingRecHit2DSOAView>(stream);
view->m_nHits = nHits;
m_view = cs->make_device_unique<TrackingRecHit2DSOAView>(stream);

// if empy do not bother
if (0==nHits) {
cudautils::copyAsync(m_view, view, stream);
return;
}


// the single arrays are not 128 bit alligned...
// the hits are actually accessed in order only in building
// if ordering is relevant they may have to be stored phi-ordered by layer or so
// this will break 1to1 correspondence with cluster and module locality
// so unless proven VERY inefficient we keep it ordered as generated
m_store16 = cs->make_device_unique<uint16_t[]>(nHits*n16,stream);
m_store32 = cs->make_device_unique<float[]>(nHits*n32+11,stream);
m_HistStore = cs->make_device_unique<TrackingRecHit2DSOAView::Hist>(stream);

auto get16 = [&](int i) { return m_store16.get()+i*nHits;};
auto get32 = [&](int i) { return m_store32.get()+i*nHits;};


// copy all the pointers
m_hist = view->m_hist = m_HistStore.get();

view->m_cpeParams = cpeParams;
view->m_hitsModuleStart = hitsModuleStart;

view->m_xl = get32(0);
view->m_yl = get32(1);
view->m_xerr = get32(2);
view->m_yerr = get32(3);

view->m_xg = get32(4);
view->m_yg = get32(5);
view->m_zg = get32(6);
view->m_rg = get32(7);

m_iphi = view->m_iphi = reinterpret_cast<int16_t *>(get16(0));

view->m_charge = reinterpret_cast<int32_t *>(get32(8));
view->m_xsize = reinterpret_cast<int16_t *>(get16(2));
view->m_ysize = reinterpret_cast<int16_t *>(get16(3));
view->m_detInd = get16(1);

m_hitsLayerStart = view->m_hitsLayerStart = reinterpret_cast<uint32_t *>(get32(n32));

// transfer view
cudautils::copyAsync(m_view, view, stream);
// cudaStreamSynchronize(stream.id());

}


cudautils::host::unique_ptr<float[]> TrackingRecHit2DCUDA::localCoordToHostAsync(cuda::stream_t<>& stream) const {
edm::Service<CUDAService> cs;
auto ret = cs->make_host_unique<float[]>(4*nHits(), stream);
cudautils::copyAsync(ret, m_store32, 4*nHits(), stream);
return ret;
}

cudautils::host::unique_ptr<uint32_t[]> TrackingRecHit2DCUDA::hitsModuleStartToHostAsync(cuda::stream_t<>& stream) const{
edm::Service<CUDAService> cs;
auto ret = cs->make_host_unique<uint32_t[]>(2001, stream);
cudaMemcpyAsync(ret.get(), m_hitsModuleStart, 4*2001, cudaMemcpyDefault,stream.id());
//cudautils::copyAsync(ret, m_hitsModuleStart, 2001, stream);
return ret;

}
8 changes: 8 additions & 0 deletions CUDADataFormats/TrackingRecHit/src/classes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#ifndef CUDADataFormats_SiPixelCluster_classes_h
#define CUDADataFormats_SiPixelCluster_classes_h

#include "CUDADataFormats/Common/interface/CUDAProduct.h"
#include "CUDADataFormats/TrackingRecHit/interface/TrackingRecHit2DCUDA.h"
#include "DataFormats/Common/interface/Wrapper.h"

#endif
4 changes: 4 additions & 0 deletions CUDADataFormats/TrackingRecHit/src/classes_def.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<lcgdict>
<class name="CUDAProduct<TrackingRecHit2DCUDA>" persistent="false"/>
<class name="edm::Wrapper<CUDAProduct<TrackingRecHit2DCUDA>>" persistent="false"/>
</lcgdict>
3 changes: 3 additions & 0 deletions CUDADataFormats/TrackingRecHit/test/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
<use name="CUDADataFormats/TrackingRecHit"/>
<flags CUDA_FLAGS="-g -DGPU_DEBUG"/>
<bin file="TrackingRecHit2DCUDA_t.cpp TrackingRecHit2DCUDA_t.cu" name="TrackingRecHit2DCUDA_t"/>
Loading