implement RecHit SOA and move to new framework #322
Conversation
…o use matrices of static dimensions in order to run on the GPUs.
- deleted the forgotten prints and time measurements;
- created a new modifier for the broken-line fit;
- switched back from tipMax=1 to tipMax=0.1 (the change will maybe be done in another PR);
- restored the original order of the cuts on chi2 and tip;
- deleted the default label for pixelFitterByBrokenLine;
- switched from CUDA_HOSTDEV to __host__ __device__;
- BrokenLine.h now uses dynamically-sized matrices (the advantage over statically-sized ones is that the code also works with n>4) and, as before, the switch can easily be done at the start of the file;
- hence, the test on GPUs now needs an increased stack size (at least 1761 bytes);
- some doxygen comments in BrokenLine.h have been updated.
I need to take another look, but here is a first round of comments for the RecHit SOA+migration commits.
m_store16 = cs->make_device_unique<uint16_t[]>(nHits*n16,stream);
m_store32 = cs->make_device_unique<float[]>(nHits*n32+11+(1+TrackingRecHit2DSOAView::Hist::wsSize())/sizeof(float),stream);
The arrays are not necessarily 128-byte-aligned, right? (Or am I missing something?)
Right, I was thinking of introducing a stride function that computes the required stride, ((n*b+127)/128)*128/b, and using it consistently as stride(nHits,4) and stride(nHits,2).
It is also true that these arrays are accessed mostly at random, so alignment does not make much of a difference.
Ok.
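A minimal sketch of the stride helper described above; the name, signature and placement are assumptions, not taken from the actual code:

#include <cstdint>

// Round n elements of elemSize bytes up so that each column occupies a whole
// number of 128-byte blocks; returns the padded element count.
constexpr uint32_t stride(uint32_t n, uint32_t elemSize) {
  constexpr uint32_t alignment = 128;  // bytes
  return ((n * elemSize + alignment - 1) / alignment) * alignment / elemSize;
}

// e.g. stride(nHits, 4) for the 32-bit columns and stride(nHits, 2) for the
// 16-bit columns, used wherever nHits enters the offset arithmetic.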
view->m_charge = (int32_t *)get32(8);
view->m_xsize = (int16_t *)get16(2);
view->m_ysize = (int16_t *)get16(3);
Could reinterpret_cast be used here?
What difference does it make?
Aesthetics (also code rules)
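For reference, the same assignments written with reinterpret_cast, assuming get32()/get16() return pointers of an unrelated type as in the snippet above (a sketch, not the actual patch):

view->m_charge = reinterpret_cast<int32_t *>(get32(8));
view->m_xsize  = reinterpret_cast<int16_t *>(get16(2));
view->m_ysize  = reinterpret_cast<int16_t *>(get16(3));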
@@ -9,6 +9,7 @@
#include "DataFormats/GeometrySurface/interface/SOARotation.h"
#include "Geometry/TrackerGeometryBuilder/interface/phase1PixelTopology.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cuda_cxx17.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"
Is this include only for testing purposes, or really needed at the moment?
not strictly needed at the moment...
actually, it is needed due to the device functions...
So the header gets included in some CPU .cc file as well? Ok.
yes, there is the possibility of using this very CPE in the standard CPU wfs...
CUDAProduct<TrackingRecHit2DCUDA> const& inputDataWrapped = iEvent.get(tokenHit_);

// try to be in parallel with tracking
CUDAScopedContext ctx{iEvent.streamID(), std::move(waitingTaskHolder)};
After #305 this comment is not really accurate, and the context should be constructed as
- CUDAScopedContext ctx{iEvent.streamID(), std::move(waitingTaskHolder)};
+ CUDAScopedContext ctx{inputDataWrapped, std::move(waitingTaskHolder)};
This work and the pixel tracking will be run in separate CUDA streams.
ok
heterogeneous::GPUCuda,
heterogeneous::CPU
> > {
class SiPixelRecHitHeterogeneous : public edm::global::EDProducer<> {
I'd suggest (eventually) renaming this class to SiPixelRecHitCUDA.
Indeed, it's in the plan.
convertGPUtoCPU(iEvent.event(), hclusters, *output);
}
gpuAlgo_.makeHitsAsync(hits,digis, clusters, bs, fcpe->getGPUProductAsync(ctx.stream()), ctx.stream());
Alternatively, makeHitsAsync() could construct and return TrackingRecHit2DCUDA.
Originally I thought to keep the CPU class with the storage separate, in the producer, and have the algo depend only on the View. This did not work out, as one needs more pointers on the host.
So yes, it is a possibility.
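For illustration, a self-contained sketch of the "algorithm constructs and returns the SoA product" pattern being discussed; HitsSoA and HitKernel are simplified stand-ins, not the real TrackingRecHit2DCUDA or gpuAlgo_ classes:

#include <memory>

struct HitsSoA {                           // stand-in for the SoA product
  explicit HitsSoA(int nHits) : n(nHits), x(std::make_unique<float[]>(nHits)) {}
  int n;
  std::unique_ptr<float[]> x;              // device memory in the real class
};

class HitKernel {                          // stand-in for the GPU algorithm
public:
  HitsSoA makeHits(int nHits) const {
    HitsSoA hits(nHits);                   // the algorithm owns the construction...
    // ... in the real code: launch the hit-building kernels filling the view ...
    return hits;                           // ...and hands the product back (moved)
  }
};

int main() {
  HitKernel algo;
  HitsSoA hits = algo.makeHits(42);        // the producer only stores this in the event
  return hits.n == 42 ? 0 : 1;
}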
int16_t * iph = &hits.iphi(0);
float * xl = &hits.xLocal(0); float * yl = &hits.yLocal(0);
float * xe = &hits.xerrLocal(0); float * ye = &hits.yerrLocal(0);
int16_t * xs = &hits.clusterSizeX(0); int16_t * ys = &hits.clusterSizeY(0);
Indentation is off.
Yeah, it was just copied from the signature.
Will fix and beautify.
siPixelRecHitsLegacyPreSplitting = cms.VPSet(
        cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector"))
    )
)
IIUC, in the RecHit case there is no need for an alias, as SiPixelRecHitFromSOA and the legacy SiPixelRecHitConverter produce the same products, so this could simply be cuda = _siPixelRecHitFromSOA.clone() (after moving the corresponding import above this line).
Ok, thanks. I just tried to make it work while waiting for your explanation of the meaning of all that.
@@ -68,13 +70,19 @@ PixelCPEFast::PixelCPEFast(edm::ParameterSet const & conf,

const pixelCPEforGPU::ParamsOnGPU *PixelCPEFast::getGPUProductAsync(cuda::stream_t<>& cudaStream) const {
  const auto& data = gpuData_.dataForCurrentDeviceAsync(cudaStream, [this](GPUData& data, cuda::stream_t<>& stream) {

    std::cout << "coping pixelCPEforGPU" << std::endl;
    //here or above???
If you want it to print out when the transfer is initiated, this is the correct place. ("Above", outside of the lambda, would print every time the product is asked for.)
Yes, indeed. I just wanted to make sure it was really done once.
(I was planning to fill some constant stuff here, but that does not work across libraries.)
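To illustrate the point, a simplified stand-in for gpuData_.dataForCurrentDeviceAsync() (not the real implementation): the lambda body runs only the first time, when the transfer is initiated, while anything outside it runs every time the product is requested.

#include <iostream>
#include <mutex>

struct GPUData { int dummy = 0; };

class CachedProduct {
public:
  template <typename F>
  const GPUData& dataForCurrentDeviceAsync(F transferAsync) const {
    std::call_once(flag_, [&] { transferAsync(data_); });  // runs on the first call only
    return data_;
  }
private:
  mutable std::once_flag flag_;
  mutable GPUData data_;
};

int main() {
  CachedProduct gpuData;
  for (int i = 0; i < 3; ++i) {
    std::cout << "product asked for" << std::endl;           // prints every time
    gpuData.dataForCurrentDeviceAsync([](GPUData&) {
      std::cout << "copying pixelCPEforGPU" << std::endl;    // prints once
    });
  }
}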
This PR should be superseded by #324 / #329, apart from #324 (comment).
In this PR (on top of #312 and #318):
- TrackingRecHit is introduced as CUDAFormats
- RecHit producer migrated to new framework #100
- Clients migrated.
- tbd: rename and clean
- All three workflows tested.
- Also moved to constant memory in the doublet builder, following the Hackathon investigation (10% speedup in the kernel).
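For context, a minimal sketch of the __constant__-memory pattern referred to above; CellCuts, buildDoublets and uploadCuts are illustrative names, not the actual doublet-builder code:

#include <cuda_runtime.h>

struct CellCuts { float maxDoubletDz; float maxDoubletDr; };

__constant__ CellCuts c_cuts;   // lives in device constant memory: cached, broadcast to all threads

__global__ void buildDoublets(const float* dz, const float* dr, int n, int* accept) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    accept[i] = (dz[i] < c_cuts.maxDoubletDz) && (dr[i] < c_cuts.maxDoubletDr);
}

void uploadCuts(const CellCuts& cuts, cudaStream_t stream) {
  // copy the host-side cuts into the constant-memory symbol before launching the kernel
  cudaMemcpyToSymbolAsync(c_cuts, &cuts, sizeof(CellCuts), 0, cudaMemcpyHostToDevice, stream);
}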