Next prototype of the framework integration #100
Conversation
Summarizing here the outcome of the meeting today (*). We chose to continue with the
Force-pushed from 7b1391f to 02a8829
Rebased on top of the head of the branch.

Regarding point 1 in #100 (comment), the choice of splitting the logic into an EDProducer and an EDFilter was based on earlier experience that usually (though not always) combining them leads to problems. Thinking further about this particular case, it seems that the producer side for the two cases is fundamentally different. Therefore I added a commit toying with the idea of providing
In short, yes, the producer and the filter can be combined (and it makes sense to combine them) for the case where(/if) we want to be able to dynamically decide whether a chain of CUDA EDModules should run on a GPU or on the CPU.
Some random thoughts about CUDA streams
Except that the first two (a CUDA stream owned by a module, and the ability to create additional CUDA streams on a given device) are in conflict, because (in general) I can't communicate the chosen device from a module to the
I am not sure I understand this point. Are you suggesting to use one CUDA stream to compute, and a separate CUDA stream for the transfer of the results from the GPU to the CPU? Do we submit the kernels in separate CUDA streams, or do we submit them all in the same CUDA stream reused along the chain? I think the former is easier, but what do we gain from reusing the same CUDA stream for the chain of modules?
To my understanding that is the standard "trick" to compute and transfer data in parallel. My main motivation was to think about how that could be done within the context of this PR (regardless of whether we want to do it or not). There would be some benefits (even with the assumption "we achieve the parallelism with EDM streams"):
We get behaviour equivalent to TBB Flowgraph's "streaming_node": if an EDProducer does not have to transfer anything back to the CPU for subsequent work (like the number of digis/clusters/hits/quadruplets), it can be a regular EDProducer that just queues more kernels into the CUDA stream. The performance benefit would come from running the GPU computations in parallel with the "framework overhead". The "streaming_node" behaviour is not enforced, though; it just emerges automatically if an EDProducer meets the necessary constraints.
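As a rough illustration of the two producer patterns described above, here is a minimal plain-CUDA sketch (not the framework code in this PR; the kernels, function names and buffer layout are hypothetical). The first function only queues asynchronous work into the shared CUDA stream and returns immediately, while the second needs a scalar back on the CPU and therefore has to synchronize.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical kernels standing in for the real digi/cluster code.
__global__ void clusterize(const uint16_t* digis, int32_t* clusters,
                           uint32_t* nClusters, int nDigis) {
  // ... clustering omitted; only the device-resident counter matters here ...
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    *nClusters = nDigis / 2;  // placeholder result
  }
}

__global__ void fitHits(const int32_t* clusters, const uint32_t* nClusters,
                        float* hits) {
  // Grid-stride loop bounded by the device-resident cluster count, so no
  // host-side knowledge of the count is needed.
  for (uint32_t i = blockIdx.x * blockDim.x + threadIdx.x; i < *nClusters;
       i += gridDim.x * blockDim.x) {
    hits[i] = 0.f;  // placeholder fit
  }
}

// "streaming_node"-like producer: nothing is needed on the CPU, so it just
// queues more kernels into the CUDA stream it got from the previous module
// and returns; the GPU keeps working while the framework moves on.
void produceStreaming(cudaStream_t stream, const uint16_t* d_digis,
                      int32_t* d_clusters, uint32_t* d_nClusters,
                      float* d_hits, int nDigis) {
  clusterize<<<32, 256, 0, stream>>>(d_digis, d_clusters, d_nClusters, nDigis);
  fitHits<<<32, 256, 0, stream>>>(d_clusters, d_nClusters, d_hits);
}

// Producer that does need a number (e.g. the number of clusters) on the CPU
// for subsequent work: it must copy the scalar back and wait for the stream.
uint32_t produceWithSync(cudaStream_t stream, const uint16_t* d_digis,
                         int32_t* d_clusters, uint32_t* d_nClusters, int nDigis) {
  clusterize<<<32, 256, 0, stream>>>(d_digis, d_clusters, d_nClusters, nDigis);
  uint32_t nClusters = 0;
  cudaMemcpyAsync(&nClusters, d_nClusters, sizeof(uint32_t),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // blocks until the kernels and the copy finish
  return nClusters;
}
```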
Closer to the latter. As a concrete example, let's take the raw2cluster case. The chain of events would be the following:
If points 4 and 5 use the same CUDA stream, they will run serially (5 gets inserted somewhere in the middle of the subsequent work of 4, or after it). They can be made to run in parallel by introducing an additional CUDA stream, and the mechanism I described on slide 15 of #100 (comment) will take care of the synchronization with a CUDA event.
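A minimal sketch of that synchronization pattern using only plain CUDA calls (the actual mechanism lives in this PR's framework code; the kernel and variable names below are illustrative): the device-to-host copy is issued in a second stream and made to wait, via a CUDA event, for the kernel that produces its input, while further kernels keep running in the first stream concurrently with the copy.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Placeholder kernels standing in for steps of the raw2cluster chain.
__global__ void makeClusters(const uint16_t* digis, int32_t* clusters, int n) { /* ... */ }
__global__ void moreGpuWork(int32_t* clusters, int n) { /* ... */ }

void overlapComputeAndTransfer(const uint16_t* d_digis, int32_t* d_clusters,
                               int32_t* h_clusters, int nDigis) {
  cudaStream_t computeStream, transferStream;
  cudaEvent_t clustersReady;
  cudaStreamCreate(&computeStream);
  cudaStreamCreate(&transferStream);
  cudaEventCreateWithFlags(&clustersReady, cudaEventDisableTiming);

  // The kernel producing the clusters runs in the compute stream; an event
  // is recorded right after it.
  makeClusters<<<32, 256, 0, computeStream>>>(d_digis, d_clusters, nDigis);
  cudaEventRecord(clustersReady, computeStream);

  // The device-to-host copy is issued in a different stream, but is ordered
  // after the event, i.e. after the producing kernel has completed.
  cudaStreamWaitEvent(transferStream, clustersReady, 0);
  cudaMemcpyAsync(h_clusters, d_clusters, nDigis * sizeof(int32_t),
                  cudaMemcpyDeviceToHost, transferStream);

  // Subsequent kernels keep running in the compute stream, concurrently with
  // the copy in the transfer stream.
  moreGpuWork<<<32, 256, 0, computeStream>>>(d_clusters, nDigis);

  // Cleanup (in real code the streams and the event would be reused/cached).
  cudaStreamSynchronize(computeStream);
  cudaStreamSynchronize(transferStream);
  cudaEventDestroy(clustersReady);
  cudaStreamDestroy(computeStream);
  cudaStreamDestroy(transferStream);
}
```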
Force-pushed from 48d4372 to a721b31
Force-pushed from 5200bc1 to cf2d1bb
Force-pushed from 02a8829 to 7447c89
Provide a mechanism for a chain of modules to share a resource that can be e.g. CUDA device memory or a CUDA stream. Minimize data movements between the CPU and the device, and support multiple devices. Allow the same job configuration to be used on all hardware combinations. See HeterogeneousCore/CUDACore/README.md for a more detailed description and examples.
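The concrete API is documented in HeterogeneousCore/CUDACore/README.md; as a rough plain-CUDA illustration of the idea only (all names below are hypothetical, not the PR's classes), a product bundling device memory and a CUDA stream is passed along the module chain, and data are copied back to the host only once, at the end.

```cuda
#include <cuda_runtime.h>

// Hypothetical "wrapped" product handed from one module to the next: the data
// stay in device memory and the CUDA stream is shared along the chain.
struct GPUProduct {
  float* d_data = nullptr;
  int size = 0;
  cudaStream_t stream{};
};

__global__ void produceKernel(float* data, int n) { /* ... */ }
__global__ void refineKernel(float* data, int n) { /* ... */ }

// First module of the chain: acquires the resource and queues work.
GPUProduct moduleA(int n) {
  GPUProduct p;
  p.size = n;
  cudaStreamCreate(&p.stream);
  cudaMalloc(&p.d_data, n * sizeof(float));
  produceKernel<<<32, 256, 0, p.stream>>>(p.d_data, n);
  return p;  // no synchronization, no copy to the host
}

// Second module: reuses the same device memory and CUDA stream.
void moduleB(GPUProduct& p) {
  refineKernel<<<32, 256, 0, p.stream>>>(p.d_data, p.size);
}

// Only the module that actually needs the data on the CPU transfers them.
void transferModule(const GPUProduct& p, float* h_data) {
  cudaMemcpyAsync(h_data, p.d_data, p.size * sizeof(float),
                  cudaMemcpyDeviceToHost, p.stream);
  cudaStreamSynchronize(p.stream);
}
```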
Should have been removed as part of cms-patatrack#100.
Remove SiPixelDigiHeterogeneousConverter as obsolete; should have been removed as part of #100.

Address review comments for SiPixelClustersCUDA:
- remove commented out default constructor and private: from DeviceConstView; this is perhaps the best compromise between non-default constructors not being preferred for device allocations, and the use case in SiPixelRecHitSoAFromLegacy (for the expected life time of this class)
- remove const getters with c_ prefix
- improve constructor parameter name
- use more initializer list
- initialize nClusters_h

Address review comments for SiPixelDigiErrorsCUDA:
- use type alias
- remove const getters with c_ prefix and other unnecessary methods
- use more initializer list

Address review comments for SiPixelDigisCUDA:
- remove const getters with c_ prefix and other unnecessary methods
- remove commented out default constructor and private: from DeviceConstView
- add comments for remaining SiPixelDigisCUDA member arrays

Move PixelErrorsCompact and SiPixelDigiErrorsSoa to DataFormats/SiPixelRawData, rename classes.

Address review comments for SiPixelErrorsSoA:
- remove redundant assert
- move constructor inline

Address review comments for SiPixelDigisSoA:
- remove redundant assert
- add comments

Enable if constexpr also for CUDA in TrackingRecHit2DHeterogeneous.

Move dictionary of HostProduct<unsigned int[]> to CUDADataFormats/Common.
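Regarding the "if constexpr also for CUDA" point, here is a hypothetical sketch (not the actual TrackingRecHit2DHeterogeneous code; the traits and function are invented) of such a compile-time branch in code shared between the host and the device compiler:

```cuda
#include <cuda_runtime.h>

// Hypothetical traits selecting between a GPU and a CPU flavour of a class.
struct GPUTraits { static constexpr bool runOnDevice = true; };
struct CPUTraits { static constexpr bool runOnDevice = false; };

// if constexpr lets both the host and the CUDA compiler pick the right branch
// at compile time, without resorting to preprocessor #ifdefs.
template <typename Traits>
__host__ __device__ constexpr int hitCapacity(int nMaxHits) {
  if constexpr (Traits::runOnDevice) {
    return nMaxHits;      // hypothetical: full capacity for the GPU layout
  } else {
    return nMaxHits / 2;  // hypothetical: smaller capacity for the CPU layout
  }
}

static_assert(hitCapacity<GPUTraits>(1024) == 1024);
static_assert(hitCapacity<CPUTraits>(1024) == 512);
```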
Despite my original plan of not proceeding with the framework side before the demonstrator, here is a prototype of the CUDA algorithm integration, based on my discussions with @Dr15Jones and @wddgit. See the included README.md for more technical details.

I'm marking the PR as RFC because we first need to discuss the details and understand whether it could make sense to deploy it already for the demonstrator. Otherwise the PR can serve as a discussion forum on the topic until the demonstrator is finished.
- HeterogeneousEDProducer
- edm::Refs
- Need to add (not needed with `SwitchProducer`) cms.Paths in the cff files

Fixes #133.
@felicepantaleo @fwyzard @VinInn @rovere