-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix modulesToUnpack in raw2digi #208
Fix modulesToUnpack in raw2digi #208
Conversation
@makortel |
Before #172, when the memory was allocated only once per job (per EDM stream), in case all modules are to be unpacked ( After #172, the This PR is a minimal fix to fill the memory on every event. Taking another look on the code now, it could be optimized a bit by doing the construction of the host-side array once (per cabling map update) and only transfer to device on every event (which still feels a bit stupid though). |
OK, this PR does significantly impact the performance. Before:
After:
|
In this case, I would say that we do want to allocate and fill |
vinavx2 is in a sorry state :-( in the meantime, I attempted a different fix (#209) - but it fails miserably running with multiple streams:
CachingHostAllocator.h, 542 is:
Can you think of any reasons why the event or stream should be invalid ? |
In fact, I get the very same error using this PR (compiled with
|
One hint: even with multiple threads and edm streams, running on a single GPU (setting |
…r event-by-event only for regionality
7133aa3
to
89d2076
Compare
The last commit allocates+fills+transfers the default case (unpack all modules) once within the EventSetup. If it is confirmed to work, I'd prefer this PR over #209 because this PR continues to have |
Here's a comparison of the throughput and memory usage for:
reverting #172
I can run up to 11 concurrent streams on each V100:
#208 + #212
Here as well I can run up to 11 concurrent streams on each V100:
#209 + #212
Same results regarding the number of streams:
So, the performance of the two fixes are basically the same. As for running with multiple GPUs:
|
I haven't been able to reproduce it, let's forget about it for the time being... |
Validation summaryReference release CMSSW_10_4_0_pre2 at c6061f4
|
The validation run at flatiron institute uses two GPUs, so most of the "development" workflows failed outright. Most of the "testing" workflows succeeded, and recover the performance of the CPU workflows.
|
Will finally merge then, thank you. |
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
…Heterogeneous (#208) As an optimisation, move the default non-regional case to the EventSetup, and allocate, fill and transfer event-by-event only for the regional case.
This PR fixes the problem of uninitialized memory reported in #172 (comment) and #172 (comment).
Basically the problem was that the "modules to unpack" was left uninitialized for all but first event.
@fwyzard @VinInn