Fix CachingHostAllocator for multiple GPUs #212

makortel · 2018-12-04T19:29:00Z

The CachingHostAllocator uses an associated CUDA stream and device for "asynchronous free", to support the creation and transfer of the "me" pointer in data formats, as in

cmssw/CUDADataFormats/SiPixelDigi/src/SiPixelDigisCUDA.cc

Lines 17 to 25 in e5291a0

      auto view = cs->make_host_unique<DeviceConstView>(stream);
      view->xx_ = xx_d.get();
      view->yy_ = yy_d.get();
      view->adc_ = adc_d.get();
      view->moduleInd_ = moduleInd_d.get();

      view_d = cs->make_device_unique<DeviceConstView>(stream);
      cudaCheck(cudaMemcpyAsync(view_d.get(), view.get(), sizeof(DeviceConstView), cudaMemcpyDefault, stream.id()));
    }

The implementation missed one detail regarding multiple GPUs: when claiming a previously cached memory block, the current device may differ from the device of the previous allocation, and in that case the CUDA event must be re-created for the new device.
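To illustrate the idea, here is a minimal sketch in plain C++ that models the re-use path without calling the CUDA runtime. It is not the code of this PR: `ToyHostCache`, `CachedHostBlock`, and `FakeEvent` are hypothetical names invented for the example, and `FakeEvent` only stands in for a CUDA event. In a real allocator the marked step would presumably involve `cudaSetDevice`, `cudaEventDestroy`, and `cudaEventCreateWithFlags`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a CUDA event: remembers the device it was
// created on, plus an id that changes whenever the event is re-created.
struct FakeEvent {
  int device = -1;
  int id = 0;
};

// Hypothetical descriptor of a cached host memory block.
struct CachedHostBlock {
  void* ptr = nullptr;
  std::size_t bytes = 0;
  int device = -1;       // device associated with the last allocation
  FakeEvent readyEvent;  // event used for the "asynchronous free"
};

class ToyHostCache {
public:
  // Return a block of at least `bytes`, associated with `currentDevice`.
  CachedHostBlock allocate(std::size_t bytes, int currentDevice) {
    for (auto it = cached_.begin(); it != cached_.end(); ++it) {
      if (it->bytes >= bytes) {
        CachedHostBlock block = *it;
        cached_.erase(it);
        if (block.device != currentDevice) {
          // The step this PR is about: the cached block was last used with
          // another device, so update the device and re-create the event.
          block.device = currentDevice;
          block.readyEvent = makeEvent(currentDevice);
        }
        return block;
      }
    }
    // Nothing suitable in the cache: allocate a fresh block and event.
    storage_.push_back(std::make_unique<char[]>(bytes));
    CachedHostBlock block;
    block.ptr = storage_.back().get();
    block.bytes = bytes;
    block.device = currentDevice;
    block.readyEvent = makeEvent(currentDevice);
    return block;
  }

  // Return a block to the cache instead of freeing the memory.
  void release(const CachedHostBlock& block) { cached_.push_back(block); }

private:
  FakeEvent makeEvent(int device) {
    FakeEvent event;
    event.device = device;
    event.id = ++eventsCreated_;
    return event;
  }

  std::vector<CachedHostBlock> cached_;
  std::vector<std::unique_ptr<char[]>> storage_;  // owns the memory
  int eventsCreated_ = 0;
};
```

In this model, a block allocated with device 0, released, and then claimed again with device 1 comes back with the same pointer but a new event tied to device 1. Claiming it again with device 1 keeps the existing event.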

This PR fixes that behavior, and should fix the crashes reported in #208 (comment).

@fwyzard


makortel added 4 commits December 4, 2018 19:01

Enhance debug prints with device information

e99374a

Reset the device when re-using a cached host block

3f60e96

Need to recreate the CUDA event in case the associated device changes from the last time

81aa038

Throw if there is an error in deallocation

d46b950

makortel mentioned this pull request Dec 4, 2018

Fix modulesToUnpack in raw2digi #208

Merged

This was referenced Dec 5, 2018

Full workflow on GPU #197

Closed

Cache the SiPixelFedCablingMapGPU across events #209

Closed

fwyzard merged commit 4684349 into cms-patatrack:CMSSW_10_4_X_Patatrack Dec 7, 2018

fwyzard added this to the CMSSW_10_4_0_pre3_Patatrack milestone Dec 7, 2018

fwyzard added the bug and fixed labels Dec 7, 2018
