Merge CUDADeviceChooser and CUDADeviceFilter to CUDADeviceChooserFilter, add CUDADeviceChooserProducer

Developments

Fix unit test

Unit test for CUDADeviceChooserProducer
makortel committed Aug 7, 2018
1 parent 741388b commit 02a8829
Showing 10 changed files with 318 additions and 170 deletions.
50 changes: 27 additions & 23 deletions HeterogeneousCore/CUDACore/README.md
@@ -21,38 +21,42 @@ deployed and `HeterogeneousEDProducer` retired.

## Choosing device

The device-choosing logic is split into an EDProducer, an EDFilter, and
the use of Paths in the configuration.

First, a `CUDADeviceChooser` EDProducer is run. It has the logic to
decide whether the following chain of EDModules should run on a CUDA
device or not, and if yes, on which CUDA device. If it decides "yes",
it produces a `CUDAToken`, which contains the device id and a CUDA
stream. If it decides "no", it does not produce anything.

The next step is a `CUDADeviceFilter` EDFilter. It checks whether the
`CUDADeviceChooser` produced a product or not. If "yes", it returns
`true`, and if "no", it returns `false`.

Finally, the pieces need to be put together in the configuration. The
`CUDADeviceChooser` can be "anywhere", but the `CUDADeviceFilter`
should be the first module on a `cms.Path`, followed by the CUDA
EDProducers (in the future it may become sufficient to have only the
first EDProducer of a chain in the `Path`).
### Dynamically between GPU and CPU

The device-choosing logic (CPU vs. GPU, and which GPU) is implemented
with an EDFilter and Paths in the configuration.

First, a `CUDADeviceChooserFilter` EDFilter is run. It has the logic
to decide whether the following chain of EDModules should run on a
CUDA device or not, and if yes, on which CUDA device. If it decides
"yes", it returns `true` and produces a `CUDAToken`, which contains
the device id and a CUDA stream. If it decides "no", it returns
`false` and does not produce anything.

Then, the pieces need to be put together in the configuration. The
`CUDADeviceChooserFilter` should be put as the first module on a
`cms.Path`, followed by the CUDA EDProducers (in the future it may
become sufficient to have only the first EDProducer of a chain in the
`Path`).
```python
process.fooCUDADevice = cms.EDProducer("CUDADeviceChooser")
process.fooCUDADeviceFilter = cms.EDFilter("CUDADeviceFilter",
process.fooCUDADeviceFilter = cms.EDFilter("CUDADeviceChooserFilter",
src = cms.InputTag("fooCUDADevice")
)
process.fooCUDA = cms.EDProducer("FooProducerCUDA")
process.fooPathCUDA = cms.Path(
process.fooCUDADeviceFilter + process.fooCUDA
)
process.fooTask = cms.Task(
process.fooDevice
)
```
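
The configuration above relies on the framework skipping the remaining modules of a `cms.Path` once an EDFilter on it returns `false`. The standalone C++ sketch below imitates only that control flow; `Module`, `runPath` and `makeFooPathCUDA` are hypothetical names, not CMSSW API, and the boolean `useCUDA` stands in for the decision that `CUDADeviceChooserFilter` takes for an event.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a module scheduled on a Path. `run` returns the
// module's decision: producers always return true, an EDFilter may return false.
struct Module {
  std::string label;
  std::function<bool()> run;
};

// Runs the modules in order and stops at the first one returning false,
// mimicking how a Path skips everything after a rejecting EDFilter.
// Returns the labels of the modules that were actually executed.
std::vector<std::string> runPath(const std::vector<Module>& path) {
  std::vector<std::string> executed;
  for (const auto& step : path) {
    assert(step.run);  // every module needs a callable
    executed.push_back(step.label);
    if (!step.run()) {
      break;  // event rejected: downstream modules on this Path do not run
    }
  }
  return executed;
}

// Builds the Path of the README example (hypothetical helper); `useCUDA`
// plays the role of the filter decision for the current event.
std::vector<Module> makeFooPathCUDA(bool useCUDA) {
  return {
      {"fooCUDADeviceFilter", [useCUDA] { return useCUDA; }},
      {"fooCUDA", [] { return true; }},
  };
}
```

For example, `runPath(makeFooPathCUDA(false))` returns only `{"fooCUDADeviceFilter"}`, so `fooCUDA` never runs for that event, while `runPath(makeFooPathCUDA(true))` runs both modules.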

### Always on GPU

In case the chain of modules should always be run on a GPU, the
EDFilter and Paths are not needed. In this case, a
`CUDADeviceChooserProducer` should be used to produce the `CUDAToken`.
If the machine has no GPUs or `CUDAService` is disabled, the producer
throws an exception.
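
The practical difference between the two modules shows up when no CUDA device can be used: the filter rejects the event, while the producer has no way to skip and reports an error instead. A minimal standalone sketch of that contrast follows, assuming hypothetical `CudaStatus`, `chooseForFilter` and `chooseForProducer` (not CMSSW API) and the same `streamId % numberOfDevices` assignment that this commit puts in `cudacore::chooseCUDADevice`. The real producer throws `cms::Exception`; `std::runtime_error` stands in for it here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical summary of what CUDAService reports for one EDM stream.
struct CudaStatus {
  bool enabled;         // CUDA enabled for this EDM stream
  int numberOfDevices;  // number of usable CUDA devices
};

// Filter-like behaviour: returns false (the Path stops, no token is produced)
// when CUDA cannot be used; otherwise returns true and sets `device`.
bool chooseForFilter(const CudaStatus& status, unsigned int streamId, int& device) {
  if (!status.enabled || status.numberOfDevices <= 0) {
    return false;
  }
  device = static_cast<int>(streamId % static_cast<unsigned int>(status.numberOfDevices));
  return true;
}

// Producer-like behaviour: there is no "skip" option, so the absence of a
// usable device is reported as an error.
int chooseForProducer(const CudaStatus& status, unsigned int streamId) {
  int device = -1;
  if (!chooseForFilter(status, streamId, device)) {
    throw std::runtime_error("no usable CUDA device; use a filter for conditional execution");
  }
  assert(device >= 0 && device < status.numberOfDevices);
  return device;
}
```

Returning a `bool` and writing the device through an output parameter loosely mirrors the EDFilter interface, where the decision is the return value and the device travels downstream in the produced `CUDAToken`.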


## Data model

The GPU data can be a single pointer to device data, or a class/struct
89 changes: 0 additions & 89 deletions HeterogeneousCore/CUDACore/plugins/CUDADeviceChooser.cc

This file was deleted.

76 changes: 76 additions & 0 deletions HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserFilter.cc
@@ -0,0 +1,76 @@
#include "FWCore/Framework/interface/global/EDFilter.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/Frameworkfwd.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "HeterogeneousCore/CUDACore/interface/CUDAToken.h"
#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"

#include "chooseCUDADevice.h"

#include <memory>
#include <utility>

namespace {
  struct DeviceCache {
    int device;
    bool enabled;
  };
}

class CUDADeviceChooserFilter: public edm::global::EDFilter<edm::StreamCache<::DeviceCache>> {
public:
  explicit CUDADeviceChooserFilter(const edm::ParameterSet& iConfig);
  ~CUDADeviceChooserFilter() override = default;

  static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);

  std::unique_ptr<::DeviceCache> beginStream(edm::StreamID id) const override;

  bool filter(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const override;

private:
  bool enabled_;
};

CUDADeviceChooserFilter::CUDADeviceChooserFilter(const edm::ParameterSet& iConfig):
  enabled_(iConfig.getParameter<bool>("enabled"))
{
  produces<CUDAToken>();
}

void CUDADeviceChooserFilter::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
  edm::ParameterSetDescription desc;
  desc.add<bool>("enabled", true)->setComment("This parameter is intended for debugging purposes only. If disabling some CUDA chains is needed for production, it is better to remove the CUDA modules altogether from the configuration.");
  descriptions.addWithDefaultLabel(desc);
  descriptions.setComment("This EDFilter chooses whether a chain of CUDA EDModules depending on it should run or not, and on which CUDA device they should run. The decision is communicated downstream with the filter decision. In addition, if the filter returns true, a 'CUDAToken' is produced into the event (for false nothing is produced).");
}

std::unique_ptr<::DeviceCache> CUDADeviceChooserFilter::beginStream(edm::StreamID id) const {
  auto ret = std::make_unique<::DeviceCache>();

  edm::Service<CUDAService> cudaService;
  ret->enabled = (enabled_ && cudaService->enabled(id));
  if(!ret->enabled) {
    return ret;
  }

  ret->device = cudacore::chooseCUDADevice(id);

  LogDebug("CUDADeviceChooserFilter") << "EDM stream " << id << " set to CUDA device " << ret->device;

  return ret;
}

bool CUDADeviceChooserFilter::filter(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const {
  auto cache = streamCache(id);
  if(!cache->enabled) {
    return false;
  }

  auto ret = std::make_unique<CUDAToken>(cache->device);
  LogDebug("CUDADeviceChooserFilter") << "EDM stream " << id << " CUDA device " << ret->device() << " with CUDA stream " << ret->stream().id();
  iEvent.put(std::move(ret));
  return true;
}

DEFINE_FWK_MODULE(CUDADeviceChooserFilter);
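
In this filter the device decision is taken once per EDM stream in `beginStream()` and only read back through `streamCache(id)` in `filter()`. The standalone sketch below imitates that split with a hypothetical `StreamCaches` class that owns one `DeviceCache` per stream explicitly; in CMSSW the framework owns these caches, so this illustrates the pattern rather than `edm::StreamCache` itself. Unlike the module, the sketch also treats zero devices as disabled to keep the modulo well defined.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Same content as the DeviceCache used by CUDADeviceChooserFilter.
struct DeviceCache {
  int device = -1;
  bool enabled = false;
};

// Hypothetical holder of one cache per EDM stream (not CMSSW API).
class StreamCaches {
public:
  explicit StreamCaches(unsigned int numberOfStreams) : caches_(numberOfStreams) {}

  // Called once per stream: the decision is taken here.
  void beginStream(unsigned int id, bool cudaEnabled, int numberOfDevices) {
    assert(id < caches_.size());
    auto cache = std::make_unique<DeviceCache>();
    cache->enabled = cudaEnabled && numberOfDevices > 0;
    if (cache->enabled) {
      cache->device = static_cast<int>(id % static_cast<unsigned int>(numberOfDevices));
    }
    caches_[id] = std::move(cache);
  }

  // Called for every event: only reads the per-stream decision.
  // Returns the filter decision and sets `device` when it is true.
  bool filter(unsigned int id, int& device) const {
    assert(id < caches_.size() && caches_[id]);
    const DeviceCache& cache = *caches_[id];
    if (!cache.enabled) {
      return false;
    }
    device = cache.device;
    return true;
  }

private:
  std::vector<std::unique_ptr<DeviceCache>> caches_;
};
```

Because the cache is written only in `beginStream()` and read-only afterwards, concurrent events on different streams never contend for it, which is why a `const` `filter()` is sufficient.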
68 changes: 68 additions & 0 deletions HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserProducer.cc
@@ -0,0 +1,68 @@
#include "FWCore/Framework/interface/global/EDProducer.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/Frameworkfwd.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ParameterSet/interface/ConfigurationDescriptions.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "FWCore/Utilities/interface/Exception.h"
#include "HeterogeneousCore/CUDACore/interface/CUDAToken.h"
#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"

#include "chooseCUDADevice.h"

#include <memory>
#include <utility>

namespace {
  struct DeviceCache {
    int device;
  };
}

class CUDADeviceChooserProducer: public edm::global::EDProducer<edm::StreamCache<::DeviceCache>> {
public:
  explicit CUDADeviceChooserProducer(const edm::ParameterSet& iConfig);
  ~CUDADeviceChooserProducer() override = default;

  static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);

  std::unique_ptr<::DeviceCache> beginStream(edm::StreamID id) const override;

  void produce(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const override;
};

CUDADeviceChooserProducer::CUDADeviceChooserProducer(const edm::ParameterSet& iConfig) {
  edm::Service<CUDAService> cudaService;
  if(!cudaService->enabled()) {
    throw cms::Exception("Configuration") << "CUDAService is disabled so CUDADeviceChooserProducer is unable to make decisions on which CUDA device to run. If you need to run without CUDA devices, please use CUDADeviceChooserFilter for conditional execution, or remove all CUDA modules from your configuration.";
  }
  produces<CUDAToken>();
}

void CUDADeviceChooserProducer::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
  edm::ParameterSetDescription desc;
  descriptions.addWithDefaultLabel(desc);
  descriptions.setComment("This EDProducer chooses on which CUDA device the chain of CUDA EDModules depending on it should run. The decision is communicated downstream with the 'CUDAToken' event product. It is an error if there are no CUDA devices, or CUDAService is disabled.");
}

std::unique_ptr<::DeviceCache> CUDADeviceChooserProducer::beginStream(edm::StreamID id) const {
  auto ret = std::make_unique<::DeviceCache>();

  edm::Service<CUDAService> cudaService;
  if(!cudaService->enabled(id)) {
    throw cms::Exception("LogicError") << "CUDA is disabled for EDM stream " << id << " in CUDAService, so CUDADeviceChooserProducer is unable to decide the CUDA device for this EDM stream. If you need to dynamically decide whether a chain of CUDA EDModules is run or not, please use CUDADeviceChooserFilter instead.";
  }
  ret->device = cudacore::chooseCUDADevice(id);

  LogDebug("CUDADeviceChooserProducer") << "EDM stream " << id << " set to CUDA device " << ret->device;

  return ret;
}

void CUDADeviceChooserProducer::produce(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const {
  auto ret = std::make_unique<CUDAToken>(streamCache(id)->device);
  LogDebug("CUDADeviceChooserProducer") << "EDM stream " << id << " CUDA device " << ret->device() << " with CUDA stream " << ret->stream().id();
  iEvent.put(std::move(ret));
}

DEFINE_FWK_MODULE(CUDADeviceChooserProducer);
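
`CUDADeviceChooserProducer` validates CUDA availability at two levels: once for the whole job in the constructor (exception category `Configuration`) and once per EDM stream in `beginStream()` (category `LogicError`). The standalone sketch below reproduces that ordering under stated assumptions: `FakeCudaService`, `ProducerSketch`, `ConfigurationError` and `LogicError` are hypothetical stand-ins for `CUDAService`, the module and the two `cms::Exception` categories.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CUDAService: globally enabled or not, plus the
// set of EDM streams for which CUDA is disabled.
struct FakeCudaService {
  bool enabled = true;
  int numberOfDevices = 1;
  std::set<unsigned int> disabledStreams;

  bool enabledFor(unsigned int streamId) const {
    return enabled && disabledStreams.count(streamId) == 0;
  }
};

// Distinct exception types standing in for the two cms::Exception categories.
struct ConfigurationError : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct LogicError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

class ProducerSketch {
public:
  // Job-wide check at construction time, like the "Configuration" exception.
  explicit ProducerSketch(const FakeCudaService& service) : service_(service) {
    if (!service_.enabled) {
      throw ConfigurationError("CUDAService is disabled");
    }
  }

  // Per-stream check, like the "LogicError" exception in beginStream();
  // returns the device chosen for the stream.
  int beginStream(unsigned int streamId) const {
    if (!service_.enabledFor(streamId)) {
      throw LogicError("CUDA is disabled for EDM stream " + std::to_string(streamId));
    }
    assert(service_.numberOfDevices > 0);
    return static_cast<int>(streamId % static_cast<unsigned int>(service_.numberOfDevices));
  }

private:
  const FakeCudaService& service_;
};
```

Failing in the constructor reports a globally disabled service before any stream starts; the per-stream check only catches the narrower case where CUDA is enabled for the job but not for a particular stream.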
40 changes: 0 additions & 40 deletions HeterogeneousCore/CUDACore/plugins/CUDADeviceFilter.cc

This file was deleted.

22 changes: 22 additions & 0 deletions HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.cc
@@ -0,0 +1,22 @@
#include "chooseCUDADevice.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"

namespace cudacore {
  int chooseCUDADevice(edm::StreamID id) {
    edm::Service<CUDAService> cudaService;

    // For starters we "statically" assign the device based on
    // edm::Stream number. This is suboptimal if the number of
    // edm::Streams is not a multiple of the number of CUDA devices
    // (and even then there is no load balancing).
    //
    // TODO: improve. Possible ideas include
    // - allocate M (< N(edm::Streams)) buffers per device per "chain of modules", choose dynamically which (buffer, device) to use
    // - our own CUDA memory allocator
    //   * being able to cheaply allocate+deallocate scratch memory would allow making the execution fully dynamic, e.g. based on current load
    //   * would probably still need some buffer space/device to hold e.g. conditions data
    // - for conditions, how to handle multiple lumis per job?
    return id % cudaService->numberOfDevices();
  }
}
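
The comment above notes that the static `id % numberOfDevices` assignment balances poorly when the number of EDM streams is not a multiple of the number of devices. A standalone sketch that counts how many streams land on each device, assuming plain `unsigned int` stream ids instead of `edm::StreamID`; `chooseDevice` and `streamsPerDevice` are hypothetical helpers, with `chooseDevice` applying the same rule as `cudacore::chooseCUDADevice`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same modulo rule as cudacore::chooseCUDADevice, with a plain integer stream id.
int chooseDevice(unsigned int streamId, int numberOfDevices) {
  assert(numberOfDevices > 0);  // the modulo is undefined for zero devices
  return static_cast<int>(streamId % static_cast<unsigned int>(numberOfDevices));
}

// Returns, for each device, how many of the first `numberOfStreams` EDM
// streams are statically assigned to it.
std::vector<int> streamsPerDevice(unsigned int numberOfStreams, int numberOfDevices) {
  std::vector<int> counts(static_cast<std::size_t>(numberOfDevices), 0);
  for (unsigned int id = 0; id < numberOfStreams; ++id) {
    ++counts[static_cast<std::size_t>(chooseDevice(id, numberOfDevices))];
  }
  return counts;
}
```

For example, `streamsPerDevice(5, 2)` yields `{3, 2}`: device 0 serves streams 0, 2 and 4 while device 1 serves streams 1 and 3. With 8 streams on 4 devices every device gets 2 streams, but even then the assignment ignores the actual per-event load.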
10 changes: 10 additions & 0 deletions HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.h
@@ -0,0 +1,10 @@
#ifndef HeterogeneousCore_CUDACore_chooseCUDADevice_h
#define HeterogeneousCore_CUDACore_chooseCUDADevice_h

#include "FWCore/Utilities/interface/StreamID.h"

namespace cudacore {
int chooseCUDADevice(edm::StreamID id);
}

#endif