Merge CUDADeviceChooser and CUDADeviceFilter to CUDADeviceChooserFilter, add CUDADeviceChooserProducer

Developments
Fix unit test
Unit test for CUDADeviceChooserProducer
cms-patatrack · Aug 7, 2018 · commit 02a8829
1 parent 741388b

Showing 10 changed files with 318 additions and 170 deletions.
diff --git a/HeterogeneousCore/CUDACore/README.md b/HeterogeneousCore/CUDACore/README.md
@@ -21,38 +21,42 @@ deployed and `HeterogeneousEDProducer` retired.
 
 ## Choosing device
 
-The device choosing logic is split to an EDProducer, an EDFilter, and
-use of Paths in the configuration.
-
-First, a `CUDADeviceChooser` EDProducer is run. It has the logic to
-device whether the following chain of EDModules should run on a CUDA
-device or not, and if yes, on which CUDA device. If it decides "yes",
-it produces a `CUDAToken`, which contains the device id and a CUDA
-stream. If it decides "no", it does not produce anything.
-
-Next step is a `CUDADeviceFilter` EDFilter. It checks whether the
-`CUDADeviceChooser` produced a product or not. If "yes", it returns
-`true`, and if "no", it returns `false`.
-
-Finally, the pieces need to be put together in the configuration. The
-`CUDADeviceChooser` can be "anywhere", but the `CUDADeviceFilter`
-should be the first module on a `cms.Path`, followed by the CUDA
-EDProducers (in the future it may become sufficient to have only the
-first EDProducer of a chain in the `Path`).
+### Dynamically between GPU and CPU
+
+The device choice (CPU vs. GPU, and which GPU) is made by an EDFilter
+together with Paths in the configuration.
+
+First, a `CUDADeviceChooserFilter` EDFilter is run. It has the logic
+to decide whether the following chain of EDModules should run on a
+CUDA device or not, and if yes, on which CUDA device. If it decides
+"yes", it returns `true` and produces a `CUDAToken`, which contains
+the device id and a CUDA stream. If it decides "no", it returns
+`false` and does not produce anything.
+
+Then, the pieces need to be put together in the configuration. The
+`CUDADeviceChooserFilter` should be put as the first module on a
+`cms.Path`, followed by the CUDA EDProducers (in the future it may
+become sufficient to have only the first EDProducer of a chain in the
+`Path`).
 ```python
-process.fooCUDADevice = cms.EDProducer("CUDADeviceChooser")
-process.fooCUDADeviceFilter = cms.EDFilter("CUDADeviceFilter",
+process.fooCUDADeviceFilter = cms.EDFilter("CUDADeviceChooserFilter",
     src = cms.InputTag("fooCUDADevice")
 )
 process.fooCUDA = cms.EDProducer("FooProducerCUDA")
 process.fooPathCUDA = cms.Path(
     process.fooCUDADeviceFilter + process.fooCUDA
 )
-process.fooTask = cms.Task(
-    process.fooDevice
-)
 ```
 
+### Always on GPU
+
+In case the chain of modules should always be run on a GPU, the
+EDFilter and Paths are not needed. In this case, a
+`CUDADeviceChooserProducer` should be used to produce the `CUDAToken`.
+If the machine has no GPUs or `CUDAService` is disabled, the producer
+throws an exception.
+
+
 ## Data model
 
 The GPU data can be a single pointer to device data, or a class/struct

diff --git a/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooser.cc b/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooser.cc
diff --git a/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserFilter.cc b/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserFilter.cc
@@ -0,0 +1,76 @@
+#include "FWCore/Framework/interface/global/EDFilter.h"
+#include "FWCore/Framework/interface/Event.h"
+#include "FWCore/Framework/interface/Frameworkfwd.h"
+#include "FWCore/Framework/interface/MakerMacros.h"
+#include "FWCore/ParameterSet/interface/ParameterSet.h"
+#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
+#include "FWCore/ServiceRegistry/interface/Service.h"
+#include "HeterogeneousCore/CUDACore/interface/CUDAToken.h"
+#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"
+
+#include "chooseCUDADevice.h"
+
+namespace {
+  struct DeviceCache {
+    int device;
+    bool enabled;
+  };
+}
+
+class CUDADeviceChooserFilter: public edm::global::EDFilter<edm::StreamCache<::DeviceCache>> {
+public:
+  explicit CUDADeviceChooserFilter(const edm::ParameterSet& iConfig);
+  ~CUDADeviceChooserFilter() override = default;
+
+  static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);
+
+  std::unique_ptr<::DeviceCache> beginStream(edm::StreamID id) const;
+
+  bool filter(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const override;
+
+private:
+  bool enabled_;
+};
+
+CUDADeviceChooserFilter::CUDADeviceChooserFilter(const edm::ParameterSet& iConfig):
+  enabled_(iConfig.getParameter<bool>("enabled"))
+{
+  produces<CUDAToken>();
+}
+
+void CUDADeviceChooserFilter::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
+  edm::ParameterSetDescription desc;
+  desc.add<bool>("enabled", true)->setComment("This parameter is intended for debugging purposes only. If disabling some CUDA chains is needed for production, it is better to remove the CUDA modules altogether from the configuration.");
+  descriptions.addWithDefaultLabel(desc);
+  descriptions.setComment("This EDFilter chooses whether a chain of CUDA EDModules depending on it should run or not, and on which CUDA device they should run. The decision is communicated downstream with the filter decision. In addition, if the filter returns true, a 'CUDAToken' is produced into the event (for false nothing is produced).");
+}
+
+std::unique_ptr<::DeviceCache> CUDADeviceChooserFilter::beginStream(edm::StreamID id) const {
+  auto ret = std::make_unique<::DeviceCache>();
+
+  edm::Service<CUDAService> cudaService;
+  ret->enabled = (enabled_ && cudaService->enabled(id));
+  if(!ret->enabled) {
+    return ret;
+  }
+
+  ret->device = cudacore::chooseCUDADevice(id);
+
+  LogDebug("CUDADeviceChooserFilter") << "EDM stream " << id << " set to CUDA device " << ret->device;
+
+  return ret;
+}
+
+bool CUDADeviceChooserFilter::filter(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const {
+  auto cache = streamCache(id);
+  if(!cache->enabled) {
+    return false;
+  }
+
+  auto ret = std::make_unique<CUDAToken>(cache->device);
+  LogDebug("CUDADeviceChooserFilter") << "EDM stream " << id << " CUDA device " << ret->device() << " with CUDA stream " << ret->stream().id();
+  iEvent.put(std::move(ret));
+  return true;
+}
+
+DEFINE_FWK_MODULE(CUDADeviceChooserFilter);
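
The filter's two-step logic (decide once per EDM stream in `beginStream`, then reuse that decision for every event in `filter`) can be illustrated without the framework. The following is only a sketch: `StreamDeviceCache`, `makeStreamCache` and `filterDecision` are hypothetical names, the two booleans stand in for the `enabled` parameter and `CUDAService::enabled(id)`, and the returned `std::optional<int>` stands in for "return `true` and put a `CUDAToken` into the event" versus "return `false` and produce nothing".

```cpp
#include <cassert>
#include <optional>

// Hypothetical, framework-free stand-in for the per-stream DeviceCache.
struct StreamDeviceCache {
  int device = -1;       // only meaningful when enabled is true
  bool enabled = false;
};

// Models beginStream(): the decision is taken once per EDM stream.
// moduleEnabled plays the role of the "enabled" parameter, serviceEnabled
// the role of CUDAService::enabled(id); both are plain inputs here.
StreamDeviceCache makeStreamCache(bool moduleEnabled, bool serviceEnabled,
                                  unsigned int streamId, unsigned int numberOfDevices) {
  StreamDeviceCache cache;
  cache.enabled = moduleEnabled && serviceEnabled;
  if (cache.enabled) {
    assert(numberOfDevices > 0);  // assumption of this sketch
    cache.device = static_cast<int>(streamId % numberOfDevices);
  }
  return cache;
}

// Models filter(): an empty optional means "return false, produce nothing";
// a value means "return true and put a token for this device into the event".
std::optional<int> filterDecision(const StreamDeviceCache& cache) {
  if (!cache.enabled) {
    return std::nullopt;
  }
  return cache.device;
}
```

Because the cache is filled once per stream, the per-event path only reads two fields. In the sketch, turning off either the module or the service gives an empty result for every event on that stream.
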
diff --git a/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserProducer.cc b/HeterogeneousCore/CUDACore/plugins/CUDADeviceChooserProducer.cc
@@ -0,0 +1,68 @@
+#include "FWCore/Framework/interface/global/EDProducer.h"
+#include "FWCore/Framework/interface/Event.h"
+#include "FWCore/Framework/interface/Frameworkfwd.h"
+#include "FWCore/Framework/interface/MakerMacros.h"
+#include "FWCore/ParameterSet/interface/ParameterSet.h"
+#include "FWCore/ParameterSet/interface/ParameterSetDescription.h"
+#include "FWCore/ServiceRegistry/interface/Service.h"
+#include "HeterogeneousCore/CUDACore/interface/CUDAToken.h"
+#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"
+
+#include "chooseCUDADevice.h"
+
+#include <memory>
+
+namespace {
+  struct DeviceCache {
+    int device;
+  };
+}
+
+class CUDADeviceChooserProducer: public edm::global::EDProducer<edm::StreamCache<::DeviceCache>> {
+public:
+  explicit CUDADeviceChooserProducer(const edm::ParameterSet& iConfig);
+  ~CUDADeviceChooserProducer() override = default;
+
+  static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);
+
+  std::unique_ptr<::DeviceCache> beginStream(edm::StreamID id) const;
+
+  void produce(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const override;
+};
+
+CUDADeviceChooserProducer::CUDADeviceChooserProducer(const edm::ParameterSet& iConfig) {
+  edm::Service<CUDAService> cudaService;
+  if(!cudaService->enabled()) {
+    throw cms::Exception("Configuration") << "CUDAService is disabled so CUDADeviceChooserProducer is unable to make decisions on which CUDA device to run. If you need to run without CUDA devices, please use CUDADeviceChooserFilter for conditional execution, or remove all CUDA modules from your configuration.";
+  }
+  produces<CUDAToken>();
+}
+
+void CUDADeviceChooserProducer::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
+  edm::ParameterSetDescription desc;
+  descriptions.addWithDefaultLabel(desc);
+  descriptions.setComment("This EDProducer chooses on which CUDA device the chain of CUDA EDModules depending on it should run. The decision is communicated downstream with the 'CUDAToken' event product. It is an error if there are no CUDA devices, or CUDAService is disabled.");
+}
+
+std::unique_ptr<::DeviceCache> CUDADeviceChooserProducer::beginStream(edm::StreamID id) const {
+  auto ret = std::make_unique<::DeviceCache>();
+
+  edm::Service<CUDAService> cudaService;
+  if(!cudaService->enabled(id)) {
+    throw cms::Exception("LogicError") << "CUDA is disabled for EDM stream " << id << " in CUDAService, so CUDADeviceChooserProducer is unable to decide the CUDA device for this EDM stream. If you need to dynamically decide whether a chain of CUDA EDModules is run or not, please use CUDADeviceChooserFilter instead.";
+  }
+  ret->device = cudacore::chooseCUDADevice(id);
+
+  LogDebug("CUDADeviceChooserProducer") << "EDM stream " << id << " set to CUDA device " << ret->device;
+
+  return ret;
+}
+
+void CUDADeviceChooserProducer::produce(edm::StreamID id, edm::Event& iEvent, const edm::EventSetup& iSetup) const {
+  auto ret = std::make_unique<CUDAToken>(streamCache(id)->device);
+  LogDebug("CUDADeviceChooserProducer") << "EDM stream " << id << " CUDA device " << ret->device() << " with CUDA stream " << ret->stream().id();
+  iEvent.put(std::move(ret));
+}
+
+
+DEFINE_FWK_MODULE(CUDADeviceChooserProducer);
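
In contrast to the filter, the producer treats "no CUDA available" as an error instead of a decision. A minimal, framework-free sketch of that fail-fast behaviour is shown below. `deviceForStreamOrThrow` is a hypothetical name, `std::runtime_error` stands in for `cms::Exception`, and the explicit zero-device check is an assumption of the sketch, not something the producer code above does itself.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the producer's beginStream(): either a device
// is chosen for the stream, or the job is stopped with an exception.
int deviceForStreamOrThrow(bool serviceEnabled, unsigned int streamId,
                           unsigned int numberOfDevices) {
  if (!serviceEnabled || numberOfDevices == 0) {
    // std::runtime_error replaces cms::Exception in this sketch.
    throw std::runtime_error("CUDA is disabled for EDM stream " +
                             std::to_string(streamId));
  }
  return static_cast<int>(streamId % numberOfDevices);
}
```

The design trade-off is that downstream modules may then assume the token always exists, at the cost of the job not starting on a machine without usable GPUs.
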
diff --git a/HeterogeneousCore/CUDACore/plugins/CUDADeviceFilter.cc b/HeterogeneousCore/CUDACore/plugins/CUDADeviceFilter.cc
diff --git a/HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.cc b/HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.cc
@@ -0,0 +1,22 @@
+#include "chooseCUDADevice.h"
+#include "FWCore/ServiceRegistry/interface/Service.h"
+#include "HeterogeneousCore/CUDAServices/interface/CUDAService.h"
+
+namespace cudacore {
+  int chooseCUDADevice(edm::StreamID id) {
+    edm::Service<CUDAService> cudaService;
+
+    // For starters we "statically" assign the device based on
+    // edm::Stream number. This is suboptimal if the number of
+    // edm::Streams is not a multiple of the number of CUDA devices
+    // (and even then there is no load balancing).
+    //
+    // TODO: improve. Possible ideas include
+    // - allocate M (< N(edm::Streams)) buffers per device per "chain of modules", choose dynamically which (buffer, device) to use
+    // - our own CUDA memory allocator
+    //   * being able to cheaply allocate+deallocate scratch memory allows to make the execution fully dynamic e.g. based on current load
+    //   * would probably still need some buffer space/device to hold e.g. conditions data
+    //     - for conditions, how to handle multiple lumis per job?
+    return id % cudaService->numberOfDevices();
+  }
+}
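
The imbalance mentioned in the comment above can be made concrete with a small standalone sketch. `chooseDevice` and `streamsPerDevice` are hypothetical helpers that take the stream number and device count as plain integers instead of using `edm::StreamID` and `CUDAService`; the modulo rule itself is the one used in `chooseCUDADevice`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cudacore::chooseCUDADevice: same modulo rule,
// but with the stream number and device count passed in directly.
int chooseDevice(unsigned int streamId, unsigned int numberOfDevices) {
  assert(numberOfDevices > 0);  // modulo by zero is undefined behaviour
  return static_cast<int>(streamId % numberOfDevices);
}

// Count how many EDM streams end up on each device under the static rule.
std::vector<int> streamsPerDevice(unsigned int numberOfStreams,
                                  unsigned int numberOfDevices) {
  std::vector<int> counts(numberOfDevices, 0);
  for (unsigned int stream = 0; stream < numberOfStreams; ++stream) {
    ++counts[chooseDevice(stream, numberOfDevices)];
  }
  return counts;
}
```

With 5 streams and 2 devices, for example, device 0 receives streams 0, 2 and 4 while device 1 receives only streams 1 and 3. This is the uneven split described for the case where the number of streams is not a multiple of the number of devices.
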
diff --git a/HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.h b/HeterogeneousCore/CUDACore/plugins/chooseCUDADevice.h
@@ -0,0 +1,10 @@
+#ifndef HeterogeneousCore_CUDACore_chooseCUDADevice_h
+#define HeterogeneousCore_CUDACore_chooseCUDADevice_h
+
+#include "FWCore/Utilities/interface/StreamID.h"
+
+namespace cudacore {
+  int chooseCUDADevice(edm::StreamID id);
+}
+
+#endif