Skip to content

Commit

Permalink
[SYCL][Graph] Implement dynamic command-groups
Browse files Browse the repository at this point in the history
Implement Dynamic Command-Group feature specified in
PR [[SYCL][Graph] Add specification for kernel binary updates](intel#14896)

This feature enables updating `ur_kernel_handle_t` objects in graph nodes
between executions as well as parameters and execution range of nodes.

This functionality is currently supported on CUDA & HIP which are used
for testing in the new E2E tests. Level Zero support will follow
shortly, resulting in the removal of the `XFAIL` labels from the E2E
tests.

The code for adding nodes to a graph has been refactored to split out
verification of edges, and marking memory objects used in a node, as
separate helper functions. This allows path for adding a command-group
node to do this functions over each CG in the list before creating the
node itself.

The `dynamic_parameter_impl` code has also been refactored so the code
is shared for updating a dynamic parameter used in both a regular kernel
node and a dynamic command-group node.

See the addition to the design doc for further details on the
implementation.
  • Loading branch information
EwanC committed Oct 22, 2024
1 parent c6001ee commit b8bbdd6
Show file tree
Hide file tree
Showing 33 changed files with 2,332 additions and 253 deletions.
24 changes: 24 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,30 @@ requirements for these new accessors to correctly trigger allocations before
updating. This is similar to how individual graph commands are enqueued when
accessors are used in a graph node.

### Dynamic Command-Group

To implement the `dynamic_command_group` class for updating the command-groups (CG)
associated with nodes, the CG member of the node implementation class changes
from a `std::unique_ptr` to a `std::shared_ptr` so that multiple nodes and the
`dynamic_command_group_impl` object can share the same CG object. This avoids
the overhead of having to allocate and free copies of the CG when a new active
CG is selected.

The `dynamic_command_group_impl` class contains weak pointers to the nodes which
have been created with it, so that when a new active CG is selected it can
propagate the change to those nodes. The `node_impl` class also contains a
reference to the dynamic command-group that created it, so that when the graph
is finalized each node can use the list of kernels in its dynamic command-group
as part of the `urCommandBufferAppendKernelLaunchExp` call to pass the possible
alternative kernels.

The `sycl::detail::CGExecKernel` class has been added to, so that if the
object was created from an element in the dynamic command-group list, the class
stores a vector of weak pointers to the other alternative command-groups created
from the same dynamic command-group object. This allows the DPC++ scheduler to
access the list of alternative kernels when calling the UR API to append a
kernel command to a command-buffer.

## Optimizations
### Interactions with Profiling

Expand Down
24 changes: 24 additions & 0 deletions sycl/include/sycl/ext/oneapi/experimental/graph.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ class node_impl;
class graph_impl;
class exec_graph_impl;
class dynamic_parameter_impl;
class dynamic_command_group_impl;
} // namespace detail

enum class node_type {
Expand Down Expand Up @@ -216,6 +217,23 @@ class depends_on_all_leaves : public ::sycl::detail::DataLessProperty<
} // namespace node
} // namespace property

class __SYCL_EXPORT dynamic_command_group {
public:
dynamic_command_group(
const command_graph<graph_state::modifiable> &Graph,
const std::vector<std::function<void(handler &)>> &CGFList);

size_t get_active_cgf() const;
void set_active_cgf(size_t Index);

private:
template <class Obj>
friend const decltype(Obj::impl) &
sycl::detail::getSyclObjImpl(const Obj &SyclObject);

std::shared_ptr<detail::dynamic_command_group_impl> impl;
};

namespace detail {
// Templateless modifiable command-graph base class.
class __SYCL_EXPORT modifiable_command_graph {
Expand Down Expand Up @@ -337,6 +355,12 @@ class __SYCL_EXPORT modifiable_command_graph {
modifiable_command_graph(const std::shared_ptr<detail::graph_impl> &Impl)
: impl(Impl) {}

/// Template-less implementation of add() for dynamic command-group nodes.
/// @param DynCGF Dynamic Command-group function object to add.
/// @param Dep List of predecessor nodes.
/// @return Node added to the graph.
node addImpl(dynamic_command_group &DynCGF, const std::vector<node> &Dep);

/// Template-less implementation of add() for CGF nodes.
/// @param CGF Command-group function to add.
/// @param Dep List of predecessor nodes.
Expand Down
25 changes: 25 additions & 0 deletions sycl/include/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1165,7 +1165,9 @@ class __SYCL_EXPORT handler {
StoreLambda<KName, decltype(Wrapper), Dims, TransformedArgType>(
std::move(Wrapper));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
#endif
} else
#endif // !__SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__ &&
Expand All @@ -1188,8 +1190,10 @@ class __SYCL_EXPORT handler {
StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
std::move(KernelFunc));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
#endif
#else
(void)KernelFunc;
#endif // __SYCL_FORCE_PARALLEL_FOR_RANGE_ROUNDING__
Expand Down Expand Up @@ -1239,7 +1243,9 @@ class __SYCL_EXPORT handler {
StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
std::move(KernelFunc));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(true);
#endif
#endif
}

Expand All @@ -1262,7 +1268,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(std::move(NumWorkItems));
processLaunchProperties<PropertiesT>(Props);
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
extractArgsAndReqs();
MKernelName = getKernelName();
#endif
Expand All @@ -1288,7 +1296,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(std::move(NDRange));
processLaunchProperties(Props);
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(true);
#endif
extractArgsAndReqs();
MKernelName = getKernelName();
#endif
Expand Down Expand Up @@ -1329,7 +1339,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(NumWorkGroups, /*SetNumWorkGroups=*/true);
StoreLambda<NameT, KernelType, Dims, LambdaArgType>(std::move(KernelFunc));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
#endif // __SYCL_DEVICE_ONLY__
}

Expand Down Expand Up @@ -1954,7 +1966,9 @@ class __SYCL_EXPORT handler {
StoreLambda<NameT, KernelType, Dims, TransformedArgType>(
std::move(KernelFunc));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
#endif
}

Expand Down Expand Up @@ -2052,7 +2066,9 @@ class __SYCL_EXPORT handler {
detail::checkValueRange<Dims>(NumWorkItems, WorkItemOffset);
setNDRangeDescriptor(std::move(NumWorkItems), std::move(WorkItemOffset));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
extractArgsAndReqs();
MKernelName = getKernelName();
#endif
Expand Down Expand Up @@ -2131,7 +2147,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(std::move(NumWorkItems));
MKernel = detail::getSyclObjImpl(std::move(Kernel));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
if (!lambdaAndKernelHaveEqualName<NameT>()) {
extractArgsAndReqs();
MKernelName = getKernelName();
Expand Down Expand Up @@ -2172,7 +2190,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(std::move(NumWorkItems), std::move(WorkItemOffset));
MKernel = detail::getSyclObjImpl(std::move(Kernel));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(false);
#endif
if (!lambdaAndKernelHaveEqualName<NameT>()) {
extractArgsAndReqs();
MKernelName = getKernelName();
Expand Down Expand Up @@ -2212,7 +2232,9 @@ class __SYCL_EXPORT handler {
setNDRangeDescriptor(std::move(NDRange));
MKernel = detail::getSyclObjImpl(std::move(Kernel));
setType(detail::CGType::Kernel);
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
setNDRangeUsed(true);
#endif
if (!lambdaAndKernelHaveEqualName<NameT>()) {
extractArgsAndReqs();
MKernelName = getKernelName();
Expand Down Expand Up @@ -3341,6 +3363,7 @@ class __SYCL_EXPORT handler {
size_t Size, bool Block = false);
friend class ext::oneapi::experimental::detail::graph_impl;
friend class ext::oneapi::experimental::detail::dynamic_parameter_impl;
friend class ext::oneapi::experimental::detail::dynamic_command_group_impl;

bool DisableRangeRounding();

Expand Down Expand Up @@ -3604,8 +3627,10 @@ class __SYCL_EXPORT handler {
}
#endif

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
// Set that an ND Range was used during a call to parallel_for
void setNDRangeUsed(bool Value);
#endif

inline void internalProfilingTagImpl() {
throwIfActionIsCreated();
Expand Down
5 changes: 4 additions & 1 deletion sycl/source/detail/cg.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,9 @@ class CGExecKernel : public CG {
std::string MKernelName;
std::vector<std::shared_ptr<detail::stream_impl>> MStreams;
std::vector<std::shared_ptr<const void>> MAuxiliaryResources;
/// Used to implement ext_oneapi_graph dynamic_command_group. Stores the list
/// of command-groups that a kernel command can be updated to.
std::vector<std::weak_ptr<CGExecKernel>> MAlternativeKernels;
ur_kernel_cache_config_t MKernelCacheConfig;
bool MKernelIsCooperative = false;
bool MKernelUsesClusterLaunch = false;
Expand All @@ -277,7 +280,7 @@ class CGExecKernel : public CG {
MKernelBundle(std::move(KernelBundle)), MArgs(std::move(Args)),
MKernelName(std::move(KernelName)), MStreams(std::move(Streams)),
MAuxiliaryResources(std::move(AuxiliaryResources)),
MKernelCacheConfig(std::move(KernelCacheConfig)),
MAlternativeKernels{}, MKernelCacheConfig(std::move(KernelCacheConfig)),
MKernelIsCooperative(KernelIsCooperative),
MKernelUsesClusterLaunch(MKernelUsesClusterLaunch) {
assert(getType() == CGType::Kernel && "Wrong type of exec kernel CG.");
Expand Down
Loading

0 comments on commit b8bbdd6

Please sign in to comment.