Remove make_managed_from_factory #291

Merged
merged 4 commits into from
Dec 5, 2024
3 changes: 2 additions & 1 deletion RELEASE_NOTES.md
@@ -18,7 +18,8 @@ The format of this file is based on [Keep a Changelog](http://keepachangelog.com
 ### Removed
 - Removes deprecated ManagedArray::getPointer method. Use ManagedArray::data instead.
 - Removes optional support for implicitly casting between raw pointers and ManagedArrays (CHAI\_ENABLE\_IMPLICIT\_CONVERSIONS). Use makeManagedArray and ManagedArray::data to perform explicit conversions instead.
-- Removes equality and inequality comparison operators between ManagedArrays and raw pointers
+- Removes equality and inequality comparison operators between ManagedArrays and raw pointers.
+- Removes make\_managed\_from\_factory function for creating managed\_ptr objects from factory functions. This change will lead to safer adoption of allocators during construction and destruction of managed\_ptr objects.
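
The release note above ties the removal to allocator adoption. One way to read that: a factory function returns a pointer it allocated itself, so the library cannot choose where the object lives or how it is released, whereas a "type plus constructor arguments" interface lets the library obtain storage first and construct in place. The following is a minimal host-only sketch of that idea. `CountingAllocator`, `make_with`, and `destroy_with` are hypothetical names invented for illustration and are not part of CHAI; the actual design in CHAI may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical allocator stand-in (not a CHAI type). It counts live
// allocations so the pairing of allocate/deallocate can be checked.
struct CountingAllocator {
  int live = 0;

  void* allocate(std::size_t bytes) {
    ++live;
    return ::operator new(bytes);
  }

  void deallocate(void* p) {
    --live;
    ::operator delete(p);
  }
};

// A small polymorphic hierarchy, similar in spirit to what managed_ptr wraps.
struct Shape {
  virtual ~Shape() = default;
  virtual double area() const = 0;
};

struct Square : Shape {
  explicit Square(double s) : side(s) {}
  double area() const override { return side * side; }
  double side;
};

// Hypothetical creation function: the caller names the type and passes
// constructor arguments, so storage comes from the allocator and the
// object is constructed in place with placement new.
template <typename T, typename... Args>
T* make_with(CountingAllocator& alloc, Args&&... args) {
  void* storage = alloc.allocate(sizeof(T));
  try {
    return ::new (storage) T(std::forward<Args>(args)...);
  } catch (...) {
    // Release the storage if the constructor throws.
    alloc.deallocate(storage);
    throw;
  }
}

// Hypothetical destruction function: run the destructor explicitly, then
// hand the storage back to the same allocator that produced it.
template <typename T>
void destroy_with(CountingAllocator& alloc, T* p) {
  if (p == nullptr) {
    return;
  }
  p->~T();
  alloc.deallocate(p);
}
```

With a factory that calls `new` internally, neither the allocation nor the matching release would pass through `CountingAllocator`, which is the kind of mismatch the release note appears to be guarding against.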

## [Version 2024.07.0] - Release date 2024-07-26

218 changes: 25 additions & 193 deletions src/chai/managed_ptr.hpp
@@ -57,10 +57,10 @@ namespace chai {
 ///
 /// This wrapper stores both host and device pointers so that polymorphism can be
 /// used in both contexts with a single API.
-/// The make_managed and make_managed_from_factory functions call new on both the
-/// host and device so that polymorphism is valid in both contexts. Simply copying
-/// the bits of an object to the device will not copy the vtable, so new must be
-/// called on the device.
+/// The make_managed function calls new on both the host and device so that
+/// polymorphism is valid in both contexts. Simply copying the bits of an
+/// object to the device will not copy the vtable, so new must be called
+/// on the device.
 ///
 /// Usage Requirements:
 /// Methods that can be called on the host and/or device must be declared
@@ -72,28 +72,27 @@ namespace chai {
 /// you must explicitly modify the object in both the host context and the
 /// device context.
 /// C-style array members of T need to be initialized correctly with a host or
-/// device C-style array. If a ManagedArray is passed to the make_managed or
-/// make_managed_from_factory methods in place of a C-style array, wrap it in
-/// a call to chai::unpack to extract the C-style arrays contained within the
-/// ManagedArray. This will pass the extracted host C-style array to the host
-/// constructor and the extracted device C-style array to the device
-/// constructor. If it is desired that these host and device C-style arrays be
-/// kept in sync like the normal behavior of ManagedArray, define a callback
-/// that maintains a copy of the ManagedArray and upon the ACTION_MOVE event
-/// calls the copy constructor of that ManagedArray.
+/// device C-style array. If a ManagedArray is passed to the make_managed
+/// function in place of a C-style array, wrap it in a call to chai::unpack to
+/// extract the C-style arrays contained within the ManagedArray. This will
+/// pass the extracted host C-style array to the host constructor and the
+/// extracted device C-style array to the device constructor. If it is desired
+/// that these host and device C-style arrays be kept in sync like the normal
+/// behavior of ManagedArray, define a callback that maintains a copy of the
+/// ManagedArray and upon the ACTION_MOVE event calls the copy constructor of
+/// that ManagedArray.
 /// If a C-style array is passed to make_managed, accessing that member will be
 /// valid only in the correct context. To prevent the accidental use of that
 /// member in the wrong context, any methods that access it should be __host__
 /// only or __device__ only. Special care should be taken when passing C-style
 /// arrays as arguments to member functions.
 /// The same restrictions for C-style array members also apply to raw pointer
-/// members. If a managed_ptr is passed to the make_managed or
-/// make_managed_from_factory methods in place of a raw pointer, wrap it in
-/// a call to chai::unpack to extract the raw pointers contained within the
-/// managed_ptr. This will pass the extracted host pointer to the host
-/// constructor and the extracted device pointer to the device constructor.
-/// If it is desired that these host and device pointers be kept in sync,
-/// define a callback that maintains a copy of the managed_ptr and upon the
+/// members. If a managed_ptr is passed to the make_managed function in place of
+/// a raw pointer, wrap it in a call to chai::unpack to extract the raw pointers
+/// contained within the managed_ptr. This will pass the extracted host pointer
+/// to the host constructor and the extracted device pointer to the device
+/// constructor. If it is desired that these host and device pointers be kept in
+/// sync, define a callback that maintains a copy of the managed_ptr and upon the
 /// ACTION_MOVE event call the copy constructor of that managed_ptr.
 /// Again, if a raw pointer is passed to make_managed, accessing that member will
 /// only be valid in the correct context. Take care when passing raw pointers
@@ -102,12 +101,12 @@ namespace chai {
 /// turn off GPU error checking, pass -DCHAI_ENABLE_GPU_ERROR_CHECKING=OFF as
 /// an argument to cmake when building CHAI. To turn on synchronization after
 /// every kernel, set the appropriate environment variable (e.g. CUDA_LAUNCH_BLOCKING or HIP_LAUNCH_BLOCKING).
-/// Alternatively, call cudaDeviceSynchronize() after any call to make_managed,
-/// make_managed_from_factory, or managed_ptr::free, and check the return code
-/// for errors. If your code crashes in the constructor/destructor of T, then
-/// it is recommended to turn on this synchronization. For example, the
-/// constructor of T might run out of per-thread stack space on the GPU. If
-/// that happens, you can increase the device limit of per-thread stack space.
+/// Alternatively, call cudaDeviceSynchronize() after any call to make_managed
+/// or managed_ptr::free, and check the return code for errors. If your code
+/// crashes in the constructor/destructor of T, then it is recommended to turn
+/// on this synchronization for debugging. For example, the constructor of T
+/// might run out of per-thread stack space on the GPU. If that happens, you
+/// can increase the device limit of per-thread stack space.
 ///
 template <typename T>
 class managed_ptr {
@@ -804,27 +803,6 @@ namespace chai {
 *gpuPointer = new T(processArguments(args)...);
 }

-///
-/// @author Alan Dayton
-///
-/// Creates a new object on the device by calling the given factory method.
-///
-/// @param[out] gpuPointer Used to return the device pointer to the new object
-/// @param[in] f The factory method (must be a __device__ or __host__ __device__
-/// method
-/// @param[in] args The arguments to the factory method
-///
-/// @note Cannot capture argument packs in an extended device lambda,
-/// so explicit kernel is needed.
-///
-template <typename T,
-typename F,
-typename... Args>
-CHAI_GLOBAL void make_on_device_from_factory(T** gpuPointer, F f, Args... args)
-{
-*gpuPointer = f(processArguments(args)...);
-}
-
 ///
 /// @author Alan Dayton
 ///
@@ -933,44 +911,6 @@ namespace chai {
 return cpuPointer;
 }

-///
-/// @author Alan Dayton
-///
-/// Calls a factory method to create a new object on the host.
-/// Sets the execution space to the CPU so that ManagedArrays and managed_ptrs
-/// are moved to the host as necessary.
-///
-/// @param[in] f The factory method
-/// @param[in] args The arguments to the factory method
-///
-/// @return The host pointer to the new object
-///
-template <typename T,
-typename F,
-typename... Args>
-CHAI_HOST T* make_on_host_from_factory(F f, Args&&... args) {
-#if !defined(CHAI_DISABLE_RM)
-// Get the ArrayManager and save the current execution space
-chai::ArrayManager* arrayManager = chai::ArrayManager::getInstance();
-ExecutionSpace currentSpace = arrayManager->getExecutionSpace();
-
-// Set the execution space so that ManagedArrays and managed_ptrs
-// are handled properly
-arrayManager->setExecutionSpace(CPU);
-#endif
-
-// Create the object on the device
-T* cpuPointer = f(args...);
-
-#if !defined(CHAI_DISABLE_RM)
-// Set the execution space back to the previous value
-arrayManager->setExecutionSpace(currentSpace);
-#endif
-
-// Return the GPU pointer
-return cpuPointer;
-}
-
 ///
 /// @author Alan Dayton
 ///
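
The helpers in this file save the current execution space, switch to the target space, do the work, and then restore the previous value by hand. As an illustration only, the same sequence could be expressed with an RAII guard so that the restore step also runs on early return or exception. `Space`, `FakeArrayManager`, `ScopedExecutionSpace`, and `run_in_space` below are stand-ins invented for this sketch; CHAI's real `ArrayManager` is a singleton with a richer interface, and this PR does not introduce such a guard.

```cpp
#include <cassert>

// Stand-in for chai::ExecutionSpace (illustrative only).
enum class Space { None, Cpu, Gpu };

// Stand-in for the part of the array manager used here: it only tracks
// the current execution space.
class FakeArrayManager {
 public:
  Space getExecutionSpace() const { return m_space; }
  void setExecutionSpace(Space s) { m_space = s; }

 private:
  Space m_space = Space::None;
};

// Saves the current execution space on construction, switches to the
// requested one, and restores the saved value on destruction.
class ScopedExecutionSpace {
 public:
  ScopedExecutionSpace(FakeArrayManager& manager, Space space)
      : m_manager(manager), m_previous(manager.getExecutionSpace()) {
    m_manager.setExecutionSpace(space);
  }

  ~ScopedExecutionSpace() { m_manager.setExecutionSpace(m_previous); }

  ScopedExecutionSpace(const ScopedExecutionSpace&) = delete;
  ScopedExecutionSpace& operator=(const ScopedExecutionSpace&) = delete;

 private:
  FakeArrayManager& m_manager;
  Space m_previous;
};

// Runs f with the execution space temporarily set to `space`.
template <typename F>
auto run_in_space(FakeArrayManager& manager, Space space, F&& f)
    -> decltype(f()) {
  ScopedExecutionSpace guard(manager, space);
  return f();
}
```

A call such as `run_in_space(manager, Space::Cpu, [&] { return new T(args...); })` would then mirror the save, set, construct, restore sequence of the host-side helper, under the assumptions stated above.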
@@ -1035,67 +975,6 @@ namespace chai {
 free(cpuBuffer);
 gpuFree(gpuBuffer);

-#if !defined(CHAI_DISABLE_RM)
-// Set the execution space back to the previous value
-arrayManager->setExecutionSpace(currentSpace);
-#endif
-
-// Return the GPU pointer
-return gpuPointer;
-}
-
-///
-/// @author Alan Dayton
-///
-/// Calls a factory method to create a new object on the device.
-///
-/// @param[in] f The factory method
-/// @param[in] args The arguments to the factory method
-///
-/// @return The device pointer to the new object
-///
-template <typename T,
-typename F,
-typename... Args>
-CHAI_HOST T* make_on_device_from_factory(F f, Args&&... args) {
-#if !defined(CHAI_DISABLE_RM)
-// Get the ArrayManager and save the current execution space
-chai::ArrayManager* arrayManager = chai::ArrayManager::getInstance();
-ExecutionSpace currentSpace = arrayManager->getExecutionSpace();
-#if defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
-arrayManager->setGPUSimMode(true);
-#endif
-
-// Set the execution space so that chai::ManagedArrays and
-// chai::managed_ptrs are handled properly
-arrayManager->setExecutionSpace(GPU);
-#endif
-
-// Allocate space on the GPU to hold the pointer to the new object
-T** gpuBuffer;
-gpuMalloc((void**)(&gpuBuffer), sizeof(T*));
-
-// Create the object on the device
-#if defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
-detail::make_on_device_from_factory(gpuBuffer, f, args...);
-arrayManager->setGPUSimMode(false);
-#elif defined(__CUDACC__) && defined(CHAI_ENABLE_MANAGED_PTR_ON_GPU)
-detail::make_on_device_from_factory<T><<<1, 1>>>(gpuBuffer, f, args...);
-#elif defined(__HIPCC__) && defined(CHAI_ENABLE_MANAGED_PTR_ON_GPU)
-hipLaunchKernelGGL(detail::make_on_device_from_factory, 1, 1, 0, 0, gpuBuffer, f, args...);
-#endif
-
-// Allocate space on the CPU for the pointer and copy the pointer to the CPU
-T** cpuBuffer = (T**) malloc(sizeof(T*));
-gpuMemcpy(cpuBuffer, gpuBuffer, sizeof(T*), gpuMemcpyDeviceToHost);
-
-// Get the GPU pointer
-T* gpuPointer = cpuBuffer[0];
-
-// Free the host and device buffers
-free(cpuBuffer);
-gpuFree(gpuBuffer);
-
 #if !defined(CHAI_DISABLE_RM)
 // Set the execution space back to the previous value
 arrayManager->setExecutionSpace(currentSpace);
@@ -1155,53 +1034,6 @@ namespace chai {
 #endif
 }

-///
-/// @author Alan Dayton
-///
-/// Makes a managed_ptr<T>.
-/// Factory function to create managed_ptrs.
-///
-/// @param[in] f The factory function that will create the object
-/// @param[in] args The arguments to the factory function
-///
-template <typename T,
-typename F,
-typename... Args>
-CHAI_HOST managed_ptr<T> make_managed_from_factory(F&& f, Args&&... args) {
-static_assert(detail::is_invocable<F, Args...>::value,
-"F is not invocable with the given arguments.");
-
-static_assert(std::is_pointer<typename std::result_of<F(Args...)>::type>::value,
-"F does not return a pointer.");
-
-using R = typename std::remove_pointer<typename std::result_of<F(Args...)>::type>::type;
-
-static_assert(std::is_convertible<R*, T*>::value,
-"F does not return a pointer that is convertible to T*.");
-
-#if (defined(CHAI_GPUCC) || defined(CHAI_ENABLE_GPU_SIMULATION_MODE)) && defined(CHAI_ENABLE_MANAGED_PTR_ON_GPU)
-// Construct on the GPU first to take advantage of asynchrony
-#if defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
-chai::ArrayManager* threadRM = chai::ArrayManager::getInstance();
-threadRM->setGPUSimMode(true);
-#endif
-T* gpuPointer = make_on_device_from_factory<R>(f, args...);
-#if defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
-threadRM->setGPUSimMode(false);
-#endif
-#endif
-
-// Construct on the CPU
-T* cpuPointer = make_on_host_from_factory<R>(f, args...);
-
-// Construct and return the managed_ptr
-#if defined(CHAI_GPUCC) && defined(CHAI_ENABLE_MANAGED_PTR_ON_GPU)
-return managed_ptr<T>({CPU, GPU}, {cpuPointer, gpuPointer});
-#else
-return managed_ptr<T>({CPU}, {cpuPointer});
-#endif
-}
-
 ///
 /// @author Alan Dayton
 ///