refactor memory ownership svm #6073

mfoerste4 · 2024-09-20T11:14:57Z

As discussed in #6057, this PR refactors the raw memory allocations in the SVM model that are passed back to the caller.

I have removed all raw allocation pointers from the SVM model and replaced them with device_buffers. Since ownership is more restrictive now, we incur additional data copies when:

passing a model into the C-API
retrieving a model via the C-API
un-pickling/deserializing a model in python (removed copy with latest commit)

CC @tfeher, @divyegala

divyegala

I see that the C and Python APIs need a copy and ultimately, this is because rmm::device_buffer in this structure as well as the model is owned by the struct/class. Instead, if we use std::weak_ptr<rmm::device_buffer> or plain rmm::device_buffer& then we can avoid copies and ownership issues by letting the C and Python API own the data but still allowing the C++ API to resize it.

The C API can also release the shared_ptr/device_buffer to transfer ownership to the user, thus removing the need for a copy there.

The Python API is owned by us and so, we can create shared_ptr/device_buffer in the Cython layer and store them as class members.

cpp/src/svm/results.cuh

cpp/src/svm/smosolver.cuh

divyegala · 2024-09-23T23:29:15Z

cpp/src/svm/svc_impl.cuh

    rmm::device_uvector<math_t> unique_labels(0, stream);
    model.n_classes = raft::label::getUniquelabels(unique_labels, labels, n_rows, stream);
-    rmm::device_async_resource_ref rmm_alloc = rmm::mr::get_current_device_resource();
-    model.unique_labels                      = (math_t*)rmm_alloc.allocate_async(
-      model.n_classes * sizeof(math_t), rmm::CUDA_ALLOCATION_ALIGNMENT, stream);
-    raft::copy(model.unique_labels, unique_labels.data(), model.n_classes, stream);
+    model.unique_labels.resize(model.n_classes * sizeof(math_t), stream);
+    raft::copy((math_t*)model.unique_labels.data(), unique_labels.data(), model.n_classes, stream);


Is unique_labels being re-used elsewhere, or can this copy be avoided by directly passing model.unique_labels to getUniquelabels?

No, it is only used locally. However, I cannot pass a plain device_buffer into raft::label::getUniquelabels as it only takes device_uvector.

You can use unique_labels.release() to get an r-value device_buffer and avoid the copy https://github.com/rapidsai/rmm/blob/b51447393c523cc929608d84850c70a3eae08af3/include/rmm/device_uvector.hpp#L414
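To make the release() suggestion concrete, here is a minimal host-only sketch of the idea: a typed vector hands its untyped storage to a new owner by move, so the allocation is reused rather than copied. ByteBuffer and TypedVector are hypothetical stand-ins written for this illustration (plain host memory, no streams), not the RMM classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for an owning, untyped, move-only byte buffer.
class ByteBuffer {
 public:
  ByteBuffer() = default;
  explicit ByteBuffer(std::size_t bytes)
    : data_(bytes ? new unsigned char[bytes] : nullptr), size_(bytes) {}
  ByteBuffer(ByteBuffer&& other) noexcept
    : data_(std::move(other.data_)), size_(std::exchange(other.size_, 0)) {}
  ByteBuffer& operator=(ByteBuffer&& other) noexcept {
    data_ = std::move(other.data_);
    size_ = std::exchange(other.size_, 0);
    return *this;
  }
  void* data() noexcept { return data_.get(); }
  std::size_t size() const noexcept { return size_; }

 private:
  std::unique_ptr<unsigned char[]> data_;
  std::size_t size_ = 0;
};

// Hypothetical stand-in for a typed vector layered on top of ByteBuffer.
template <typename T>
class TypedVector {
 public:
  explicit TypedVector(std::size_t n) : storage_(n * sizeof(T)) {}
  T* data() noexcept { return static_cast<T*>(storage_.data()); }
  std::size_t size() const noexcept { return storage_.size() / sizeof(T); }
  // Give up the storage: the caller becomes the owner, nothing is copied.
  ByteBuffer release() noexcept { return std::move(storage_); }

 private:
  ByteBuffer storage_;
};

// Usage: the released buffer points at the same allocation as the vector did.
inline void release_example() {
  TypedVector<double> labels(4);
  void* before = labels.data();
  ByteBuffer owned = labels.release();
  assert(owned.data() == before);
  assert(labels.size() == 0);
}
```

The catch is that the receiving side must be an owning buffer object; if it is only a pointer to a buffer owned elsewhere, there is nothing to move the storage into.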

divyegala · 2024-09-23T23:30:05Z

cpp/src/svm/svc_impl.cuh


  // Unfortunately we need runtime support for both types
  raft::device_matrix_view<math_t, int, raft::layout_stride> dense_support_matrix_view;
  if (is_dense_support) {
    dense_support_matrix_view =
      raft::make_device_strided_matrix_view<math_t, int, raft::layout_f_contiguous>(
-        model.support_matrix.data, model.n_support, n_cols, 0);
+        (math_t*)model.support_matrix.data.data(), model.n_support, n_cols, 0);


reinterpret_cast

I have switched to reinterpret_cast in many places now. Here in this context I cannot easily switch as the model is const. To my understanding resolving this here would require subsequent const_casts or worse.
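The const issue can be sketched as follows, assuming a buffer type whose data() is const-overloaded (returning a pointer to const void when the buffer is const). Buffer, typed_data and sum_doubles are hypothetical names for this illustration. The sketch only helps where the consuming API accepts a pointer to const; whether the view factories in this diff do is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a buffer with a const-overloaded data().
class Buffer {
 public:
  explicit Buffer(std::size_t bytes) : bytes_(bytes) {}
  void* data() noexcept { return bytes_.data(); }
  const void* data() const noexcept { return bytes_.data(); }
  std::size_t size() const noexcept { return bytes_.size(); }

 private:
  std::vector<unsigned char> bytes_;
};

// Typed accessors: the constness of the buffer carries over to the element
// pointer, so read-only callers never need a const_cast.
template <typename T>
T* typed_data(Buffer& b) noexcept { return static_cast<T*>(b.data()); }

template <typename T>
const T* typed_data(const Buffer& b) noexcept { return static_cast<const T*>(b.data()); }

// A read-only consumer that works on a const buffer.
inline double sum_doubles(const Buffer& b) {
  assert(b.size() % sizeof(double) == 0);
  const double* values = typed_data<double>(b);
  double total = 0.0;
  for (std::size_t i = 0; i < b.size() / sizeof(double); ++i) total += values[i];
  return total;
}
```

Centralizing the cast in one helper also keeps the call sites free of C-style casts, whichever named cast is chosen inside the helper.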

divyegala · 2024-09-23T23:30:10Z

cpp/src/svm/svc_impl.cuh

-                                                                   n_cols,
-                                                                   model.support_matrix.nnz)
+      ? raft::make_device_compressed_structure_view<int, int, int>(
+          (int*)model.support_matrix.indptr.data(),


reinterpret_cast

Same as above

divyegala · 2024-09-23T23:30:19Z

cpp/src/svm/svc_impl.cuh

-      ? raft::make_device_csr_matrix_view<math_t, int, int, int>(model.support_matrix.data,
-                                                                 csr_structure_view)
+      ? raft::make_device_csr_matrix_view<math_t, int, int, int>(
+          (math_t*)model.support_matrix.data.data(), csr_structure_view)


reinterpret_cast

Same as above

divyegala · 2024-09-23T23:30:26Z

cpp/src/svm/svc_impl.cuh

@@ -278,7 +277,7 @@ void svcPredictX(const raft::handle_t& handle,
                       &one,
                       K.data(),
                       transpose_kernel ? model.n_support : n_batch,
-                       model.dual_coefs,
+                       (math_t*)model.dual_coefs.data(),


reinterpret_cast

Same as above

divyegala · 2024-09-23T23:30:35Z

cpp/src/svm/svc_impl.cuh

@@ -287,7 +286,7 @@ void svcPredictX(const raft::handle_t& handle,

  }  // end of loop

-  math_t* labels = model.unique_labels;
+  math_t* labels = (math_t*)model.unique_labels.data();


reinterpret_cast

Same as above

divyegala · 2024-09-23T23:33:00Z

cpp/src/svm/svc_impl.cuh

-  }
+  cudaStream_t stream = handle.get_stream();
+
+  // Note that the underlying allocations are not *freed* but rather reset


I see that this is the case with resize(). Why not deallocate_async() instead to free memory?

deallocate_async() is a private method. I added additional calls to shrink_to_fit() to ensure memory is freed right away.

cpp/src/svm/svm_api.cpp

mfoerste4 · 2024-09-25T14:43:42Z

Thanks @divyegala for the review. I have addressed or commented on your immediate suggestions.

I see that the C and Python APIs need a copy and ultimately, this is because rmm::device_buffer in this structure as well as the model is owned by the struct/class. Instead, if we use std::weak_ptr<rmm::device_buffer> or plain rmm::device_buffer& then we can avoid copies and ownership issues by letting the C and Python API own the data but still allowing the C++ API to resize it.

The C API can also release the shared_ptr/device_buffer to transfer ownership to the user, thus removing the need for a copy there.

The Python API is owned by us and so, we can create shared_ptr/device_buffer in the Cython layer and store them as class members.

I don't quite understand how this can be accomplished with the given device_buffer implementation. Even if we pass the storage as unique pointers, we can then transfer ownership of the whole device_buffer, but we still cannot pass actual memory allocations into or out of the device_buffer.

For the Python API we retrieve data as CumlArray and want to pass the ptr into the device_buffer. For the C API we have fixed pointer-style I/O, where we cannot simply switch to device_buffer*. IIUC we would need a class like the device_buffer but with the additional capability to switch between owning and non-owning.
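The kind of class described above could look roughly like this. It is a host-only sketch using malloc/free; MaybeOwningBuffer and all of its members are hypothetical names invented for this illustration and do not exist in RMM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical buffer that either owns its allocation or merely views memory
// owned by someone else (for example an array handed in through a C API).
class MaybeOwningBuffer {
 public:
  MaybeOwningBuffer() = default;

  // Owning: the buffer allocates and will free the memory itself.
  static MaybeOwningBuffer allocate(std::size_t bytes) {
    return MaybeOwningBuffer(std::malloc(bytes), bytes, true);
  }
  // Non-owning: wrap caller-owned memory; the destructor will not free it.
  static MaybeOwningBuffer wrap(void* ptr, std::size_t bytes) {
    return MaybeOwningBuffer(ptr, bytes, false);
  }

  MaybeOwningBuffer(const MaybeOwningBuffer&) = delete;
  MaybeOwningBuffer& operator=(const MaybeOwningBuffer&) = delete;
  MaybeOwningBuffer(MaybeOwningBuffer&& o) noexcept
    : ptr_(std::exchange(o.ptr_, nullptr)),
      size_(std::exchange(o.size_, 0)),
      owns_(std::exchange(o.owns_, false)) {}
  ~MaybeOwningBuffer() {
    if (owns_) std::free(ptr_);
  }

  // Hand the raw allocation to the caller, who must free it afterwards.
  void* release() noexcept {
    owns_ = false;
    size_ = 0;
    return std::exchange(ptr_, nullptr);
  }

  void* data() const noexcept { return ptr_; }
  std::size_t size() const noexcept { return size_; }
  bool owns() const noexcept { return owns_; }

 private:
  MaybeOwningBuffer(void* p, std::size_t n, bool owns) : ptr_(p), size_(n), owns_(owns) {}
  void* ptr_ = nullptr;
  std::size_t size_ = 0;
  bool owns_ = false;
};
```

With wrap() an externally allocated array could be passed in without a copy, and with release() an allocation could be passed out. A resize on a non-owning instance is deliberately left out, since it is unclear who should free the old memory in that case.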

divyegala · 2024-09-25T19:03:04Z

@mfoerste4

Let me write an example for Python API and how to handle ownership, assuming the C++ API does not own the memory and instead holds onto references of the form rmm::device_buffer&. As for the C API, I think you are right and there's no way for us to release the ownership of memory, but at least we can avoid copies in the Python world:

Python API:
As mentioned in the original issue, instead of CumlArray we can use DeviceBuffer from RMM Python

.pxd
.pyx

As seen in the .pxd definition, the Python DeviceBuffer class holds an underlying reference to the C++ object device_buffer in the form of the property unique_ptr<device_buffer> c_obj. Thus, the code looks like:

class SVM:
    def __init__(self):
        self.some_array = rmm.DeviceBuffer(...)  # instead of CumlArray

    def some_func(self):
        # pass the underlying C++ device_buffer by reference
        cpp_func(deref(self.some_array.c_obj.get()), ...)

mfoerste4 · 2024-10-02T20:39:46Z

@divyegala, thanks for your suggestions. I have refactored the SvmModel to work with device_buffer* so that the buffers can be plugged in after (de-)serialization without a copy.

divyegala · 2024-10-08T00:15:16Z

cpp/src/svm/svc_impl.cuh

+    model.unique_labels->resize(model.n_classes * sizeof(math_t), stream);
+    raft::copy((math_t*)model.unique_labels->data(), unique_labels.data(), model.n_classes, stream);


Reference: https://github.com/rapidsai/rmm/blob/c494395e58288cac16321ce90e9b15f3508ae89a/include/rmm/device_uvector.hpp#L414

Suggested change

-    model.unique_labels->resize(model.n_classes * sizeof(math_t), stream);
-    raft::copy((math_t*)model.unique_labels->data(), unique_labels.data(), model.n_classes, stream);
+    model.unique_labels = unique_labels.release()

model.unique_labels is just a pointer to the buffer (e.g. one owned by the Python code), so nobody would be responsible for deallocating whatever we extract from the device_uvector.

divyegala · 2024-10-08T00:19:13Z

cpp/src/svm/svm_api.cpp

+        *dual_coefs = (double*)rmm_alloc.allocate_async(
+          model.dual_coefs->size(), rmm::CUDA_ALLOCATION_ALIGNMENT, stream);


As mentioned in #6057 by @harrism, these raw calls to allocate need to go away. Should we change this API to assume that the user is passing in pre-allocated memory?

The user does not know the size of the arrays in advance, so we would need to change the C-API to rely on resizable objects (just like the C++ API utilizes the device_buffer).

Hmmm that is a tricky situation. @harrism any thoughts?

Use a device_buffer?

Oh, this is a C API?! My solution would be to not have a C API! (Why is a C API in a .cpp file?)

The C API source code is written in .cpp, and we just export the API in a .h header via extern "C".
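For readers unfamiliar with that arrangement, a minimal sketch: the implementation is ordinary C++, and only the exported symbols get C linkage. The sketch also shows one common C idiom for the "user does not know the size in advance" problem raised above, namely a size query followed by a copy into caller-allocated memory. DemoModel and both demo_ functions are hypothetical and use host memory; this is not the cuML C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// C++-only implementation detail; a C header would only forward-declare it.
struct DemoModel {
  std::vector<double> dual_coefs;
};

extern "C" {

// Step 1: the caller asks how many elements it has to allocate.
std::size_t demo_dual_coefs_count(const DemoModel* model) {
  assert(model != nullptr);
  return model->dual_coefs.size();
}

// Step 2: the caller passes memory it allocated and owns.
// Returns 0 on success, -1 if the destination is missing or too small.
int demo_copy_dual_coefs(const DemoModel* model, double* out, std::size_t capacity) {
  assert(model != nullptr);
  const std::size_t n = model->dual_coefs.size();
  if (out == nullptr || capacity < n) return -1;
  if (n > 0) std::memcpy(out, model->dual_coefs.data(), n * sizeof(double));
  return 0;
}

}  // extern "C"
```

This keeps allocation and deallocation on the caller's side at the cost of one copy, which is the trade-off discussed in this thread.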

harrism · 2024-10-09T22:20:43Z

python/cuml/cuml/svm/svc.pyx

-        int* indptr
-        int* indices
-        math_t* data
+        device_buffer* indptr


Why are these raw pointers? Why doesn't the class own the buffers?

For that matter, why doesn't the class have a constructor and a destructor (especially if it contains raw pointers)?

draft for dual_coefs

1a4733e

github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Sep 20, 2024

mfoerste4 added 2 commits September 23, 2024 19:44

utilize device_buffer for all raw arrays in svm model

3390ceb

remove C++ dealloc management from python

610bdeb

mfoerste4 self-assigned this Sep 23, 2024

mfoerste4 added Tech Debt Issues related to debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 23, 2024

mfoerste4 marked this pull request as ready for review September 23, 2024 20:16

mfoerste4 requested review from a team as code owners September 23, 2024 20:16

mfoerste4 changed the title ~~[DRAFT] refactor memory ownership svm~~ refactor memory ownership svm Sep 23, 2024

dantegd requested a review from divyegala September 23, 2024 21:54

divyegala requested changes Sep 23, 2024

review suggestions

6fc3eaf

mfoerste4 requested a review from divyegala September 25, 2024 14:43

change SvmModel data back to pointers in order to prevent copy after …

ddb8238

…deserialization

divyegala reviewed Oct 8, 2024

harrism reviewed Oct 9, 2024
