
DeviceBuffer::drop force always succeed ? #48

Open
zeroexcuses opened this issue May 17, 2020 · 4 comments

Comments

@zeroexcuses

Quoting: https://github.com/bheisler/RustaCUDA/blob/master/src/memory/device/device_buffer.rs#L132-L172

    /// Deallocating device memory can return errors from previous asynchronous work. This function
    /// destroys the given buffer and returns the error and the un-destroyed buffer on failure.
...
    pub fn drop(mut dev_buf: DeviceBuffer<T>) -> DropResult<DeviceBuffer<T>> {

The fact that drop can fail is slightly problematic, as I can't figure out how to use it with RAII. In particular, when there are no more references to a DeviceBuffer, I want it to free the GPU memory. However, if this drop can fail, I can't guarantee that the GPU memory is freed.

What is the right way to handle freeing a DeviceBuffer? (Again, the fact that cuFree can fail is very surprising to me.)

@bheisler
Owner

Yeah, CUDA allows the various deallocator functions to fail, so I had to expose that somehow. I don't much like this compromise, but it's the best I have.

If you want to be really sure about it, you'll have to call the DeviceBuffer::drop function manually in a loop, handling errors and trying again until it succeeds. I can't safely implement that in RustaCUDA because I wouldn't know what the application wants to do with the errors.
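The retry-in-a-loop approach could be sketched roughly as follows. This is only an illustration of the pattern: a hypothetical MockBuffer and try_drop stand in for DeviceBuffer and DeviceBuffer::drop, since a real example needs a GPU, and the bounded attempt count and error handling are assumptions about what an application might want, not anything RustaCUDA prescribes.

```rust
// Sketch of "call drop manually in a loop until it succeeds".
// MockBuffer/try_drop are stand-ins for DeviceBuffer/DeviceBuffer::drop;
// like RustaCUDA's DropResult, failure returns the error *and* the
// un-destroyed buffer so the caller can retry.
type DropResult<T, E> = Result<(), (E, T)>;

struct MockBuffer {
    fails_left: u32, // simulate a deallocator that fails a few times
}

fn try_drop(mut buf: MockBuffer) -> DropResult<MockBuffer, &'static str> {
    if buf.fails_left > 0 {
        buf.fails_left -= 1;
        Err(("LaunchFailed", buf)) // error plus the buffer, for retrying
    } else {
        Ok(())
    }
}

// Retry with a bounded attempt count so a persistent failure can't
// loop forever. Returns the number of attempts used on success.
fn drop_with_retry(mut buf: MockBuffer, max_attempts: u32) -> Result<u32, &'static str> {
    for attempt in 1..=max_attempts {
        match try_drop(buf) {
            Ok(()) => return Ok(attempt),
            Err((e, recovered)) => {
                eprintln!("attempt {attempt}: drop failed: {e}");
                buf = recovered; // we got the buffer back; try again
            }
        }
    }
    Err("gave up: buffer could not be deallocated")
}

fn main() {
    let buf = MockBuffer { fails_left: 2 };
    match drop_with_retry(buf, 5) {
        Ok(n) => println!("dropped after {n} attempts"),
        Err(e) => eprintln!("{e}"),
    }
}
```

What to do between attempts (synchronize the context, log, back off, or give up and leak) is exactly the application-specific policy bheisler mentions not being able to decide inside the library.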

You can just let things fall out of scope - those types implement Drop and will attempt to clean up after themselves, but only once. If Drop::drop fails, then they panic. This has been sufficient for my needs (I've never actually seen the deallocators fail in practice), but I can understand that if you're relying on this in production that would be undesirable.

Sorry, I don't think there's much I can do here.

@LutzCle
Contributor

LutzCle commented May 26, 2020

Drop fails frequently during development. If you do illegal memory accesses inside your GPU kernel, you will get something like this:

thread 'main' panicked at 'Failed to deallocate CUDA page-locked memory.: LaunchFailed', $HOME/.cargo/git/checkouts/rustacuda-84d6f0ef4d2f3ecc/cc20ddc/src/memory/locked.rs:263:17

The proper solution would be to use Rust instead of CUDA on the GPU for memory safety ;-)

On a more serious note, this isn't really a CUDA-specific problem. munmap can fail too, and that is a POSIX API. Perhaps there exists a good Rust-y solution for munmap that RustaCUDA could borrow?

@thecog19

👋 We've been able to consistently reproduce this locally; in fact, our code does this regularly. We'd love to help fix this behavior in a way that makes sense, and not deal with a big honking panic in the middle of our code. Unfortunately, our knowledge of cuda/nvidia/gpus is pretty weak. @HiggstonRainbird is going to post an example that we've been using to reproduce it, but it may be hardware-specific. Any pointers on how we can begin to modify this behavior in a way that will let us either deallocate the memory or move on without breaking anything?

@HiggstonRainbird

The consistent reproduction of this issue @thecog19 described can be found here: buffer_drop.zip. We have been attempting to use the NvFBC crate to capture the GPU framebuffer, and then use rustacuda to perform operations on that framebuffer without the information leaving the GPU.

However, despite the framebuffer capture itself working (including successful transfers of the framebuffer to RAM and to disk), it seems that trying to pass the pointer to the framebuffer into a rustacuda DeviceBuffer fails somehow, and CUDA is unable to drop the device buffer's memory once we're done with it.

use std::error::Error;

use nvfbc::{cuda::*, BufferFormat};
use rustacuda::{context::{Context, ContextFlags}, device::Device, memory::DeviceBuffer, CudaFlags};
use rustacuda_core::DevicePointer;

fn main() -> Result<(), Box<dyn Error>> {
	rustacuda::init(CudaFlags::empty())?;
	let device = Device::get_device(0)?;
	let _context = Context::create_and_push(
		ContextFlags::MAP_HOST | ContextFlags::SCHED_AUTO, device)?;

	let mut capturer = CudaCapturer::new()?;

	let status = capturer.status()?;
	if !status.can_create_now {
		panic!("Can't create a CUDA capture session.");
	}

	capturer.start(BufferFormat::Rgb, 30)?;
	let frame_info = capturer.next_frame(CaptureMethod::NoWaitIfNewFrame)?;

	let pointer = frame_info.device_buffer as *mut u64;
	let device_buffer = unsafe { DeviceBuffer::from_raw_parts(
		DevicePointer::wrap(pointer),
		frame_info.device_buffer_len as usize,
	) };

	match DeviceBuffer::drop(device_buffer) {
		Ok(()) => println!("Device_buffer successfully destroyed"),
		Err((e, _buf)) => {
			println!("Failed to destroy device_buffer: {:?}", e);
		},
	}

	Ok(())
}
