Way for Python wrapper over C++ class to borrow reference, released on garbage collection? #770

DannyWeitekamp · 2024-10-27T21:43:38Z

I'm wondering if I'm missing something because it seems that nanobind doesn't implement a very simple pattern of object ownership that I've seen in other Python <-> compiled code interfaces like numba.

I was hoping that I could use nanobind to have objects with intrusive reference counting that would just have their Python wrappers grab one reference (i.e. incr_ref()) when they are created, and release that reference (i.e. dec_ref()) when those Python objects are garbage collected.
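
A minimal sketch of this rule, written as a pure-Python model rather than nanobind code. `Counted`, `Wrapper`, and `trace_refcounts` are hypothetical names, and `Wrapper.__del__` only stands in for the wrapper's deallocation hook (it runs promptly under CPython's reference counting):

```python
class Counted:
    """Stand-in for a native object with an intrusive reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.destroyed = False

    def inc_ref(self):
        self.refcount += 1

    def dec_ref(self):
        self.refcount -= 1
        if self.refcount == 0:
            # The native side would run the destructor / free memory here.
            self.destroyed = True


class Wrapper:
    """Stand-in for the Python wrapper: holds exactly one native reference."""

    def __init__(self, obj):
        obj.inc_ref()          # grab one reference when the wrapper is created
        self._obj = obj

    def __del__(self):
        self._obj.dec_ref()    # release it when the wrapper is collected


def trace_refcounts():
    obj = Counted("fido")
    obj.inc_ref()              # a reference held by native code
    w = Wrapper(obj)
    during = obj.refcount      # native reference + wrapper reference
    del w                      # wrapper collected: only its own reference goes away
    after = obj.refcount
    obj.dec_ref()              # native code drops the last reference
    return during, after, obj.destroyed
```

In this model the object is destroyed only when the last native-side reference is dropped, never as a direct consequence of the wrapper being collected.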

It seems like most of the options in nanobind enforce a policy of releasing ownership to Python when they are wrapped. I think shared_ptr might be an exception, but I'd really rather avoid using shared_ptr (for performance reasons, and general distaste). It's unclear to me why releasing ownership to Python is encouraged as the default in many cases (particularly with intrusive reference counting). In my application, I have situations where objects are kept in containers. Any policy that transferred ownership of an object to Python would cause the following nasty failure mode:

c = MyContainer(initialize_with_n_objects=10)
o = c[5]
o = None # delete called for object at c[5]
c.print_all_objects() # Segfault: object @ c[5] was already freed
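
The difference between the two policies can be modeled without any native code. The sketch below is an assumption-laden toy, not nanobind behavior: `Slot`, `OwningWrapper`, `BorrowingWrapper`, and `dangling_after_drop` are hypothetical names, and a `freed` flag stands in for what would be a dangling pointer (and a segfault) in C++:

```python
class Slot:
    """Stand-in for a native object stored in a container."""

    def __init__(self):
        self.refcount = 1      # the container's own reference
        self.freed = False

    def dec_ref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class OwningWrapper:
    """Models a take-ownership policy: collecting the wrapper deletes the object."""

    def __init__(self, slot):
        self._slot = slot

    def __del__(self):
        self._slot.freed = True    # delete, regardless of other users


class BorrowingWrapper:
    """Models the desired policy: the wrapper adds and later drops one reference."""

    def __init__(self, slot):
        slot.refcount += 1
        self._slot = slot

    def __del__(self):
        self._slot.dec_ref()


def dangling_after_drop(wrapper_type, n=10, index=5):
    """Wrap container[index], drop the wrapper, and report whether the
    container is left holding a freed object."""
    container = [Slot() for _ in range(n)]
    o = wrapper_type(container[index])
    o = None                   # wrapper collected here under CPython
    return container[index].freed
```

With `OwningWrapper` the container ends up holding a freed element; with `BorrowingWrapper` the element survives because the container's reference is still counted.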

Is there any way to achieve the object ownership pattern that I am considering here to avoid this issue?

I have most of a hacky solution worked out that involves:

1. Finding the pointer to inst_dealloc() with a dummy object and writing it to a global variable (I can't use the actual implementation directly since it's not exposed anywhere in /include).
2. Using nb::slots to replace tp_dealloc with my own custom function that calls dec_ref() and then calls the global pointer to inst_dealloc().
3. Using nb::rv_policy::reference everywhere to prevent Python from ever directly triggering a delete.

I'm stuck trying to implement this workaround because I cannot figure out how to get a pointer to the wrapped C++ object from the PyObject* pointer that is passed into the new tp_dealloc implementation. I'm getting std::bad_alloc when I run:

typedef void (*dealloc_ty)(PyObject *self);
dealloc_ty nb_inst_dealloc = nullptr; // Filled in dynamically later

static void obj_dealloc(PyObject *self) {
    auto self_handle = nb::handle_t<Object>(self);
    Object* cpp_self = (Object*) nb::cast<Object*>(self_handle);
    cpp_self->dec_ref();

    if(nb_inst_dealloc){
        (*nb_inst_dealloc)(self);    
    }
}

Anyway, I am wondering if I am missing something essential here or if I really need to resort to a hack like this to achieve what I'm trying to do?

P.S. The documentation for intrusive reference counting is a bit misleading. It seems to imply that one could implement their own Object class and expect everything to work, for instance with ref<T>. I'm fairly certain that I need to implement my own dec_ref() in my project, because there are cases where objects can be either written to buffers or malloced, so dec_ref() shouldn't always delete on zero (it might need to just decref a pointer to its buffer object instead). I was able to get by with my own implementation of ref<T>, since the builtin one seems to assume that it is getting a subclass of intrusive_base (which the documentation describes as an optional shortcut, not a requirement).
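
To make the "don't always delete on zero" requirement concrete, here is a small pure-Python model. Everything in it is hypothetical (`Buffer`, `Item`, and the choice that an item holds one reference on its backing buffer); it only illustrates a `dec_ref()` whose zero-count action depends on where the object's storage lives:

```python
class Buffer:
    """Stand-in for a reference-counted buffer that stores many objects inline."""

    def __init__(self):
        self.refcount = 0
        self.freed = False

    def inc_ref(self):
        self.refcount += 1

    def dec_ref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class Item:
    """Object whose dec_ref() depends on how it was allocated."""

    def __init__(self, buffer=None):
        self.refcount = 1
        self.buffer = buffer       # None models an individually malloc'ed object
        self.freed = False
        if buffer is not None:
            buffer.inc_ref()       # assumption: the item keeps its buffer alive

    def inc_ref(self):
        self.refcount += 1

    def dec_ref(self):
        self.refcount -= 1
        if self.refcount > 0:
            return
        if self.buffer is None:
            self.freed = True      # heap case: free the object itself
        else:
            self.buffer.dec_ref()  # buffer case: only release the buffer
```

A buffer-backed `Item` is never freed on its own here; its storage goes away only when the buffer's count reaches zero.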

DannyWeitekamp (Author) · 2024-10-28T04:24:41Z

After doing some digging in the docs it turns out I should be using nb::inst_ptr instead of nb::cast.

With this:

Object* cpp_self = (Object*) nb::inst_ptr<Object>(self_handle);

The hack seems to work as expected.
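
The order of operations in the hack can be modeled in plain Python. This sketch uses no nanobind or CPython C API: `TypeSlots`, `default_dealloc`, `install_dec_ref_dealloc`, and `run_dealloc_demo` are hypothetical stand-ins for the type's tp_dealloc slot, nanobind's internal inst_dealloc(), and the slot replacement done through nb::slots:

```python
class Native:
    """Stand-in for the wrapped C++ object."""

    def __init__(self):
        self.refcount = 2          # one native owner + one held by the wrapper

    def dec_ref(self):
        self.refcount -= 1


class Instance:
    """Stand-in for the PyObject that wraps a Native."""

    def __init__(self, native):
        self.native = native


class TypeSlots:
    """Stand-in for a type object with a replaceable tp_dealloc slot."""

    def __init__(self, tp_dealloc):
        self.tp_dealloc = tp_dealloc


def install_dec_ref_dealloc(slots, log):
    original = slots.tp_dealloc    # remember the stock deallocator

    def obj_dealloc(instance):
        instance.native.dec_ref()  # release the wrapper's reference first
        log.append("dec_ref")
        original(instance)         # then chain to the stock teardown

    slots.tp_dealloc = obj_dealloc


def run_dealloc_demo():
    log = []

    def default_dealloc(instance):
        log.append("inst_dealloc")

    slots = TypeSlots(default_dealloc)
    install_dec_ref_dealloc(slots, log)
    native = Native()
    slots.tp_dealloc(Instance(native))
    return log, native.refcount
```

The release happens before the chained call, matching the order in the snippet above, since the instance is no longer usable once its deallocator has run.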

Really enjoying using nanobind so far by the way ❤️. Looks like a really awesome tool so far.

wjakob (Maintainer) · 2024-11-05T01:31:02Z

What you are doing here seems quite unsafe/dangerous. Did you see the reference_internal policy? This is usually used to implement access to internal fields of an instance and works like reference with an extra lifetime dependency.

DannyWeitekamp (Author) · 2024-11-05T23:53:04Z

The approach I've described above is certainly non-standard (it is a hack, after all). Nonetheless, it seems safer than what I have been able to achieve with nanobind's default options. When applied to situations common in my application, nanobind's default options seem to either 1) consistently produce memory leaks, 2) consistently produce segfaults, or 3) make copies when I really wanted references.

For context, I'm currently porting some code that had been written with numba to C++. The original numba code was tested pretty comprehensively in terms of memory leaks, and I'm hoping to achieve the same behavior with nanobind. Of course, I'd rather use standard nanobind approaches over my hack, if that can be accomplished without compromising the safety, speed, and functionality of the program (there does already seem to be a considerable improvement in speed... so off to a good start).

My hack is a direct reimplementation of numba's memory management approach, where memory is always owned by intrusive reference counted objects allocated by the Numba Runtime (NRT) (i.e. by the compiled C/C++-like parts), and when an object is wrapped in a PyObject, that PyObject just borrows a single intrusive (NRT-side) reference to the allocated object which is released when the PyObject is garbage collected.

I was aware of reference_internal. However, I had not tried it extensively because it seemed to solve the reference counting problem only on the Python side but not guarantee memory safety on the C++ side. My understanding from the documentation based on the example on this page was that reference_internal makes the child PyObject borrow a (Python-side) reference to the parent PyObject (keeping it around).

struct MyClass {
public:
    MyField &field() { return m_field; }

private:
    MyField m_field;
};

nb::class_<MyClass>(m, "MyClass")
   .def("field", &MyClass::field, nb::rv_policy::reference_internal);

My impression is that reference_internal is designed for cases like the above, where there is a strict parent-child relationship: the parent owns the child as part of its struct, or holds the ONLY reference to the child on the C++ side. It doesn't seem appropriate for handling situations where there are objects on the C++ side that are never exposed to Python (but hold references to, or are referenced by, things that are exposed to Python), or that otherwise need to stay alive even though their PyObjects have been freed.
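
The Python-side lifetime dependency described above can be modeled as a child wrapper that simply stores a strong reference to its parent wrapper. This is a sketch of that understanding only, not nanobind's implementation; `ParentWrapper`, `FieldWrapper`, and `parent_lifetime` are hypothetical names:

```python
import weakref


class ParentWrapper:
    """Stand-in for the Python wrapper of MyClass."""

    def __init__(self):
        self.field_storage = {"value": 0}   # models a member stored inside the parent

    def field(self):
        # The returned child keeps a Python-side reference to its parent.
        return FieldWrapper(self.field_storage, keep_alive=self)


class FieldWrapper:
    """Stand-in for the Python wrapper of MyField."""

    def __init__(self, storage, keep_alive):
        self._storage = storage
        self._keep_alive = keep_alive       # strong reference to the parent wrapper


def parent_lifetime():
    parent = ParentWrapper()
    probe = weakref.ref(parent)             # observes the parent without owning it
    child = parent.field()
    del parent
    alive_with_child = probe() is not None
    del child
    alive_without_child = probe() is not None
    return alive_with_child, alive_without_child
```

The parent wrapper outlives its own name as long as a child wrapper exists. Nothing in this model tracks references held purely on the native side, which is the gap the paragraph above points at.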

Consider the following example, and imagine that Dog and Kennel really need to be released to Python with some variant of reference for any of the following reasons:

- We don't want to make copies for performance reasons.
- We don't want to make copies because we need to edit them and have that change reflected in the original.
- They are allocated in a non-standard way (i.e. within an Allocator) and thus it is unsafe to directly call delete on them.
- They hold references to other stuff that has been released to Python.

#include <atomic>
#include <cstdint>   // uint64_t
#include <iostream>
#include <memory>
#include <string>

// Note: Need to be included in this order
#include <nanobind/nanobind.h>
#include <nanobind/intrusive/counter.h>
#include <nanobind/intrusive/ref.h>
#include <nanobind/intrusive/counter.inl>
#include <nanobind/stl/string.h>

namespace nb = nanobind;
using namespace nb::literals;
using nb::ref;
using nb::intrusive_base;

int total_allocated_objs = 0 ;

class Object : public intrusive_base {
public:
    uint64_t get_refcount() {
        // Needed to edit intrusive_base to expose this
        return intrusive_base::get_refcount();
    }
};

struct Dog : public Object{
    std::string name;
    int         age;

    Dog(std::string _name, int _age=1) : 
        Object(), name(_name), age(_age) {
    }

    static ref<Dog> make(std::string _name, int _age=1){
        total_allocated_objs++;
        return new Dog(_name, _age); 
    }

    ~Dog(){
        total_allocated_objs--;
        std::cout << "Doggie Died: " << name << "  @" << uint64_t(this) << std::endl;
    }
};

struct Kennel : public Object {
    ref<Dog> dog;

    Kennel() : Object() {}

    static ref<Kennel> make(std::string _name, int _age=1){
        total_allocated_objs += 2;
        Kennel* kennel = new Kennel();
        kennel->dog = new Dog(_name, _age);
        // std::cout << "K dog refcount(2):" << kennel->dog->get_refcount() << std::endl;
        return kennel; 
    }

    ~Kennel(){
        total_allocated_objs--;
        std::cout << "Kennel of " << dog->name << " Died:  @" << uint64_t(this) << std::endl;
    }
};


NB_MODULE(my_ext, m) {
    nb::class_<Object>(
      m, "Object",
      nb::intrusive_ptr<Object>(
          [](Object *o, PyObject *po) noexcept { o->set_self_py(po); })
      );

    nb::intrusive_init(
    [](PyObject *o) noexcept {
        nb::gil_scoped_acquire guard;
        Py_INCREF(o);
    },
    [](PyObject *o) noexcept {
        nb::gil_scoped_acquire guard;
        Py_DECREF(o);
    });

    nb::class_<Dog>(m, "Dog")
        .def(nb::new_(&Dog::make), "name"_a="fido", "age"_a=1, nb::rv_policy::reference)
        .def_rw("name", &Dog::name)
        .def("get_refcount", &Object::get_refcount)
    ;

    nb::class_<Kennel>(m, "Kennel")
        .def(nb::new_(&Kennel::make), "name"_a="fido", "age"_a=1, nb::rv_policy::reference)
        .def_rw("dog", &Kennel::dog, nb::rv_policy::reference_internal)
        .def("get_refcount", &Object::get_refcount)
    ;

    m.def("leaked_objects", [](){return total_allocated_objs;});
}

Now we test in Python

from my_ext import Dog, Kennel, leaked_objects
d = Dog("woofy")
d = None
print("^ 'woofy' Shoulda died.\n")

k = Kennel("scratchy", 50)
# print("kennel:", k.get_refcount())
k = None
print("^  Kennel Shoulda died.")
print("^ 'scratchy' Shoulda died.\n")

k = Kennel("buddy", 64)
# print('k->buddy (1)=2,', k.dog.get_refcount())
# print('k->buddy (2)=2,', k.dog.get_refcount())
# print('k->buddy (3)=2,', k.dog.get_refcount())
# print("kennel:", k.get_refcount())
d = k.dog
print("k.dog is k.dog: ", d is k.dog)
# print('k->buddy (4)=3,', k.dog.get_refcount())
k = None
print("^  Kennel Shoulda died.")
# print('k->buddy (5)=2,', d.get_refcount())
d = None
print("^ 'buddy' Shoulda died.")

print("\n--- End of Program --")
print("Leaked Objects: ", leaked_objects())

Doggie Died: woofy  @94112506739936
^ 'woofy' Shoulda died.

Kennel of scratchy Died:  @94112506754592
Doggie Died: scratchy  @94112506739936
^  Kennel Shoulda died.
^ 'scratchy' Shoulda died.

Kennel of buddy Died:  @94112506754592
Doggie Died: buddy  @94112506739936
[1]    60022 segmentation fault (core dumped)  python3.10 ../python/tests/run_broken_dog.py

For me this segfaults on d = k.dog, probably because Kennel("buddy", 64) has mysteriously deleted itself early. I have all kinds of variants of this where I do my best to stick within nanobind's defaults, and each one seems to either segfault or leak. For instance, I have a variant where intrusive_base is replaced with my own Object base class, and ref is replaced with my own implementation. That variant just leaks everything without a segfault.

The only approach that passes this simple test without faults or leaks is the version where I hack tp_dealloc to call dec_ref() on garbage collection, and additionally make sure the base Object class holds a pointer to each PyObject when a new one is created (that pointer is null'ed on garbage collection).
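
A pure-Python model of that working arrangement, under stated assumptions: `Native`, `Wrapper`, `wrap`, and `wrapper_identity_demo` are hypothetical names, a `weakref` plays the role of the non-owning pointer from the base Object to its PyObject, and `__del__` plays the role of the replaced tp_dealloc:

```python
import weakref


class Native:
    """Stand-in for the C++ base Object."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1          # reference held by native code
        self.wrapper_ref = None    # non-owning pointer to the current wrapper

    def dec_ref(self):
        self.refcount -= 1


class Wrapper:
    """Stand-in for the Python wrapper object."""

    def __init__(self, native):
        native.refcount += 1       # the wrapper's single native reference
        self.native = native

    def __del__(self):
        self.native.wrapper_ref = None   # null the back-pointer on collection
        self.native.dec_ref()            # and release the wrapper's reference


def wrap(native):
    """Return the existing wrapper if one is alive, else create a new one."""
    existing = native.wrapper_ref() if native.wrapper_ref is not None else None
    if existing is not None:
        return existing
    wrapper = Wrapper(native)
    native.wrapper_ref = weakref.ref(wrapper)
    return wrapper


def wrapper_identity_demo():
    native = Native("buddy")
    a = wrap(native)
    b = wrap(native)
    same = a is b                  # mirrors the "k.dog is k.dog" check
    shared = native.refcount       # native owner + one shared wrapper
    del a, b
    return same, shared, native.refcount, native.wrapper_ref is None
```

Reusing the live wrapper keeps identity checks stable and means the native object is charged one reference per wrapper, however many times it is handed to Python.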

I'm sure your time is limited. I'm certainly not asking for a solution, perhaps just a discussion. After all, I already have a robust solution; it just happens to work around nanobind instead of with it. Feel free to prove me wrong and share a setup that passes this test, although this toy example really only tests a small fraction of the flexibility that I need. I'm mostly sharing this because it seems odd to me that, as far as I can tell, nanobind doesn't support this very simple pattern of memory management, which seems quite common and affords a lot more flexibility than what I've been able to produce with nanobind's current defaults.
