nb::ndarray<...> implementation refactored (#721)

This PR refactors the ``nb::ndarray`` implementation to remove certain redundancies. In particular, there were duplicate code paths to process ``ndarray`` template parameters at compile time and at runtime, which are now merged. Significant edits to the documentation are intended to make nd-array bindings more approachable to newcomers.

Finally, the refactor was also an opportunity to realize two usability improvements:

1. The constructor to return new nd-arrays from C++ now considers all template arguments:

   - **Memory order**: ``c_contig``, ``f_contig``.
   - **Shape**: ``nb::shape<3, 4, 5>``, etc.
   - **Device type**: ``nb::device::cpu``, ``nb::device::cuda``, etc.
   - **Framework**: ``nb::numpy``, ``nb::pytorch``, etc.
   - **Data type**: ``uint64_t``, ``std::complex<double>``, etc.

   Previously, only the **framework** and **data type** annotations were taken into account when returning nd-arrays, while all of them were examined when *accepting* arrays during overload resolution. This inconsistency was a repeated source of confusion among users.

   To give an example, the following now works out of the box without the need to redundantly specify the shape and strides to the ``Array`` constructor below:

   ```cpp
   using Array = nb::ndarray<float, nb::numpy, nb::shape<4, 4>, nb::f_contig>;

   struct Matrix4f {
       float m[4][4];
       Array data() { return Array(m); }
   };

   nb::class_<Matrix4f>(m, "Matrix4f")
       .def("data", &Matrix4f::data, nb::rv_policy::reference_internal);
   ```

2. A new nd-array ``.cast()`` method forces the immediate creation of a Python object with the specified target framework and return value policy, while preserving the type signature in return values. This is useful to return temporaries (e.g. stack-allocated memory) from functions.

There are two minor but potentially breaking changes:

1. The ndarray type caster now interprets the ``rv_policy::automatic_reference`` return value policy analogously to the ``rv_policy::automatic``, which means that it references a memory region when the user specifies an ``owner``, and it otherwise copies. This makes it safe to use the ``nb::cast()`` and ``nb::ndarray::cast()`` functions that use this policy as a default.

2. The ``nb::any_contig`` memory order annotation, which previously did nothing, now accepts C- or F-contiguous arrays and rejects non-contiguous ones.

In both of these cases, the prior convention seems like it would cause bugs/breakage in practice. If nobody depends on this behavior, it should be OK to fix these without a major version bump.

A small change without compatibility implications: the `owner` argument has a default `{}` argument again. I think this is reasonably safe because the `automatic_*` return value policies in nanobind will copy the input array if there isn't an owner. This effectively reverts a change from commit 937a1df.
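
To make the new ``nb::any_contig`` rule concrete, here is a minimal sketch of a contiguity test over a shape and element-count strides. It is plain C++ and is not nanobind's actual implementation; the names ``is_c_contig``, ``is_f_contig`` and ``accepts_any_contig`` are hypothetical, and the sketch assumes that axes of extent 1 may carry any stride and that empty arrays are not special-cased.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::size_t>;
using Strides = std::vector<std::int64_t>;  // element counts, not bytes

// Hypothetical helper: true if the strides describe a dense row-major layout.
// Simplification: empty arrays (an extent of 0) are not special-cased.
bool is_c_contig(const Shape &shape, const Strides &strides) {
    std::int64_t expected = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        // The stride of an axis with a single entry is never used.
        if (shape[i] != 1 && strides[i] != expected)
            return false;
        expected *= static_cast<std::int64_t>(shape[i]);
    }
    return true;
}

// Hypothetical helper: true if the strides describe a dense column-major layout.
bool is_f_contig(const Shape &shape, const Strides &strides) {
    std::int64_t expected = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (shape[i] != 1 && strides[i] != expected)
            return false;
        expected *= static_cast<std::int64_t>(shape[i]);
    }
    return true;
}

// Mirrors the rule stated above: accept C- or F-contiguous arrays,
// reject everything else.
bool accepts_any_contig(const Shape &shape, const Strides &strides) {
    return is_c_contig(shape, strides) || is_f_contig(shape, strides);
}
```

For a 2x3 array, strides ``{3, 1}`` pass as C-contiguous and ``{1, 2}`` pass as F-contiguous, whereas ``{6, 2}`` (a strided slice of a larger buffer) is rejected under this rule.
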
wjakob · Sep 19, 2024 · e67d934
1 parent 7f7e0c0
commit e67d934
Showing 13 changed files with 1,163 additions and 773 deletions.
diff --git a/docs/api_extra.rst b/docs/api_extra.rst
@@ -644,8 +644,8 @@ N-dimensional array type
 ------------------------
 
 The following type can be used to exchange n-dimensional arrays with frameworks
-like NumPy, PyTorch, Tensorflow, JAX, CuPy, and others. It requires an additional
-include directive:
+like NumPy, PyTorch, Tensorflow, JAX, CuPy, and others. It requires an
+additional include directive:
 
 .. code-block:: cpp
 
@@ -664,11 +664,36 @@ section <ndarrays>`.
 
 .. cpp:class:: template <typename... Args> ndarray
 
+   .. cpp:type:: Scalar
+
+      The scalar type underlying the array (or ``void`` if not specified)
+
    .. cpp:var:: static constexpr bool ReadOnly
 
-      A constant static boolean that is true if the array's data is read-only.
-      This is determined by the class template arguments, not by any dynamic
-      properties of the referenced array.
+      A ``constexpr`` Boolean value that is ``true`` if the ndarray template
+      arguments (`Args... <Args>`) include the ``nb::ro`` annotation or a
+      ``const``-qualified scalar type.
+
+   .. cpp:var:: static constexpr char Order
+
+      A ``constexpr`` character value set based on the ndarray template
+      arguments (`Args... <Args>`). It equals
+
+      - ``'C'`` if :cpp:class:`c_contig` is specified,
+      - ``'F'`` if :cpp:class:`f_contig` is specified,
+      - ``'A'`` if :cpp:class:`any_contig` is specified,
+      - ``'\0'`` otherwise.
+
+   .. cpp:var:: static constexpr int DeviceType
+
+      A ``constexpr`` integer value set to the device type ID extracted from
+      the ndarray template arguments (`Args... <Args>`), or
+      :cpp:struct:`device::none::value <device::none>` when none was specified.
+
+   .. cpp:type:: VoidPtr = std::conditional_t<ReadOnly, const void *, void *>
+
+      A potentially ``const``-qualified ``void*`` pointer type used by some
+      of the ``ndarray`` constructors.
 
    .. cpp:function:: ndarray() = default
 
@@ -677,8 +702,8 @@ section <ndarrays>`.
    .. cpp:function:: template <typename... Args2> explicit ndarray(const ndarray<Args2...> &other)
 
       Reinterpreting constructor that wraps an existing nd-array (parameterized
-      by `Args`) into a new ndarray (parameterized by `Args2`).   No copy or
-      conversion is made.
+      by `Args... <Args>`) into a new ndarray (parameterized by `Args2...
+      <Args2>`). No copy or conversion is made.
 
       Dropping parameters is always safe. For example, a function that
       returns different array types could call it to convert ``ndarray<T>`` to
@@ -708,37 +733,87 @@ section <ndarrays>`.
       Move assignment operator. Steals the referenced array without changing reference counts.
       Decreases the reference count of the previously referenced array and potentially destroys it.
 
-   .. cpp:function:: ndarray(void * data, size_t ndim, const size_t * shape, handle owner = nanobind::handle(), const int64_t * strides = nullptr, dlpack::dtype dtype = nanobind::dtype<Scalar>(), int32_t device_type = device::cpu::value, int32_t device_id = 0)
+   .. _ndarray_dynamic_constructor:
+
+   .. cpp:function:: ndarray(VoidPtr data, const std::initializer_list<size_t> shape = { }, handle owner = { }, std::initializer_list<int64_t> strides = { }, dlpack::dtype dtype = nanobind::dtype<Scalar>(), int32_t device_type = DeviceType, int32_t device_id = 0, char order = Order)
 
-      Create an array wrapping an existing memory allocation. The following
-      parameters can be specified:
+      Create an array wrapping an existing memory allocation.
 
-      - `data`: pointer address of the memory region. When the ndarray is
-        parameterized by a constant scalar type to indicate read-only access, a
-        const pointer must be passed instead.
+      Only the `data` parameter is strictly required, while some other
+      parameters can be inferred from static :cpp:class:`nb::ndarray\<...\>
+      <ndarray>` template parameters.
 
-      - `ndim`: the number of dimensions.
+      The parameters have the following meaning:
 
-      - `shape`: specifies the size along each axis. The referenced array must
-        must have `ndim` entries.
+      - `data`: a CPU/GPU/.. pointer to the memory region storing the array
+        data.
+
+        When the array is parameterized by a ``const`` scalar type, or when it
+        has a :cpp:class:`nb::ro <ro>` read-only annotation, a ``const``
+        pointer can be passed here.
+
+      - `shape`: an initializer list that simultaneously specifies the number
+        of dimensions and the size along each axis. If left at its default
+        ``{}``, the :cpp:class:`nb::shape <nanobind::shape>` template parameter
+        will take precedence (if present).
 
       - `owner`: if provided, the array will hold a reference to this object
-        until it is destructed.
+        until its destruction. This makes it possible to create zero-copy views
+        into other data structures, while guaranteeing the memory safety of
+        array accesses.
+
+      - `strides`: an initializer list explaining the layout of the data in
+        memory. Each entry denotes the number of elements to jump over to
+        advance to the next item along the associated axis.
 
-      - `strides` is optional; a value of ``nullptr`` implies C-style strides.
+        `strides` must either have the same size as `shape` or be empty. In the
+        latter case, strides are automatically computed according to the
+        `order` parameter.
 
-      - `dtype` describes the data type (floating point, signed/unsigned
-        integer) and bit depth.
+        Note that strides in nanobind express *element counts* rather than
+        *byte counts*. This convention differs from other frameworks (e.g.,
+        NumPy) and is a consequence of the underlying `DLPack
+        <https://github.com/dmlc/dlpack>`_ protocol.
 
-      - The `device_type` and `device_id` indicate the device and address
-        space associated with the pointer `value`.
+      - `dtype` describes the numeric data type of array elements (e.g.,
+        floating point, signed/unsigned integer) and their bit depth.
 
-   .. cpp:function:: ndarray(void * data, const std::initializer_list<size_t> shape, handle owner = nanobind::handle(), std::initializer_list<int64_t> strides = { }, dlpack::dtype dtype = nanobind::dtype<Scalar>(), int32_t device_type = device::cpu::value, int32_t device_id = 0)
+        You can use the :cpp:func:`nb::dtype\<T\>() <nanobind::dtype>` function to obtain the right
+        value for a given type.
 
-      Alternative form of the above constructor, which accepts the ``shape``
-      and ``strides`` arguments using a ``std::initializer_list``. It
-      automatically infers the value of ``ndim`` based on the size of
-      ``shape``.
+      - `device_type` and `device_id` specify where the array data is stored.
+        The `device_type` must be an enumerant like
+        :cpp:class:`nb::device::cuda::value <device::cuda>`, while the meaning
+        of the device ID is unspecified and platform-dependent.
+
+        Note that the `device_id` is set to ``0`` by default and cannot be
+        inferred by nanobind. If your extension creates arrays on multiple
+        different compute accelerators, you *must* provide this parameter.
+
+      - The `order` parameter denotes the coefficient order in memory and is only
+        relevant when `strides` is empty. Specify ``'C'`` for C-style or ``'F'``
+        for Fortran-style. When this parameter is not explicitly specified, the
+        implementation uses the order specified as an ndarray template
+        argument, or C-style order as a fallback.
+
+      Both ``strides`` and ``shape`` will be copied by the constructor, hence
+      the targets of these initializer lists do not need to remain valid
+      following the constructor call.
+
+      .. warning::
+
+         The Python *global interpreter lock* (GIL) must be held when calling
+         this function.
+
+   .. cpp:function:: ndarray(VoidPtr data, size_t ndim, const size_t * shape, handle owner, const int64_t * strides = nullptr, dlpack::dtype dtype = nanobind::dtype<Scalar>(), int device_type = DeviceType, int device_id = 0, char order = Order)
+
+      Alternative form of the above constructor, which accepts the `shape`
+      and `strides` arguments using pointers instead of initializer lists.
+      The number of dimensions must be specified via the `ndim` parameter
+      in this case.
+
+      See the previous constructor for details; the remaining behavior is
+      identical.
 
    .. cpp:function:: dlpack::dtype dtype() const
 
@@ -788,13 +863,13 @@ section <ndarrays>`.
 
       Check whether the array is in a valid state.
 
-   .. cpp:function:: int32_t device_type() const
+   .. cpp:function:: int device_type() const
 
       ID denoting the type of device hosting the array. This will match the
       ``value`` field of a device class, such as :cpp:class:`device::cpu::value
       <device::cpu>` or :cpp:class:`device::cuda::value <device::cuda>`.
 
-   .. cpp:function:: int32_t device_id() const
+   .. cpp:function:: int device_id() const
 
       In a multi-device/GPU setup, this function returns the ID of the device
       storing the array.
@@ -804,15 +879,18 @@ section <ndarrays>`.
       Return a pointer to the array data.
       If :cpp:var:`ReadOnly` is true, a pointer-to-const is returned.
 
-   .. cpp:function:: template <typename... Ts> auto& operator()(Ts... indices)
+   .. cpp:function:: template <typename... Args2> auto& operator()(Args2... indices)
 
       Return a reference to the element stored at the provided index/indices.
       If :cpp:var:`ReadOnly` is true, a reference-to-const is returned.
-      Note that ``sizeof(Ts)`` must match :cpp:func:`ndim()`.
+      Note that ``sizeof...(Args2)`` must match :cpp:func:`ndim()`.
 
       This accessor is only available when the scalar type and array dimension
       were specified as template parameters.
 
+      This function should only be used when the array storage is accessible
+      through the CPU's virtual memory address space.
+
    .. cpp:function:: template <typename... Extra> auto view()
 
       Returns an nd-array view that is optimized for fast array access on the
@@ -824,6 +902,18 @@ section <ndarrays>`.
       ``shape()``, ``stride()``, and ``operator()`` following the conventions
       of the `ndarray` type.
 
+   .. cpp:function:: auto cast(rv_policy policy = rv_policy::automatic_reference, handle parent = {})
+
+      The expression ``array.cast(policy, parent)`` is almost equivalent to
+      :cpp:func:`nb::cast(array, policy, parent) <cast>`.
+
+      The main difference is that the return type of :cpp:func:`nb::cast
+      <cast>` is :cpp:class:`nb::object <object>`, which renders as a rather
+      non-descriptive ``object`` in Python bindings. The ``.cast()`` method
+      returns a custom wrapper type that still derives from
+      :cpp:class:`nb::object <object>`, but whose type signature in bindings
+      reproduces that of the original nd-array.
+
 Data types
 ^^^^^^^^^^
 
@@ -947,7 +1037,10 @@ Contiguity
 
 .. cpp:class:: any_contig
 
-   Don't place any demands on array contiguity (the default).
+   Accept both C- and F-contiguous arrays.
+
+If you prefer not to require contiguity, simply do not provide any of the
+``*_contig`` template parameters listed above.
 
 Device type
 +++++++++++

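The constructor documentation in ``docs/api_extra.rst`` above states that strides are expressed as *element counts* and that empty `strides` are computed from the `order` parameter. The following is a minimal sketch of that computation in plain C++. It is an illustration under stated assumptions, not nanobind's internal code: ``default_strides`` and ``to_byte_strides`` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: derive element-count strides from a shape.
// 'F' yields Fortran-style (column-major) strides; any other value
// falls back to C-style (row-major) strides.
std::vector<std::int64_t> default_strides(const std::vector<std::size_t> &shape,
                                          char order = 'C') {
    std::vector<std::int64_t> strides(shape.size());
    std::int64_t step = 1;
    if (order == 'F') {
        // The first axis varies fastest.
        for (std::size_t i = 0; i < shape.size(); ++i) {
            strides[i] = step;
            step *= static_cast<std::int64_t>(shape[i]);
        }
    } else {
        // The last axis varies fastest.
        for (std::size_t i = shape.size(); i-- > 0;) {
            strides[i] = step;
            step *= static_cast<std::int64_t>(shape[i]);
        }
    }
    return strides;
}

// Hypothetical helper: convert element-count strides into byte strides,
// which is the convention NumPy uses for ``ndarray.strides``.
std::vector<std::int64_t> to_byte_strides(std::vector<std::int64_t> strides,
                                          std::size_t itemsize) {
    for (std::int64_t &s : strides)
        s *= static_cast<std::int64_t>(itemsize);
    return strides;
}
```

For the 4x4 ``nb::f_contig`` matrix from the commit message, ``default_strides({4, 4}, 'F')`` gives ``{1, 4}``. A C-ordered ``{3, 4, 5}`` array gives ``{20, 5, 1}``, which corresponds to byte strides ``{80, 20, 4}`` for 4-byte elements.
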
diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -10,18 +10,15 @@ It also has a separate ABI version that is *not* subject to semantic
 versioning.
 
 The ABI version is relevant whenever a type binding from one extension module
-should be visible in another (also nanobind-based) extension module. In this
+should be visible in another nanobind-based extension module. In this
 case, both modules must use the same nanobind ABI version, or they will be
 isolated from each other. Releases that don't explicitly mention an ABI version
 below inherit that of the preceding release.
 
 Version 2.2.0 (TBA)
 -------------------
 
-- The NVIDIA CUDA compiler (``nvcc``) is now explicitly supported and included
-  in nanobind's CI test suite.
-
-- nanobind has always used `PEP 590 vector calls
+* nanobind has always used `PEP 590 vector calls
   <https://www.python.org/dev/peps/pep-0590>`__ to efficiently dispatch calls
   to function and method bindings, but it lacked the ability to do so for
   constructors (e.g., ``MyType(arg1, arg2, ...)``).
@@ -41,6 +38,63 @@ Version 2.2.0 (TBA)
   with :cpp:class:`nb::is_arithmetic() <is_flag>` creates enumerations deriving
   from :py:class:`enum.IntFlag`.
 
+* A refactor of :cpp:class:`nb::ndarray\<...\> <ndarray>` was an opportunity to
+  realize two usability improvements:
+
+  1. The constructor used to return new nd-arrays from C++ now considers
+     all template arguments:
+
+     - **Memory order**: :cpp:class:`c_contig`, :cpp:class:`f_contig`.
+     - **Shape**: :cpp:class:`nb::shape\<3, 4, 5\> <shape>`, etc.
+     - **Device type**: :cpp:class:`nb::device::cpu <device::cpu>`, :cpp:class:`nb::device::cuda <device::cuda>`, etc.
+     - **Framework**: :cpp:class:`nb::numpy <numpy>`, :cpp:class:`nb::pytorch <pytorch>`, etc.
+     - **Data type**: ``uint64_t``, ``std::complex<double>``, etc.
+
+     Previously, only the **framework** and **data type** annotations were
+     taken into account when returning nd-arrays, while all of them were
+     examined when *accepting* arrays during overload resolution. This
+     inconsistency was a repeated source of confusion among users.
+
+     To give an example, the following now works out of the box without the
+     need to redundantly specify the shape and strides to the ``Array``
+     constructor below:
+
+     .. code-block:: cpp
+
+        using Array = nb::ndarray<float, nb::numpy, nb::shape<4, 4>, nb::f_contig>;
+
+        struct Matrix4f {
+            float m[4][4];
+            Array data() { return Array(m); }
+        };
+
+        nb::class_<Matrix4f>(m, "Matrix4f")
+            .def("data", &Matrix4f::data, nb::rv_policy::reference_internal);
+
+  2. A new nd-array :cpp:func:`.cast() <ndarray::cast>` method forces the
+     immediate creation of a Python object with the specified target framework
+     and return value policy, while preserving the type signature in return
+     values. This is useful to :ref:`return temporaries (e.g. stack-allocated
+     memory) <ndarray-temporaries>` from functions.
+
+  There are two minor but potentially breaking changes:
+
+  1. The ndarray type caster now interprets the
+     :cpp:enumerator:`nb::rv_policy::automatic_reference
+     <rv_policy::automatic_reference>` return value policy analogously to the
+     :cpp:enumerator:`nb::rv_policy::automatic <rv_policy::automatic>`, which
+     means that it references a memory region when the user specifies an
+     ``owner``, and it otherwise copies. This makes it safe to use the
+     :cpp:func:`nb::cast() <cast>` and :cpp:func:`nb::ndarray::cast()
+     <ndarray::cast>` functions that use this policy as a default.
+
+  2. The :cpp:class:`nb::any_contig <any_contig>` memory order annotation,
+     which previously did nothing, now accepts C- or F-contiguous arrays and
+     rejects non-contiguous ones.
+
+* The NVIDIA CUDA compiler (``nvcc``) is now explicitly supported and included
+  in nanobind's CI test suite.
+
 * Added support for return value policy customization to the type casters of
   ``Eigen::Ref<...>`` and ``Eigen::Map<...>`` (commit `67316e
   <https://github.com/wjakob/nanobind/commit/67316eb88955a15e8e89a57ce9a53d8d66263287>`__).
@@ -61,25 +115,21 @@ Version 2.2.0 (TBA)
 * Fixed implicit conversion of complex nd-arrays. (issue `#709
   <https://github.com/wjakob/nanobind/issues/709>`__)
 
-* Minor fixes and improvements (PR `#696
-  <https://github.com/wjakob/nanobind/pull/696>`__, `#693
-  <https://github.com/wjakob/nanobind/pull/693>`__, `#675
-  <https://github.com/wjakob/nanobind/pull/675>`__, commit `75d259
-  <https://github.com/wjakob/nanobind/commit/75d259c7c16db9586e5cd3aa4715e09a25e76d83>`__).
-
 * Casting via :cpp:func:`nb::cast <cast>` can now specify an owner object for
   use with the :cpp:enumerator:`nb::rv_policy::reference_internal
-  <rv_policy::reference_internal>` return value policy  (PR `#667
-  <https://github.com/wjakob/nanobind/pull/667>`__) #667
+  <rv_policy::reference_internal>` return value policy (PR `#667
+  <https://github.com/wjakob/nanobind/pull/667>`__).
 
-* The ``std::optional<T>`` type caster is now implemented so that it can also
-  accommodate other frameworks such as Boost, Abseil, etc. (PR `#675
-  <https://github.com/wjakob/nanobind/pull/675>`__)
+* The ``std::optional<T>`` type caster is now implemented in such a way that it
+  can also accommodate non-STL frameworks, such as Boost, Abseil, etc. (PR
+  `#675 <https://github.com/wjakob/nanobind/pull/675>`__)
 
 * ABI version 15.
 
-* Minor fixes and improvements.
-
+* Minor fixes and improvements (PR `#696
+  <https://github.com/wjakob/nanobind/pull/696>`__, `#693
+  <https://github.com/wjakob/nanobind/pull/693>`__, commit `75d259
+  <https://github.com/wjakob/nanobind/commit/75d259c7c16db9586e5cd3aa4715e09a25e76d83>`__).
 
 Version 2.1.0 (Aug 11, 2024)
 ----------------------------
@@ -572,15 +622,15 @@ New features
 
 * Several :cpp:class:`nb::ndarray\<..\> <ndarray>` improvements:
 
-  1. CPU loops involving nanobind ndarrays weren't getting properly vectorized.
+  1. CPU loops involving nanobind nd-arrays weren't getting properly vectorized.
      This release of nanobind adds *views*, which provide an efficient
      abstraction that enables better code generation. See the documentation
      section on :ref:`array views <ndarray-views>` for details.
      (commit `8f602e
      <https://github.com/wjakob/nanobind/commit/8f602e187b0634e1df13ba370352cf092e9042c0>`__).
 
   2. Added support for nonstandard arithmetic types (e.g., ``__int128`` or
-     ``__fp16``) in ndarrays. See the :ref:`documentation section
+     ``__fp16``) in nd-arrays. See the :ref:`documentation section
      <ndarray-nonstandard>` for details. (commit `49eab2
      <https://github.com/wjakob/nanobind/commit/49eab2845530f84a1f029c5c1c5541ab3c1f9adc>`__).
 
@@ -589,7 +639,7 @@ New features
      :cpp:class:`nb::ndim\<3\> <ndim>`. (commit `1350a5
      <https://github.com/wjakob/nanobind/commit/1350a5e15b28e80ffc2130a779f3b8c559ddb620>`__).
 
-  4. Added an explicit constructor that can be used to add or remove ndarray
+  4. Added an explicit constructor that can be used to add or remove nd-array
      constraints. (commit `a1ac207
      <https://github.com/wjakob/nanobind/commit/a1ac207ab82206b8e50fe456f577c02270014fb3>`__).