diff --git a/doc/source/architecture_0.4.3.png b/doc/source/architecture_0.4.3.png
new file mode 100644
index 000000000..2aebf34dc
Binary files /dev/null and b/doc/source/architecture_0.4.3.png differ
diff --git a/doc/source/design.png b/doc/source/design.png
deleted file mode 100755
index d9bbdedfb..000000000
Binary files a/doc/source/design.png and /dev/null differ
diff --git a/doc/source/design.rst b/doc/source/design.rst
index a37a939a5..b6c1c2a0d 100644
--- a/doc/source/design.rst
+++ b/doc/source/design.rst
@@ -12,7 +12,7 @@ The Kernel Tuner is designed to be extensible and support different search and
 execution strategies. The current architecture of the Kernel Tuner
 can be seen as:
 
-.. image:: design.png
+.. image:: architecture_0.4.3.png
    :width: 500pt
 
 At the top we have the kernel code and the Python script that tunes it,
@@ -33,32 +33,33 @@ the only supported runner, which does exactly what its name says. It compiles
 and benchmarks configurations using a single sequential Python process.
 Other runners are foreseen in future releases.
 
-The runners are implemented on top of a high-level *Device Interface*,
-which wraps all the functionality for compiling and benchmarking
+The runners are implemented on top of the core, which implements a
+high-level *Device Interface* that wraps all the functionality for
+compiling and benchmarking
 kernel configurations based on the low-level *Device Function Interface*.
 Currently, we have
-four different implementations of the device function interface, which
+five different implementations of the device function interface, which
 basically abstracts the different backends into a set of simple
 functions such as ``ready_argument_list`` which allocates GPU memory and
 moves data to the GPU, and functions like ``compile``, ``benchmark``, or
 ``run_kernel``. The functions in the core are basically the main
 building blocks for implementing runners.
 
-At the bottom, three of the backends are shown.
-PyCUDA and PyOpenCL are for tuning either CUDA or OpenCL kernels.
-A relatively new addition is the Cupy backend based on Cupy for tuning
-CUDA kernels using the NVRTC compiler.
-The C Functions implementation can actually call any compiler, typically NVCC
-or GCC is used. This backend was created not just to be able to tune C
-functions, but mostly to tune C functions that in turn launch GPU kernels.
+The observers are explained in :ref:`observers`.
+
+At the bottom, the backends are shown.
+PyCUDA, CuPy, and cuda-python are for tuning CUDA kernels, while PyOpenCL is for tuning OpenCL kernels.
+The C Functions implementation can actually call any compiler; typically NVCC
+or GCC is used. There is also limited support for tuning Fortran kernels.
+This backend was created not just to be able to tune C
+functions, but in particular to tune C functions that in turn launch GPU kernels.
 
 The rest of this section contains the API documentation of the modules
 discussed above. For the documentation of the user API see the
 :doc:`user-api`.
 
-
 Strategies
 ----------
@@ -109,6 +110,12 @@ kernel_tuner.cupy.CupyFunctions
    :special-members: __init__
    :members:
 
+kernel_tuner.nvcuda.CudaFunctions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autoclass:: kernel_tuner.nvcuda.CudaFunctions
+   :special-members: __init__
+   :members:
+
 kernel_tuner.opencl.OpenCLFunctions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: kernel_tuner.opencl.OpenCLFunctions
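
To make the *Device Function Interface* description above concrete, the sketch
below shows how a runner could drive a backend through the
``ready_argument_list``, ``compile``, and ``benchmark`` building blocks
mentioned in the text. This is a minimal sketch: the class name, method
signatures, and return values are illustrative assumptions, not the actual
Kernel Tuner API, which is documented by the autoclass sections in the diff::

    # Hypothetical sketch, not the real Kernel Tuner core. The backend is
    # assumed to expose ready_argument_list/compile/benchmark with these
    # illustrative signatures; the real interfaces are in the API docs.

    class MinimalSequentialRunner:
        def __init__(self, backend):
            # backend: one of the device function interface implementations,
            # e.g. a PyCUDA, CuPy, cuda-python, PyOpenCL, or C backend
            self.backend = backend

        def run(self, kernel_source, arguments, configurations):
            # Allocate GPU memory and move the input data to the device once
            gpu_args = self.backend.ready_argument_list(arguments)
            results = []
            for config in configurations:
                # Compile and benchmark one configuration at a time,
                # mirroring the sequential runner described above
                func = self.backend.compile(kernel_source, config)
                runtime = self.backend.benchmark(func, gpu_args, config)
                results.append((config, runtime))
            return results

This one-configuration-at-a-time loop is what the sequential runner does;
the runners foreseen in future releases would dispatch the same building
blocks differently, which is why they sit above the core in the architecture.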