Release Version 0.4.0 · KernelTuner/kernel_tuner

This version adds a great deal of new functionality and gives the user more flexibility and control over what is benchmarked and when. From the CHANGELOG:

Added

support for (lambda) function instead of list of strings for restrictions
support for (lambda) function instead of list for specifying grid divisors
support for (lambda) function instead of tuple for specifying problem_size
function to store the top tuning results
function to create header file with device targets from stored results
support for using tuning results in PythonKernel
option to control measurements using observers
support for NVML tunable parameters
option to simulate auto-tuning searches from existing cache files
CuPy backend to support C++ templated CUDA kernels
support for templated CUDA kernels using PyCUDA backend
documentation on tunable parameter vocabulary
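
To illustrate the first item in the list above, here is a minimal sketch of a restriction written as a (lambda) function rather than a list of strings. It does not call Kernel Tuner: the helper `valid_configs` is hypothetical and only mimics how a search space could be filtered. It also assumes the restriction function receives one configuration as a dict mapping parameter names to values.

```python
from itertools import product

# Tunable parameters in the usual dict-of-lists form.
tune_params = {
    "block_size_x": [16, 32, 64, 128],
    "block_size_y": [1, 2, 4, 8],
}

# Restriction as a function instead of a list of strings such as
# ["block_size_x*block_size_y>=64", "block_size_x*block_size_y<=256"].
# Assumption: the function gets a dict with one value per parameter.
restrictions = lambda p: 64 <= p["block_size_x"] * p["block_size_y"] <= 256


def valid_configs(tune_params, restrictions):
    """Hypothetical helper, not part of Kernel Tuner: enumerate the
    Cartesian product of all parameter values and yield only the
    configurations for which the restriction returns True."""
    names = list(tune_params)
    for values in product(*tune_params.values()):
        config = dict(zip(names, values))
        if restrictions(config):
            yield config


configs = list(valid_configs(tune_params, restrictions))
for config in configs:
    print(config)
```

In actual use, a function like `restrictions` would be passed to `tune_kernel` through its `restrictions` argument. Check the Kernel Tuner documentation for the exact form the function is expected to take.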
