Version 0.4.0
Released by benvanwerkhoven on 09 Apr 11:50
This version adds a great deal of new functionality and gives the user more flexibility and control over what is benchmarked and when. From the CHANGELOG:
Added
- support for (lambda) function instead of list of strings for restrictions
- support for (lambda) function instead of list for specifying grid divisors
- support for (lambda) function instead of tuple for specifying problem_size
- function to store the top tuning results
- function to create header file with device targets from stored results
- support for using tuning results in PythonKernel
- option to control measurements using observers
- support for NVML tunable parameters
- option to simulate auto-tuning searches from existing cache files
- CuPy backend to support C++ templated CUDA kernels
- support for templated CUDA kernels using PyCUDA backend
- documentation on tunable parameter vocabulary
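
To make the first three items more concrete, here is a minimal sketch in plain Python of how restrictions, grid divisors, and problem size can each be written as a function of one configuration. It does not import or call Kernel Tuner. The parameter names (`block_size_x`, `tile_size_x`), the input size of 4096, and the helpers `configurations` and `grid_size` are assumptions made for this illustration, not the library's internals.

```python
from itertools import product

# Hypothetical tunable parameters, chosen only for illustration.
tune_params = {
    "block_size_x": [32, 64, 128, 256],
    "tile_size_x": [1, 2, 4],
}

# Each callable receives one configuration as a dict of parameter values.
# Restriction: keep configurations that process at most 256 elements per block.
restrictions = lambda p: p["block_size_x"] * p["tile_size_x"] <= 256
# Problem size: number of threads needed for 4096 elements (assumed input size).
problem_size = lambda p: 4096 // p["tile_size_x"]
# Grid divisor: number of threads per block.
grid_div_x = lambda p: p["block_size_x"]


def configurations(params, restriction):
    """Yield every parameter combination that satisfies the restriction."""
    keys = list(params)
    for values in product(*params.values()):
        p = dict(zip(keys, values))
        if restriction(p):
            yield p


def grid_size(p):
    """Number of thread blocks: problem size divided by the divisor, rounded up."""
    return -(-problem_size(p) // grid_div_x(p))


configs = list(configurations(tune_params, restrictions))
for p in configs:
    print(p, "grid size:", grid_size(p))
```

In Kernel Tuner itself, callables of this shape would be passed through the `restrictions`, `grid_div_x`, and `problem_size` arguments of `tune_kernel` rather than evaluated by hand as done here.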