This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
CUB 1.16.0 #434
alliepiper announced in Announcements
Summary
CUB 1.16.0 is a major release providing several improvements to the device-scope algorithms. `DeviceRadixSort` now supports large (64-bit indexed) input data. A new `UniqueByKey` algorithm has been added to `DeviceSelect`. `DeviceAdjacentDifference` provides new `SubtractLeft` and `SubtractRight` functionality.

This release also deprecates several obsolete APIs, including type traits and `BlockAdjacentDifference` algorithms. Many bug fixes and documentation updates are also included.

64-bit Offsets in `DeviceRadixSort` Public APIs

Users frequently want to process large datasets using CUB’s device-scope algorithms, but the current public APIs limit input data sizes to those that can be indexed by a 32-bit integer. Beginning with this release, CUB is updating these APIs to support 64-bit offsets, as discussed in #212.
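A signed 32-bit offset can index at most 2,147,483,647 items, so an input of, say, 3,000,000,000 one-byte keys cannot be described with such an offset. The host-side C++ sketch below illustrates the idea and is not CUB code. It is a plain least-significant-digit radix sort, assuming 32-bit unsigned keys and 8-bit digits, that keeps its histogram counts and scatter offsets in `std::int64_t` so item counts beyond the 32-bit range would not overflow. The names `lsd_radix_sort` and `needs_64bit_offsets` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Largest item count a signed 32-bit offset can represent.
constexpr std::int64_t kMaxInt32Items = std::numeric_limits<std::int32_t>::max();
static_assert(kMaxInt32Items == 2147483647, "signed 32-bit limit");

// True if an input of this size would overflow a signed 32-bit offset.
bool needs_64bit_offsets(std::int64_t num_items) {
  return num_items > kMaxInt32Items;
}

// Hypothetical host-side LSD radix sort (not the CUB API). Counts and
// offsets are 64-bit, so num_items may exceed kMaxInt32Items if memory allows.
void lsd_radix_sort(std::vector<std::uint32_t>& keys) {
  const std::int64_t num_items = static_cast<std::int64_t>(keys.size());
  std::vector<std::uint32_t> buffer(keys.size());
  constexpr int kRadixBits = 8;
  constexpr int kBins = 1 << kRadixBits;

  for (int shift = 0; shift < 32; shift += kRadixBits) {
    // Histogram: how many keys have each value of the current digit.
    std::array<std::int64_t, kBins> counts{};
    for (std::int64_t i = 0; i < num_items; ++i) {
      ++counts[(keys[i] >> shift) & (kBins - 1)];
    }
    // Exclusive prefix sum turns the counts into 64-bit output offsets.
    std::array<std::int64_t, kBins> offsets{};
    std::int64_t running = 0;
    for (int b = 0; b < kBins; ++b) {
      offsets[b] = running;
      running += counts[b];
    }
    assert(running == num_items);
    // Stable scatter: keys with equal digits keep their relative order.
    for (std::int64_t i = 0; i < num_items; ++i) {
      const std::uint32_t digit = (keys[i] >> shift) & (kBins - 1);
      buffer[offsets[digit]++] = keys[i];
    }
    keys.swap(buffer);  // four passes, so the sorted data ends up in keys
  }
}
```

For example, `needs_64bit_offsets(3000000000)` returns true. An input that large is the kind the 32-bit public APIs could not describe, whatever sorting strategy runs underneath.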
The device-scope algorithms will be updated with 64-bit offset support incrementally, starting with the `cub::DeviceRadixSort` family of algorithms. Thanks to @canonizer for contributing this functionality.

New `DeviceSelect::UniqueByKey` Algorithm

`cub::DeviceSelect` now provides a `UniqueByKey` algorithm, which has been ported from Thrust. Thanks to @zasdfgbnm for this contribution.

New `DeviceAdjacentDifference` Algorithms

The new `cub::DeviceAdjacentDifference` interface, also ported from Thrust, provides `SubtractLeft` and `SubtractRight` algorithms as CUB kernels.

Deprecation Notices
Synchronous CUDA Dynamic Parallelism Support
A future version of CUB will change the `debug_synchronous` behavior of device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).

This will only affect calls to CUB device-scope algorithms launched from device-side code with `debug_synchronous = true`. Such invocations will continue to print extra debugging information, but they will no longer synchronize after kernel launches.

Deprecated Traits
CUB provided a variety of metaprogramming type traits in order to support C++03. Since C++14 is now required, these traits have been deprecated in favor of their STL equivalents in `<type_traits>`.
CUB now uses the STL traits internally, resulting in a ~6% improvement in compile time.
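The STL replacements are ordinary C++14 `<type_traits>` templates. The minimal, CUB-free sketch below shows compile-time type selection, type comparison, cv-stripping, and a SFINAE constraint expressed with `std::conditional_t`, `std::is_same`, `std::remove_cv_t`, and `std::enable_if_t`. The helper names `AccumulatorT` and `is_null` are hypothetical. The traits were chosen for illustration and are not the release's exact deprecation mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: widen small integer types to a 64-bit accumulator,
// using std::conditional_t instead of a hand-rolled compile-time "if" trait.
template <typename T>
using AccumulatorT =
    std::conditional_t<(sizeof(T) < sizeof(std::int64_t)) && std::is_integral<T>::value,
                       std::int64_t, T>;

static_assert(std::is_same<AccumulatorT<int>, std::int64_t>::value, "int widens");
static_assert(std::is_same<AccumulatorT<double>, double>::value, "double unchanged");

// Strip const/volatile before comparing types.
static_assert(std::is_same<std::remove_cv_t<const volatile float>, float>::value,
              "cv-qualifiers removed");

// Hypothetical SFINAE-constrained function: only participates in overload
// resolution when T is a pointer type.
template <typename T, typename = std::enable_if_t<std::is_pointer<T>::value>>
bool is_null(T p) {
  return p == nullptr;
}
```

No CUB header is needed here. The `static_assert`s are evaluated at compile time, and calling `is_null` with a non-pointer argument fails to compile rather than misbehaving at run time.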
Misnamed `cub::BlockAdjacentDifference` APIs

The algorithms in `cub::BlockAdjacentDifference` have been deprecated, as their names did not clearly describe their intent. The `FlagHeads` method is now `SubtractLeft`, and `FlagTails` has been replaced by `SubtractRight`.

Breaking Changes
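- Deprecate the misnamed `BlockAdjacentDifference::FlagHeads` and `FlagTails` methods. Use the new `SubtractLeft` and `SubtractRight` methods instead.
- Deprecate obsolete CUB type traits in favor of the equivalent traits in `<type_traits>`, as described above.

New Features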
- Port the `thrust::adjacent_difference` kernel and expose it as `cub::DeviceAdjacentDifference`.
- Port the `thrust::unique_by_key` kernel and expose it as `cub::DeviceSelect::UniqueByKey`. Thanks to @zasdfgbnm for this contribution.

Enhancements
- Allow 64-bit offsets in `DeviceRadixSort` public APIs. Thanks to @canonizer for this contribution.
- Reduce `DeviceMergeSort` compilation time.
- … `CMAKE_INSTALL_INCLUDEDIR` values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.

Bug Fixes
- … `dyn_smem` example.
- … `min`/`max` macros defined in `windows.h`.
- … `util_device`.
- … `DeviceSegmentedSort`.
- Ensure that the `nv_exec_check_disable` pragma is only used on nvcc.
- Fix a `-Wsizeof-array-div` warning on gcc 11. Thanks to @robertmaynard for this contribution.
- … `DiscardIterator` on gcc 10.
- … `small` macro defined in `windows.h`.
- … `DeviceSpmv` parameters that are absent from public APIs.
- … `DeviceScan` algorithms that guaranteed run-to-run deterministic results for floating-point addition.

This discussion was created from the release CUB 1.16.0.
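The `DeviceScan` bug-fix entry concerns documentation that promised run-to-run deterministic results for floating-point addition. Such a promise is fragile because floating-point addition is not associative. If a parallel scan may group its partial sums differently from one run to the next, the results can differ in the low-order bits. The last element of an inclusive scan is the total, so the effect can be shown with a reduction. The host C++ sketch below is not CUB code and compares a left-to-right sum with a pairwise (tree) sum. It assumes IEEE-754 `float` arithmetic without compiler reassociation (for example, no `-ffast-math`). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right sum, as a sequential loop computes it: ((x0 + x1) + x2) + ...
float sum_sequential(const std::vector<float>& x) {
  float acc = 0.0f;
  for (float v : x) acc += v;
  return acc;
}

// Pairwise (tree) sum, one grouping a parallel reduction might use:
// (x0 + x1) + (x2 + x3) for four items. Assumes x.size() is a power of two.
float sum_pairwise(std::vector<float> x) {
  assert(!x.empty() && (x.size() & (x.size() - 1)) == 0);
  for (std::size_t width = x.size(); width > 1; width /= 2) {
    for (std::size_t i = 0; i < width / 2; ++i) {
      x[i] = x[2 * i] + x[2 * i + 1];  // reads only indices >= i, so in-place is safe
    }
  }
  return x[0];
}
```

With `x = {1e8f, 1.0f, -1e8f, 1.0f}`, the sequential sum is `1.0f` and the pairwise sum is `0.0f`. Near 1e8 the spacing between adjacent `float` values is 8, so adding 1 is rounded away whenever it is combined with a large partial sum first.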
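The `DeviceSelect::UniqueByKey` algorithm in this release was ported from Thrust's `unique_by_key`. From each run of consecutive equal keys, only the first key and its associated value are kept. The host-side C++ sketch below illustrates that behavior with the standard library. It is not the CUB device API, and the function name `unique_by_key_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: keep the first (key, value) pair of every run
// of consecutive equal keys. Returns the number of pairs selected.
template <typename K, typename V>
std::size_t unique_by_key_reference(const std::vector<K>& keys_in,
                                    const std::vector<V>& values_in,
                                    std::vector<K>& keys_out,
                                    std::vector<V>& values_out) {
  assert(keys_in.size() == values_in.size());
  keys_out.clear();
  values_out.clear();
  for (std::size_t i = 0; i < keys_in.size(); ++i) {
    // A run starts at index 0 or wherever the key differs from its left neighbor.
    if (i == 0 || !(keys_in[i] == keys_in[i - 1])) {
      keys_out.push_back(keys_in[i]);
      values_out.push_back(values_in[i]);
    }
  }
  return keys_out.size();
}
```

For keys `{0, 2, 2, 9, 5, 5, 5, 8}` with values `{1, 2, 3, 4, 5, 6, 7, 8}`, the selected keys are `{0, 2, 9, 5, 8}` and the values are `{1, 2, 4, 5, 8}`. Only consecutive duplicates are removed, so a key that reappears later in the input starts a new run.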
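The `SubtractLeft` and `SubtractRight` operations of `cub::DeviceAdjacentDifference` combine each item with one of its neighbors. The host-side sketch below illustrates their semantics. It assumes that subtraction is the difference operation, that `SubtractLeft` computes `in[i] - in[i - 1]` with the first item copied through, and that `SubtractRight` computes `in[i] - in[i + 1]` with the last item copied through. The names `subtract_left` and `subtract_right` are hypothetical, not CUB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host reference for SubtractLeft semantics:
// out[0] = in[0]; out[i] = op(in[i], in[i - 1]) for i >= 1.
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_left(const std::vector<T>& in, Op op = Op{}) {
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = (i == 0) ? in[i] : op(in[i], in[i - 1]);
  }
  return out;
}

// Hypothetical host reference for SubtractRight semantics:
// out[i] = op(in[i], in[i + 1]) for i < n - 1; out[n - 1] = in[n - 1].
template <typename T, typename Op = std::minus<T>>
std::vector<T> subtract_right(const std::vector<T>& in, Op op = Op{}) {
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = (i + 1 == in.size()) ? in[i] : op(in[i], in[i + 1]);
  }
  return out;
}
```

For the input `{1, 2, 1, 2, 1, 2, 1, 2}`, `subtract_left` yields `{1, 1, -1, 1, -1, 1, -1, 1}` and `subtract_right` yields `{-1, 1, -1, 1, -1, 1, -1, 2}`. A custom binary operation can be passed as the second argument in place of `std::minus`.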