Releases: NVIDIA/thrust
Thrust 2.1.0
New Features
- #1805: Add default constructors to transform_output_iterator and transform_input_output_iterator. Thanks to Mark Harris (@harrism) for this contribution.
- #1836: Enable constructions of vectors from std::initializer_list.
Bug Fixes
- #1768: Fix type conversion warning in the thrust::complex utilities. Thanks to Zishi Wu (@zishiwu123) for this contribution.
- #1809: Fix some warnings about usage of __host__ functions in __device__ code.
- #1825: Fix Thrust’s CMake install rules. Thanks to Robert Maynard (@robertmaynard) for this contribution.
- #1827: Fix thrust::reduce_by_key when using non-default-initializable iterators.
- #1832: Fix bug in device-side CDP thrust::reduce when using a large number of inputs.
Other Enhancements
- #1815: Update Thrust’s libcu++ git submodule to version 1.8.1.
- #1841: Fix invalid code in execution policy documentation example. Thanks to Raphaël Frantz (@Eren121) for this contribution.
- #1848: Improve error messages when attempting to launch a kernel on a device that is not supported by compiled PTX versions. Thanks to Zahra Khatami (@zkhatami) for this contribution.
- #1855: Remove usage of deprecated CUDA error codes.
Thrust 2.0.1
Other Enhancements
- Disable CDP parallelization of device-side invocations of Thrust algorithms on SM90+. The removal of device-side synchronization support in recent architectures makes Thrust’s fork-join model unimplementable on device, so a serial implementation will be used instead. Host-side invocations of Thrust algorithms are not affected.
Thrust 1.17.2
Summary
Thrust 1.17.2 is a minor bugfix release that provides an updated version of CUB.
Thrust 2.0.0
Summary
The Thrust 2.0.0 major release adds a dependency on libcu++ and contains several breaking changes. These include new diagnostics when inspecting device-only lambdas from the host, removal of the cub symlink in the Thrust repository root, and removal of the deprecated THRUST_*_BACKEND macros. It also includes several minor bugfixes and cleanups.
Breaking Changes
- #1605: Add libcu++ dependency.
  - A suitable version of libcu++ is provided through the ${THRUST_ROOT}/dependencies/libcudacxx/ submodule.
  - Non-cmake users may need to add the libcu++ include path to their builds (-I ${THRUST_ROOT}/dependencies/libcudacxx/include/).
  - The Thrust CMake packages have been updated to add this include path.
- #1605: The following macros are no longer defined by default. They can be re-enabled by defining THRUST_PROVIDE_LEGACY_ARCH_MACROS. These will be removed completely in a future release.
  - THRUST_IS_HOST_CODE: Replace with NV_IF_TARGET.
  - THRUST_IS_DEVICE_CODE: Replace with NV_IF_TARGET.
  - THRUST_INCLUDE_HOST_CODE: Replace with NV_IF_TARGET.
  - THRUST_INCLUDE_DEVICE_CODE: Replace with NV_IF_TARGET.
  - THRUST_DEVICE_CODE: Replace with NV_IF_TARGET.
- #1661: Thrust’s CUDA Runtime support macros have been updated to support NV_IF_TARGET. They are now defined consistently across all host/device compilation passes. This should not affect most usages of these macros, but may require changes for some edge cases.
  - THRUST_RUNTIME_FUNCTION: Execution space annotations for functions that invoke CUDA Runtime APIs.
    - Old behavior:
      - RDC enabled: Defined to __host__ __device__
      - RDC not enabled:
        - NVCC host pass: Defined to __host__ __device__
        - NVCC device pass: Defined to __host__
    - New behavior:
      - RDC enabled: Defined to __host__ __device__
      - RDC not enabled: Defined to __host__
  - __THRUST_HAS_CUDART__: No change in behavior, but no longer used in Thrust. Provided for legacy support only. Legacy behavior:
    - RDC enabled: Defined to 1.
    - RDC not enabled:
      - NVCC host pass: Defined to 1.
      - NVCC device pass: Defined to 0.
  - THRUST_RDC_ENABLED: New macro, may be combined with NV_IF_TARGET to replace most usages of __THRUST_HAS_CUDART__. Behavior:
    - RDC enabled: Macro is defined.
    - RDC not enabled: Macro is not defined.
- #1701: Remove the cub symlink from the root of the Thrust repository.
  - This symlink caused issues in certain build environments (e.g. #1328).
  - Builds that relied on this symlink will need to add the full CUB include path (-I ${THRUST_ROOT}/dependencies/cub).
  - CMake builds that use the Thrust packages via CPM, add_subdirectory, or find_package are not affected.
- #1760: A compile-time error is now emitted when a __device__-only lambda’s return type is queried from host code (requires libcu++ ≥ 1.9.0).
  - Due to limitations in the CUDA programming model, the result of this query is unreliable, and will silently return an incorrect result. This leads to difficult-to-debug errors.
  - When using libcu++ 1.9.0, an error will be emitted with information about work-arounds:
    - Use a named function object with a __device__-only implementation of operator().
    - Use a __host__ __device__ lambda.
    - Use cuda::proclaim_return_type (added in libcu++ 1.9.0).
- #1761: Removed support for deprecated THRUST_DEVICE_BACKEND and THRUST_HOST_BACKEND macros. The THRUST_DEVICE_SYSTEM and THRUST_HOST_SYSTEM macros should be used instead.
Bug Fixes
- #1605: Fix some execution space warnings in the allocator library.
- #1683: Fix bug in iterator_category_to_traversal metafunctions.
- #1715: Add missing __thrust_exec_check_disable__ annotation to thrust::make_zip_function. Thanks to @mfbalin for this contribution.
- #1722: Remove CUDA-specific error handler from code that may be executed on non-CUDA backends. Thanks to @dkolsen-pgi for this contribution.
- #1756: Fix copy_if for output iterators that don’t support copy assignment. Thanks to @mfbalin for this contribution.
Other Enhancements
Thrust 1.17.1
Summary
Thrust 1.17.1 is a minor bugfix release that provides an updated version of CUB.
Thrust 1.17.0
Summary
Thrust 1.17.0 is the final minor release of the 1.X series. This release provides GDB pretty-printers for device vectors/references, a new unique_count algorithm, and an easier way to create tagged Thrust iterators. Several documentation fixes are included, which can be found on the new Thrust documentation site at https://nvidia.github.io/thrust. We’ll be migrating existing documentation sources to this new location over the next few months.
New Features
- #1586: Add new thrust::make_tagged_iterator convenience function. Thanks to @karthikeyann for this contribution.
- #1619: Add unique_count algorithm. Thanks to @upsj for this contribution.
- #1631: Add GDB pretty-printers for device vectors/references to scripts/gdb-pretty-printers.py. Thanks to @upsj for this contribution.
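As a rough illustration of what unique_count computes, the sketch below counts runs of consecutive equal elements in plain standard C++ (C++14 or later). It is a sequential reference written for this note, not Thrust's implementation; count_runs and sample are hypothetical names.

```cpp
#include <cstddef>

// Sequential reference sketch (hypothetical helper, not part of Thrust):
// counts the runs of consecutive equal elements in [first, last).
constexpr std::size_t count_runs(const int* first, const int* last) {
  if (first == last) {
    return 0;  // an empty range has no runs
  }
  std::size_t runs = 1;  // the first element always starts a run
  for (const int* it = first + 1; it != last; ++it) {
    if (*it != *(it - 1)) {
      ++runs;  // the value changed, so a new run begins here
    }
  }
  return runs;
}

// Sample input with four runs: {1,1}, {2,2,2}, {3}, {1}.
constexpr int sample[] = {1, 1, 2, 2, 2, 3, 1};
static_assert(count_runs(sample, sample + 7) == 4, "expected four runs");
```

The count equals the number of elements that would remain after removing consecutive duplicates, which makes it useful for sizing an output buffer in advance.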
Bug Fixes
- #1671: Fixed reduce_by_key when called with 2^31 elements.
Other Enhancements
- #1512: Use CUB to implement adjacent_difference.
- #1555: Use CUB to implement scan_by_key.
- #1611: Add new doxybook-based Thrust documentation at https://nvidia.github.io/thrust.
- #1639: Fixed broken link in documentation. Thanks to @jrhemstad for this contribution.
- #1644: Increase contrast of search input text in new doc site. Thanks to @bdice for this contribution.
- #1647: Add __forceinline__ annotations to a functor wrapper. Thanks to @mkuron for this contribution.
- #1660: Fixed typo in documentation example for permutation_iterator.
- #1669: Add a new explicit_cuda_stream.cu example that shows how to use explicit CUDA streams and par/par_nosync execution policies.
Thrust 1.16.0
Summary
Thrust 1.16.0 provides a new “nosync” hint for the CUDA backend, as well as numerous bugfixes and stability improvements.
New thrust::cuda::par_nosync Execution Policy
Most of Thrust’s parallel algorithms are fully synchronous and will block the calling CPU thread until all work is completed. This design avoids many pitfalls associated with asynchronous GPU programming, resulting in simpler and less error-prone usage for new CUDA developers. Unfortunately, this improvement in user experience comes at a performance cost that often frustrates more experienced CUDA programmers.
Prior to this release, the only synchronous-to-asynchronous migration path for existing Thrust codebases involved significant refactoring, replacing calls to thrust algorithms with a limited set of future-based thrust::async algorithms or lower-level CUB kernels. The new thrust::cuda::par_nosync execution policy provides a new, less-invasive entry point for asynchronous computation.
par_nosync is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that an explicit synchronization will be performed by the caller before accessing results.
While some Thrust algorithms require internal synchronization to safely compute their results, many do not. For example, multiple thrust::for_each invocations can be launched without waiting for earlier calls to complete:
// Queue three `for_each` kernels:
thrust::for_each(thrust::cuda::par_nosync, vec1.begin(), vec1.end(), Op{});
thrust::for_each(thrust::cuda::par_nosync, vec2.begin(), vec2.end(), Op{});
thrust::for_each(thrust::cuda::par_nosync, vec3.begin(), vec3.end(), Op{});
// Do other work while kernels execute:
do_something();
// Must explicitly synchronize before accessing `for_each` results:
cudaDeviceSynchronize();
Thanks to @fkallen for this contribution.
Deprecation Notices
CUDA Dynamic Parallelism Support
A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).
This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launches a kernel; such calls will execute sequentially on the calling GPU thread instead of launching a device-wide kernel.
Breaking Changes
- Thrust 1.14.0 included a change that aliased the cub namespace to thrust::cub. This has caused issues with ambiguous namespaces for projects that declare using namespace thrust; from the global namespace. We recommend against this practice.
- #1572: Removed several unnecessary header includes. Downstream projects may need to update their includes if they were relying on this behavior.
New Features
Enhancements
- #1511: Use CUB’s new DeviceMergeSort API and remove Thrust’s internal implementation.
- #1566: Improved performance of thrust::shuffle. Thanks to @djns99 for this contribution.
- #1584: Support user-defined CMAKE_INSTALL_INCLUDEDIR values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
Bug Fixes
- #1496: Fix some issues affecting icc builds.
- #1552: Fix some collisions with the min/max macros defined in windows.h.
- #1582: Fix issue with function type alias on 32-bit MSVC builds.
- #1591: Workaround issue affecting compilation with nvc++.
- #1597: Fix some collisions with the small macro defined in windows.h.
- #1599, #1603: Fix some issues with version handling in Thrust’s CMake packages.
- #1614: Clarify that scan algorithm results are non-deterministic for pseudo-associative operators (e.g. floating-point addition).
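The note in #1614 comes down to floating-point addition not being associative: regrouping the same operands can change the rounded result, and a parallel scan may combine partial sums in a different grouping than a sequential loop would. A minimal standard C++ sketch, assuming IEEE 754 double arithmetic (the function names are made up for illustration):

```cpp
// Two groupings of the same three-term sum (hypothetical helpers).
constexpr double left_grouping(double a, double b, double c) {
  return (a + b) + c;  // order a sequential left-to-right scan would use
}

constexpr double right_grouping(double a, double b, double c) {
  return a + (b + c);  // a regrouping that a parallel scan could produce
}

// 1.0 is far below the spacing between doubles near 1e20, so it is lost
// when added to -1e20 first, but survives when the large terms cancel first.
static_assert(left_grouping(1e20, -1e20, 1.0) == 1.0, "large terms cancel first");
static_assert(right_grouping(1e20, -1e20, 1.0) == 0.0, "small term is absorbed");
```

Both results are correctly rounded for their grouping; they simply differ, which is why scan output for such operators should not be expected to match bit-for-bit across runs or backends.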
Thrust 1.15.0
Summary
Thrust 1.15.0 provides numerous bugfixes, including non-numeric thrust::sequence support, several MSVC-related compilation fixes, fewer conversion warnings, counting_iterator initialization, and documentation updates.
Deprecation Notices
A future version of Thrust will remove support for CUDA Dynamic Parallelism (CDP).
This will only affect calls to Thrust algorithms made from CUDA device-side code that currently launches a kernel; such calls will execute sequentially on the calling GPU thread instead of launching a device-wide kernel.
Bug Fixes
- #1507: Allow thrust::sequence to work with non-numeric types. Thanks to Ben Jude (@bjude) for this contribution.
- #1509: Avoid macro collision when calling max() on MSVC. Thanks to Thomas (@tomintheshell) for this contribution.
- #1514: Initialize all members in counting_iterator's default constructor.
- #1518: Fix std::allocator_traits on MSVC + C++17.
- #1530: Fix several -Wconversion warnings. Thanks to Matt Stack (@matt-stack) for this contribution.
- #1539: Fixed typo in thrust::for_each documentation. Thanks to Salman (@untamedimpala) for this contribution.
- #1548: Avoid name collision with B0 macro in termios.h system header. Thanks to Philip Deegan (@PhilipDeegan) for this contribution.
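Collisions like the one in #1509 arise because a function-like macro named max rewrites any later max(...) call. One common workaround, shown here as a general technique and not necessarily the exact change made in Thrust, is to parenthesize the function name so the preprocessor does not see a macro invocation. The macro below only simulates what windows.h defines when NOMINMAX is not set, and largest is a hypothetical helper (C++14 or later).

```cpp
#include <algorithm>

// Simulate the function-like macro that windows.h defines by default.
// It is defined after <algorithm> is included so the header itself is unaffected.
#define max(a, b) (((a) > (b)) ? (a) : (b))

constexpr int largest(int x, int y) {
  // Writing the call without the extra parentheses would be rewritten by the
  // macro and fail to compile. With them, `max` is not followed by `(`, so the
  // preprocessor leaves it alone and the real function template is called.
  return (std::max)(x, y);
}

#undef max  // keep the simulated macro from leaking into later code

static_assert(largest(3, 7) == 7, "calls std::max despite the macro");
```

Defining NOMINMAX before including windows.h avoids the macro entirely, but a library header cannot rely on its users doing so, which is why defensive call syntax is often preferred there.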
Thrust 1.14.0 (NVIDIA HPC SDK 21.9)
Thrust 1.14.0 is a major release accompanying the NVIDIA HPC SDK 21.9.
This release adds the ability to wrap the thrust:: namespace in an external namespace, providing a workaround for a variety of shared library linking issues. Thrust also learned to detect when CUB's symbols are in a wrapped namespace and properly import them. To enable this feature, use #define THRUST_CUB_WRAPPED_NAMESPACE foo to wrap both Thrust and CUB in the foo:: namespace. See thrust/detail/config/namespace.h for details and more namespace options.
Several bugfixes are also included: The tuple_size and tuple_element helpers now support cv-qualified types. scan_by_key uses less memory. thrust::iterator_traits is better integrated with std::iterator_traits. See below for more details and references.
New Features
- #1464: Add preprocessor hooks that allow thrust:: to be wrapped in an external namespace, and support cases when CUB is wrapped in an external namespace.
Bug Fixes
- #1457: Support cv-qualified types in thrust::tuple_size and thrust::tuple_element. Thanks to Jake Hemstad for this contribution.
- #1471: Fixed excessive memory allocation in scan_by_key. Thanks to Lilo Huang for this contribution.
- #1476: Removed dead code from the expand example. Thanks to Lilo Huang for this contribution.
- #1488: Fixed the path to the installed CUB headers in the CMake find_package configuration files.
- #1491: Fallback to std::iterator_traits when no thrust::iterator_traits specialization exists for an iterator type. Thanks to Divye Gala for this contribution.
Thrust 1.13.1 (CUDA Toolkit 11.5)
Thrust 1.13.1 is a minor release accompanying the CUDA Toolkit 11.5.
This release provides a new hook for embedding the thrust:: namespace inside a custom namespace. This is intended to work around various issues related to linking multiple shared libraries that use Thrust. The existing CUB_NS_PREFIX and CUB_NS_POSTFIX macros already provided this capability for CUB; this update provides a simpler mechanism that is extended to and integrated with Thrust. Simply define THRUST_CUB_WRAPPED_NAMESPACE to a namespace name, and both thrust:: and cub:: will be placed inside the new namespace. Using different wrapped namespaces for each shared library will prevent issues like those reported in #1401.
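To make the idea concrete, here is a small standard C++ mimic of a wrapped-namespace hook. It does not use Thrust: the DEMO_* macros, the demo library, and the lib_a/lib_b names are all hypothetical stand-ins, and the real macro definitions in thrust/detail/config/namespace.h may be structured differently.

```cpp
// Hypothetical hook: the user picks a wrapper namespace before the
// library header is seen (analogous to THRUST_CUB_WRAPPED_NAMESPACE).
#define DEMO_WRAPPED_NAMESPACE lib_a

// The library opens and closes its namespace only through these macros,
// so one definition controls where every symbol ends up.
#ifdef DEMO_WRAPPED_NAMESPACE
#  define DEMO_NAMESPACE_BEGIN namespace DEMO_WRAPPED_NAMESPACE { namespace demo {
#  define DEMO_NAMESPACE_END } }
#else
#  define DEMO_NAMESPACE_BEGIN namespace demo {
#  define DEMO_NAMESPACE_END }
#endif

// "Library" code as shipped with the first shared library.
DEMO_NAMESPACE_BEGIN
constexpr int version() { return 1; }
DEMO_NAMESPACE_END

// A second shared library built against another copy of the library would
// pick a different wrapper; it is written out by hand here for brevity.
namespace lib_b { namespace demo {
constexpr int version() { return 2; }
} }

// The two copies now have distinct qualified names and cannot collide.
static_assert(lib_a::demo::version() == 1, "first library's copy");
static_assert(lib_b::demo::version() == 2, "second library's copy");
```

Because the wrapper becomes part of every qualified (and mangled) name, two shared libraries that embed different versions of the same header-only code no longer export identical symbols.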
New Features
- #1464: Add THRUST_CUB_WRAPPED_NAMESPACE hooks.
Bug Fixes
- #1488: Fix path to installed CUB in Thrust's CMake config files.