18 May 19:09

13b70d4

Thrust 1.9.7-1 (CUDA Toolkit 10.2 for Tegra)

Thrust 1.9.7-1 is a minor release accompanying the CUDA Toolkit 10.2 release for Tegra. It is nearly identical to 1.9.7.

Bug Fixes

Remove support for GCC's broken nodiscard-like attribute.

16 May 08:22

brycelelbach

1.9.7

f52cc02

Thrust 1.9.7 (CUDA Toolkit 10.2)

Thrust 1.9.7 is a minor release accompanying the CUDA Toolkit 10.2 release. Unfortunately, although the version and patch numbers are identical, one bug fix present in Thrust 1.9.7 (NVBug 2646034: Fix incorrect dependency handling for stream acquisition in thrust::future) was not included in the CUDA Toolkit 10.2 preview release for AArch64 SBSA. The tag cuda-10.2aarch64sbsa contains the exact version of Thrust present in the CUDA Toolkit 10.2 preview release for AArch64 SBSA.

Bug Fixes

#967, NVBug 2448170: Fix the CUDA backend thrust::for_each so that it supports large input sizes with 64-bit indices.
NVBug 2646034: Fix incorrect dependency handling for stream acquisition in thrust::future.
- Not present in the CUDA Toolkit 10.2 preview release for AArch64 SBSA.
#968, NVBug 2612102: Fix the thrust::mr::polymorphic_adaptor to actually use its template parameter.

18 May 21:52

brycelelbach

1.9.6-1

2949034

Thrust 1.9.6-1 (NVIDIA HPC SDK 20.3)

Thrust 1.9.6-1 is a variant of 1.9.6 accompanying the NVIDIA HPC SDK 20.3 release. It contains modifications necessary to serve as the implementation of NVC++'s GPU-accelerated C++17 Parallel Algorithms when using the CUDA Toolkit 10.1 Update 2 release.

16 May 08:21

brycelelbach

1.9.6

fd2a9d4

Thrust 1.9.6 (CUDA Toolkit 10.1 Update 2)

Thrust 1.9.6 is a minor release accompanying the CUDA Toolkit 10.1 Update 2 release.

Bug Fixes

NVBug 2509847: Inconsistent alignment of thrust::complex
NVBug 2586774: Compilation failure with Clang + older libstdc++ that doesn't have std::is_trivially_copyable
NVBug 200488234: CUDA header files contain Unicode characters, which lead to compilation errors on Windows
#949, #973, NVBug 2422333, NVBug 2522259, NVBug 2528822: thrust::detail::aligned_reinterpret_cast must be annotated with __host__ __device__.
NVBug 2599629: Missing include in the OpenMP sort implementation
NVBug 200513211: Truncation warning in test code under VC142

14 May 16:15

griwes

1.9.5

aded199

Thrust 1.9.5 (CUDA Toolkit 10.1 Update 1)

Thrust 1.9.5 is a minor bug fix release accompanying the CUDA Toolkit 10.1 Update 1 release.

Bug Fixes

NVBug 2502854: Assignment of complex vector between host and device fails to compile in CUDA >= 9.1 with GCC 6.

01 Mar 03:37

brycelelbach

1.9.4

4f43a17

Thrust 1.9.4 (CUDA Toolkit 10.1)

Thrust 1.9.4 adds asynchronous interfaces for parallel algorithms, a new allocator system including caching allocators and unified memory support, as well as a variety of other enhancements, mostly related to C++11/C++14/C++17/C++20 support. The new asynchronous algorithms in the thrust::async namespace return thrust::event or thrust::future objects, which can be waited upon to synchronize with the completion of the parallel operation.

Breaking API Changes

Synchronous Thrust algorithms now block until all of their operations have completed. Use the new asynchronous Thrust algorithms for non-blocking behavior.

New Features

thrust::event and thrust::future<T>, uniquely-owned asynchronous handles consisting of a state (ready or not ready), content (some value; for thrust::future only), and an optional set of objects that should be destroyed only when the future's value is ready and has been consumed.
- The design is loosely based on C++11's std::future.
- They can be waited on with .wait, and the value of a future can be waited on and retrieved with .get or .extract.
- Multiple thrust::events and thrust::futures can be combined with thrust::when_all.
- thrust::futures can be converted to thrust::events.
- Currently, these primitives are only implemented for the CUDA backend and are C++11 only.
New asynchronous algorithms that return thrust::event/thrust::futures, implemented as C++20 range style customization points:
- thrust::async::reduce.
- thrust::async::reduce_into, which takes a target location to store the reduction result into.
- thrust::async::copy, including a two-policy overload for explicit cross-system copies, to which execution policy properties can be attached.
- thrust::async::transform.
- thrust::async::for_each.
- thrust::async::stable_sort.
- thrust::async::sort.
- By default the asynchronous algorithms use the new caching allocators. Deallocation of temporary storage is deferred until the destruction of the returned thrust::future. The content of thrust::futures is stored in either device or universal memory and transferred to the host only upon request to prevent unnecessary data migration.
- Asynchronous algorithms are currently only implemented for the CUDA system and are C++11 only.
exec.after(f, g, ...), a new execution policy method that takes a set of thrust::event/thrust::futures and returns an execution policy whose operations depend upon them.
New logic and mindset for the type requirements for cross-system sequence copies (currently only used by thrust::async::copy), based on:
- thrust::is_contiguous_iterator and THRUST_PROCLAIM_CONTIGUOUS_ITERATOR for detecting/indicating that an iterator points to contiguous storage.
- thrust::is_trivially_relocatable and THRUST_PROCLAIM_TRIVIALLY_RELOCATABLE for detecting/indicating that a type is memcpyable (based on principles from https://wg21.link/P1144).
- The new approach reduces buffering, increases performance, and increases correctness.
- The fast path is now enabled when copying fp16 and CUDA vector types with thrust::async::copy.
All Thrust synchronous algorithms for the CUDA backend now actually synchronize. Previously, any algorithm that did not allocate temporary storage (counterexample: thrust::sort) and did not have a computation-dependent result (counterexample: thrust::reduce) would actually be launched asynchronously. Additionally, synchronous algorithms that allocated temporary storage would become asynchronous if a custom allocator was supplied that did not synchronize on allocation/deallocation, unlike cudaMalloc/cudaFree. So, now thrust::for_each, thrust::transform, thrust::sort, etc. are truly synchronous. In some cases this may be a performance regression; if you need asynchrony, use the new asynchronous algorithms.
Thrust's allocator framework has been rewritten. It now uses a memory resource system, similar to C++17's std::pmr but supporting static polymorphism. Memory resources are objects that allocate untyped storage and allocators are cheap handles to memory resources in this new model. The new facilities live in <thrust/mr/*>.
- thrust::mr::memory_resource<Pointer>, the memory resource base class, which takes a (possibly tagged) pointer to void type as a parameter.
- thrust::mr::allocator<T, MemoryResource>, an allocator backed by a memory resource object.
- thrust::mr::polymorphic_adaptor_resource<Pointer>, a type-erased memory resource adaptor.
- thrust::mr::polymorphic_allocator<T>, a C++17-style polymorphic allocator backed by a type-erased memory resource object.
- New tunable C++17-style caching memory resources, thrust::mr::(disjoint_)?(un)?synchronized_pool_resource, designed to cache both small object allocations and large repetitive temporary allocations. The disjoint variants use separate storage for management of the pool, which is necessary if the memory being allocated cannot be accessed on the host (e.g. device memory).
- System-specific allocators were rewritten to use the new memory resource framework.
- New thrust::device_memory_resource for allocating device memory.
- New thrust::universal_memory_resource for allocating memory that can be accessed from both the host and device (e.g. cudaMallocManaged).
- New thrust::universal_host_pinned_memory_resource for allocating memory that can be accessed from the host and the device but always resides in host memory (e.g. cudaMallocHost).
- thrust::get_per_device_resource and thrust::per_device_allocator, which lazily create and retrieve a per-device singleton memory resource.
- Rebinding mechanisms (rebind_traits and rebind_alloc) for thrust::allocator_traits.
- thrust::device_make_unique, a factory function for creating a std::unique_ptr to a newly allocated object in device memory.
- <thrust/detail/memory_algorithms>, a C++11 implementation of the C++17 uninitialized memory algorithms.
- thrust::allocate_unique and friends, based on the proposed C++23 std::allocate_unique (https://wg21.link/P0211).
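
The split between memory resources (which allocate untyped storage) and allocators (cheap typed handles to a resource) can be illustrated with plain standard C++. The following is only a minimal host-side sketch of that model: counting_resource, resource_allocator, and demo are hypothetical names invented for illustration, not Thrust's API, and the resource simply forwards to ::operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

namespace sketch { // hypothetical names throughout; not part of Thrust

// A memory resource hands out untyped storage. This one also counts calls so
// that routing allocations through it is observable.
class counting_resource
{
public:
  void* allocate(std::size_t bytes)
  {
    ++allocations_;
    return ::operator new(bytes);
  }

  void deallocate(void* p, std::size_t /*bytes*/) noexcept
  {
    ++deallocations_;
    ::operator delete(p);
  }

  std::size_t allocations() const noexcept { return allocations_; }
  std::size_t deallocations() const noexcept { return deallocations_; }

private:
  std::size_t allocations_ = 0;
  std::size_t deallocations_ = 0;
};

// An allocator is a cheap, copyable, typed handle to a resource. The resource
// type is a template parameter, so dispatch is static (no virtual calls).
template <typename T, typename Resource>
class resource_allocator
{
public:
  typedef T value_type;

  explicit resource_allocator(Resource* r) noexcept : resource_(r) {}

  // Converting constructor, needed when a container rebinds the allocator.
  template <typename U>
  resource_allocator(resource_allocator<U, Resource> const& other) noexcept
    : resource_(other.resource()) {}

  T* allocate(std::size_t n)
  {
    return static_cast<T*>(resource_->allocate(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t n) noexcept
  {
    resource_->deallocate(p, n * sizeof(T));
  }

  Resource* resource() const noexcept { return resource_; }

private:
  Resource* resource_;
};

// Two handles compare equal when they refer to the same resource.
template <typename T, typename U, typename R>
bool operator==(resource_allocator<T, R> const& a,
                resource_allocator<U, R> const& b) noexcept
{ return a.resource() == b.resource(); }

template <typename T, typename U, typename R>
bool operator!=(resource_allocator<T, R> const& a,
                resource_allocator<U, R> const& b) noexcept
{ return !(a == b); }

// Usage: a std::vector whose storage comes from the counting resource.
// Returns the number of allocations the resource observed.
inline std::size_t demo()
{
  typedef resource_allocator<int, counting_resource> alloc;
  counting_resource pool;
  {
    alloc a(&pool);
    std::vector<int, alloc> v(a);
    v.reserve(16);
    v.push_back(42);
    assert(pool.allocations() >= 1); // the vector's storage came from pool
  }
  assert(pool.allocations() == pool.deallocations()); // and was returned to it
  return pool.allocations();
}

} // namespace sketch
```

Because the allocator holds only a pointer, copying it is trivial, and several containers can share one resource by sharing the handle.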
New type traits and metaprogramming facilities. Type traits are slowly being migrated out of thrust::detail:: and <thrust/detail/*>; their new home will be thrust:: and <thrust/type_traits/*>.
- thrust::is_execution_policy.
- thrust::is_operator_less_or_greater_function_object, which detects thrust::less, thrust::greater, std::less, and std::greater.
- thrust::is_operator_plus_function_object, which detects thrust::plus and std::plus.
- thrust::remove_cvref(_t)?, a C++11 implementation of C++20's std::remove_cvref(_t)?.
- thrust::void_t, and various other new type traits.
- thrust::integer_sequence and friends, a C++11 implementation of C++14's std::integer_sequence.
- thrust::conjunction, thrust::disjunction, and thrust::negation, a C++11 implementation of C++17's logical metafunctions.
- Some Thrust type traits (such as thrust::is_constructible) have been redefined in terms of C++11's type traits when they are available.
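
To give a sense of what C++11 implementations of these later-standard traits can look like, here is a minimal sketch using only <type_traits>. Everything lives in a hypothetical sketch namespace; this is the common textbook technique, not a copy of Thrust's source.

```cpp
#include <type_traits>

namespace sketch { // hypothetical namespace, not part of Thrust

// remove_cvref: strip the reference first, then top-level const/volatile.
template <typename T>
struct remove_cvref
{
  typedef typename std::remove_cv<
    typename std::remove_reference<T>::type
  >::type type;
};

template <typename T>
using remove_cvref_t = typename remove_cvref<T>::type;

// void_t: maps any well-formed list of types to void. The helper struct keeps
// the parameters "used", which matters for SFINAE on some C++11 compilers.
template <typename...> struct voider { typedef void type; };
template <typename... Ts> using void_t = typename voider<Ts...>::type;

// conjunction: logical AND over traits, stopping at the first false one.
template <typename...> struct conjunction : std::true_type {};
template <typename T> struct conjunction<T> : T {};
template <typename T, typename... Ts>
struct conjunction<T, Ts...>
  : std::conditional<bool(T::value), conjunction<Ts...>, T>::type {};

// disjunction: logical OR over traits, stopping at the first true one.
template <typename...> struct disjunction : std::false_type {};
template <typename T> struct disjunction<T> : T {};
template <typename T, typename... Ts>
struct disjunction<T, Ts...>
  : std::conditional<bool(T::value), T, disjunction<Ts...>>::type {};

// negation: logical NOT of a single trait.
template <typename T>
struct negation : std::integral_constant<bool, !bool(T::value)> {};

// Detection idiom built on void_t: does T have a nested value_type?
template <typename T, typename = void>
struct has_value_type : std::false_type {};
template <typename T>
struct has_value_type<T, void_t<typename T::value_type>> : std::true_type {};

} // namespace sketch

// Usage: all checks happen at compile time.
static_assert(std::is_same<sketch::remove_cvref_t<const int&>, int>::value, "");
static_assert(sketch::conjunction<std::true_type, std::true_type>::value, "");
static_assert(sketch::has_value_type<std::true_type>::value, "");
```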
<thrust/detail/tuple_algorithms.h>, new std::tuple algorithms:
- thrust::tuple_transform.
- thrust::tuple_for_each.
- thrust::tuple_subset.
Miscellaneous new std::-like facilities:
- thrust::optional, a C++11 implementation of C++17's std::optional.
- thrust::addressof, an implementation of C++11's std::addressof.
- thrust::next and thrust::prev, implementations of C++11's std::next and std::prev.
- thrust::square, a <functional> style unary function object that multiplies its argument by itself.
- <thrust/limits.h> and thrust::numeric_limits, a customized version of <limits> and std::numeric_limits.
<thrust/detail/preprocessor.h>, new general purpose preprocessor facilities:
- THRUST_PP_CAT[2-5], concatenates two to five tokens.
- THRUST_PP_EXPAND(_ARGS)?, performs double expansion.
- THRUST_PP_ARITY and THRUST_PP_DISPATCH, tools for macro overloading.
- THRUST_PP_BOOL, boolean conversion.
- THRUST_PP_INC and THRUST_PP_DEC, increment/decrement.
- THRUST_PP_HEAD, a variadic macro that expands to the first argument.
- THRUST_PP_TAIL, a variadic macro that expands to all its arguments after the first.
- THRUST_PP_IIF, bitwise conditional.
- THRUST_PP_COMMA_IF and THRUST_PP_HAS_COMMA, facilities for adding and detecting comma tokens.
- THRUST_PP_IS_VARIADIC_NULLARY, returns true if called with a nullary __VA_ARGS__.
- THRUST_CURRENT_FUNCTION, expands to the name of the current function.
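
Macro overloading with arity and dispatch helpers usually follows a well-known pattern: count the arguments, paste the count onto a base name, and invoke the result. The sketch below shows that pattern with hypothetical SKETCH_* macros; it assumes a standard-conforming preprocessor, handles one to five arguments, and is not taken from Thrust's header.

```cpp
// All SKETCH_* names are hypothetical; they are not Thrust's macros.

// Two-level concatenation so that arguments are macro-expanded before pasting.
#define SKETCH_PP_CAT2_IMPL(a, b) a##b
#define SKETCH_PP_CAT2(a, b) SKETCH_PP_CAT2_IMPL(a, b)

// Count one to five arguments: the arguments shift a descending sequence so
// that the count lands in the slot named n.
#define SKETCH_PP_ARITY_IMPL(_1, _2, _3, _4, _5, n, ...) n
#define SKETCH_PP_ARITY(...) SKETCH_PP_ARITY_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

// Select basename1, basename2, ... according to the number of arguments.
#define SKETCH_PP_DISPATCH(basename, ...) \
  SKETCH_PP_CAT2(basename, SKETCH_PP_ARITY(__VA_ARGS__))(__VA_ARGS__)

// An "overloaded" macro accepting one, two, or three arguments.
#define SKETCH_SUM1(a) (a)
#define SKETCH_SUM2(a, b) ((a) + (b))
#define SKETCH_SUM3(a, b, c) ((a) + (b) + (c))
#define SKETCH_SUM(...) SKETCH_PP_DISPATCH(SKETCH_SUM, __VA_ARGS__)

// Usage: each call expands to the variant matching its argument count.
static_assert(SKETCH_SUM(7) == 7, "");
static_assert(SKETCH_SUM(1, 2) == 3, "");
static_assert(SKETCH_SUM(1, 2, 3) == 6, "");
```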
New C++11 compatibility macros:
- THRUST_NODISCARD, expands to [[nodiscard]] when available and the best equivalent otherwise.
- THRUST_CONSTEXPR, expands to constexpr when available and the best equivalent otherwise.
- THRUST_OVERRIDE, expands to override when available and the best equivalent otherwise.
- THRUST_DEFAULT, expands to = default; when available and the best equivalent otherwise.
- THRUST_NOEXCEPT, expands to noexcept when available and the best equivalent otherwise.
- THRUST_FINAL, expands to final when available and the best equivalent otherwise.
- THRUST_INLINE_CONSTANT, expands to inline constexpr when available and the best equivalent otherwise.
<thrust/detail/type_deduction.h>, new C++11-only type deduction helpers:
- THRUST_DECLTYPE_RETURNS*, expand to function definitions with suitable conditional noexcept qualifiers and trailing return types.
- THRUST_FWD(x), expands to ::std::forward<decltype(x)>(x).
- THRUST_MVCAP, expands to a lambda move capture.
- THRUST_RETOF, expands to a decltype computing the return type of an invocable.
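
As an illustration of how helpers like these are typically used together, the sketch below defines two hypothetical macros. SKETCH_FWD uses the expansion stated above for THRUST_FWD; the body of SKETCH_DECLTYPE_RETURNS is an assumption about what such a macro could look like, not Thrust's definition.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical macro names. Only the forwarding expansion mirrors the notes.
#define SKETCH_FWD(x) ::std::forward<decltype(x)>(x)

// Expands to a conditional noexcept specifier, a trailing return type, and a
// one-statement body, all derived from the same expression.
#define SKETCH_DECLTYPE_RETURNS(...)   \
  noexcept(noexcept(__VA_ARGS__))      \
  -> decltype(__VA_ARGS__)             \
  { return (__VA_ARGS__); }

namespace sketch { // hypothetical namespace, not part of Thrust

// A function object whose call operator promises not to throw.
struct twice
{
  int operator()(int x) const noexcept { return 2 * x; }
};

// A function object that makes no such promise.
struct may_throw
{
  int operator()(int x) const { return x; }
};

// A perfect-forwarding wrapper: its return type and noexcept-ness follow
// those of the wrapped call, without spelling the expression out three times.
template <typename F, typename T>
auto call(F&& f, T&& t)
  SKETCH_DECLTYPE_RETURNS(SKETCH_FWD(f)(SKETCH_FWD(t)))

} // namespace sketch

// Usage: the wrapper inherits both properties from the callable.
static_assert(noexcept(sketch::call(sketch::twice(), 21)), "");
static_assert(!noexcept(sketch::call(sketch::may_throw(), 21)), "");
```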

New ...

16 May 10:07

brycelelbach

1.9.3

17a8f8c

Thrust 1.9.3 (CUDA Toolkit 10.0)

Thrust 1.9.3 unifies and integrates CUDA Thrust and GitHub Thrust.

Bug Fixes

#725, #850, #855, #859, #860: Unify the thrust::iter_swap interface and fix thrust::device_reference swapping.
NVBug 2004663: Add a data method to thrust::detail::temporary_array and refactor temporary memory allocation in the CUDA backend to be exception and leak safe.
#886, #894, #914: Various documentation typo fixes.
#724: Provide NVVMIR_LIBRARY_DIR environment variable to NVCC.
#878: Optimize thrust::min/max_element to only use thrust::detail::get_iterator_value for non-numeric types.
#899: Make thrust::cuda::experimental::pinned_allocator's comparison operators const.
NVBug 2092152: Remove all includes of <cuda.h>.
#911: Fix default comparator element type for thrust::merge_by_key.

Acknowledgments

Thanks to Andrew Corrigan for contributing fixes for swapping interfaces.
Thanks to Francisco Facioni for contributing optimizations for thrust::min/max_element.

16 May 10:06

brycelelbach

1.9.2

8b5620a

Thrust 1.9.2 (CUDA Toolkit 9.2)

Thrust 1.9.2 brings a variety of performance enhancements, bug fixes and test improvements. CUB 1.7.5 was integrated, enhancing the performance of thrust::sort on small data types and thrust::reduce. Changes were applied to complex to optimize memory access. Thrust now compiles with compiler warnings enabled and treated as errors. Additionally, the unit test suite and framework were enhanced to increase coverage.

Breaking Changes

The fallback_allocator example was removed, as it was buggy and difficult to support.

New Features

<thrust/detail/alignment.h>, utilities for memory alignment:
- thrust::aligned_reinterpret_cast.
- thrust::aligned_storage_size, which computes the amount of storage needed for an object of a particular size and alignment.
- thrust::alignment_of, a C++03 implementation of C++11's std::alignment_of.
- thrust::aligned_storage, a C++03 implementation of C++11's std::aligned_storage.
- thrust::max_align_t, a C++03 implementation of C++11's std::max_align_t.
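
One plausible reading of "the amount of storage needed for an object of a particular size and alignment" is the size rounded up to the next multiple of the alignment. The sketch below works under that assumption and also shows a classic pre-C++11 trick for estimating alignment. The names are hypothetical, and the alignment probe relies on typical ABI layout rather than a language guarantee.

```cpp
#include <cstddef>

namespace sketch { // hypothetical namespace, not part of Thrust

// Rounds Len up to the nearest multiple of Align (Align must be non-zero).
// Assumption: this is what "storage needed" means here.
template <std::size_t Len, std::size_t Align>
struct aligned_storage_size
{
  static const std::size_t value = ((Len + Align - 1) / Align) * Align;
};

// C++03-style alignment query: place a T after a single char and measure the
// padding the compiler adds. On common ABIs this equals the alignment of T
// when used as a struct member.
template <typename T>
struct alignment_of
{
private:
  struct probe { char c; T t; };
public:
  static const std::size_t value = sizeof(probe) - sizeof(T);
};

} // namespace sketch

// Usage: 13 bytes at 8-byte alignment occupy two 8-byte units.
static_assert(sketch::aligned_storage_size<13, 8>::value == 16, "");
static_assert(sketch::aligned_storage_size<16, 8>::value == 16, "");
```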

Bug Fixes

NVBug 200385527, NVBug 200385119, NVBug 200385113, NVBug 200349350, NVBug 2058778: Various compiler warning issues.
NVBug 200355591: thrust::reduce performance issues.
NVBug 2053727: Fixed an ADL bug that caused user-supplied allocate to be overlooked but deallocate to be called with GCC <= 4.3.
NVBug 1777043: Fixed thrust::complex to work with thrust::sequence.

18 May 18:12

brycelelbach

1.9.1-2

1d58371

Thrust 1.9.1-2 (CUDA Toolkit 9.1)

Thrust 1.9.1 integrates version 1.7.4 of CUB and introduces a new CUDA backend for thrust::reduce based on CUB.

Bug Fixes

NVBug 1965743: Remove unnecessary static qualifiers.
NVBug 1940974: Fix regression causing a compilation error when using thrust::merge_by_key with thrust::constant_iterators.
NVBug 1904217: Allow callables that take non-const refs to be used with thrust::reduce and thrust::*_scan.

18 May 18:12

brycelelbach

1.9.0-5

7bd79fb

Thrust 1.9.0-5 (CUDA Toolkit 9.0)

Thrust 1.9.0 replaces the original CUDA backend (bulk) with a new one written using CUB, a high performance CUDA collectives library. This brings a substantial performance improvement to the CUDA backend across the board.

Breaking Changes

Any code depending on CUDA backend implementation details will likely be broken.

New Features

New CUDA backend based on CUB which delivers substantially higher performance.
thrust::transform_output_iterator, a fancy iterator that applies a function to the output before storing the result.

New Examples

transform_output_iterator demonstrates use of the new fancy iterator thrust::transform_output_iterator.

Other Enhancements

When C++11 is enabled, functors do not have to inherit from thrust::(unary|binary)_function anymore to be used with thrust::transform_iterator.
Added C++11 only move constructors and move assignment operators for thrust::detail::vector_base-based classes, e.g. thrust::host_vector, thrust::device_vector, and friends.

Bug Fixes

sin(thrust::complex<double>) no longer has precision loss to float.

Acknowledgments

Thanks to Manuel Schiller for contributing a C++11 based enhancement regarding the deduction of functor return types, improving the performance of thrust::unique and implementing thrust::transform_output_iterator.
Thanks to Thibault Notargiacomo for the implementation of move semantics for the thrust::vector_base-based classes.
Thanks to Duane Merrill for developing CUB and helping to integrate it into Thrust's backend.

Releases: NVIDIA/thrust
