This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Releases: NVIDIA/thrust

Thrust 1.9.7-1 (CUDA Toolkit 10.2 for Tegra)

18 May 19:09

Thrust 1.9.7-1 is a minor release accompanying the CUDA Toolkit 10.2 release for Tegra. It is nearly identical to 1.9.7.

Bug Fixes

  • Remove support for GCC's broken nodiscard-like attribute.

Thrust 1.9.7 (CUDA Toolkit 10.2)

16 May 08:22

Thrust 1.9.7 is a minor release accompanying the CUDA Toolkit 10.2 release. Unfortunately, although the version and patch numbers are identical, one bug fix present in Thrust 1.9.7 (NVBug 2646034: Fix incorrect dependency handling for stream acquisition in thrust::future) was not included in the CUDA Toolkit 10.2 preview release for AArch64 SBSA. The tag cuda-10.2aarch64sbsa contains the exact version of Thrust present in the CUDA Toolkit 10.2 preview release for AArch64 SBSA.

Bug Fixes

  • #967, NVBug 2448170: Fix the CUDA backend thrust::for_each so that it supports large input sizes with 64-bit indices.
  • NVBug 2646034: Fix incorrect dependency handling for stream acquisition in thrust::future.
    • Not present in the CUDA Toolkit 10.2 preview release for AArch64 SBSA.
  • #968, NVBug 2612102: Fix the thrust::mr::polymorphic_adaptor to actually use its template parameter.

Thrust 1.9.6-1 (NVIDIA HPC SDK 20.3)

18 May 21:52

Thrust 1.9.6-1 is a variant of 1.9.6 accompanying the NVIDIA HPC SDK 20.3 release. It contains modifications necessary to serve as the implementation of NVC++'s GPU-accelerated C++17 Parallel Algorithms when using the CUDA Toolkit 10.1 Update 2 release.

Thrust 1.9.6 (CUDA Toolkit 10.1 Update 2)

16 May 08:21

Thrust 1.9.6 is a minor release accompanying the CUDA Toolkit 10.1 Update 2 release.

Bug Fixes

  • NVBug 2509847: Inconsistent alignment of thrust::complex
  • NVBug 2586774: Compilation failure with Clang + older libstdc++ that doesn't have std::is_trivially_copyable
  • NVBug 200488234: CUDA header files contain Unicode characters, which leads to compilation errors on Windows
  • #949, #973, NVBug 2422333, NVBug 2522259, NVBug 2528822: thrust::detail::aligned_reinterpret_cast must be annotated with __host__ __device__.
  • NVBug 2599629: Missing include in the OpenMP sort implementation
  • NVBug 200513211: Truncation warning in test code under VC142

Thrust 1.9.5 (CUDA Toolkit 10.1 Update 1)

14 May 16:15

Thrust 1.9.5 is a minor bugfix release accompanying the CUDA Toolkit 10.1 Update 1 release.

Bug Fixes

  • NVBug 2502854: Assignment of complex vector between host and device fails to compile in CUDA >= 9.1 with GCC 6.

Thrust 1.9.4 (CUDA Toolkit 10.1)

01 Mar 03:37

Thrust 1.9.4 adds asynchronous interfaces for parallel algorithms, a new allocator system including caching allocators and unified memory support, as well as a variety of other enhancements, mostly related to C++11/C++14/C++17/C++20 support. The new asynchronous algorithms in the thrust::async namespace return thrust::event or thrust::future objects, which can be waited upon to synchronize with the completion of the parallel operation.

Breaking API Changes

Synchronous Thrust algorithms now block until all of their operations have completed. Use the new asynchronous Thrust algorithms for non-blocking behavior.
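
The difference between a blocking call and a call that returns a waitable handle can be sketched in standard C++. This is only an analogue built on std::async and std::future, not Thrust's API; the names sum_sync, sum_async, and example are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Blocking call: returns only after the sum has been computed.
inline long sum_sync(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Non-blocking call: returns a handle right away; the caller decides when
// to wait for the result. The vector must outlive the returned future.
inline std::future<long> sum_async(const std::vector<int>& v) {
    return std::async(std::launch::async, [&v] {
        return std::accumulate(v.begin(), v.end(), 0L);
    });
}

// Usage: start the work, do something else, then synchronize.
inline long example() {
    std::vector<int> data(1000, 2);
    std::future<long> pending = sum_async(data);
    long eager = sum_sync(data);  // may run while the async task is in flight
    long lazy = pending.get();    // blocks until the async task has finished
    assert(eager == lazy);
    return lazy;
}
```

In this sketch the caller of sum_async chooses the synchronization point by calling .get(), which mirrors the idea of waiting on a returned future rather than relying on the algorithm call itself to block.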

New Features

  • thrust::event and thrust::future<T>, uniquely-owned asynchronous handles consisting of a state (ready or not ready), content (some value; for thrust::future only), and an optional set of objects that should be destroyed only when the future's value is ready and has been consumed.

    • The design is loosely based on C++11's std::future.
    • They can be waited on with .wait, and the value of a future can be waited on and retrieved with .get or .extract.
    • Multiple thrust::events and thrust::futures can be combined with thrust::when_all.
    • thrust::futures can be converted to thrust::events.
    • Currently, these primitives are only implemented for the CUDA backend and are C++11 only.
  • New asynchronous algorithms that return thrust::event/thrust::futures, implemented as C++20 range style customization points:

    • thrust::async::reduce.
    • thrust::async::reduce_into, which takes a target location to store the reduction result into.
    • thrust::async::copy, including a two-policy overload that allows explicit cross-system copies to which execution policy properties can be attached.
    • thrust::async::transform.
    • thrust::async::for_each.
    • thrust::async::stable_sort.
    • thrust::async::sort.
    • By default the asynchronous algorithms use the new caching allocators. Deallocation of temporary storage is deferred until the destruction of the returned thrust::future. The content of thrust::futures is stored in either device or universal memory and transferred to the host only upon request to prevent unnecessary data migration.
    • Asynchronous algorithms are currently only implemented for the CUDA system and are C++11 only.
  • exec.after(f, g, ...), a new execution policy method that takes a set of thrust::events/thrust::futures and returns an execution policy whose operations depend on them.

  • New logic and mindset for the type requirements for cross-system sequence copies (currently only used by thrust::async::copy), based on:

    • thrust::is_contiguous_iterator and THRUST_PROCLAIM_CONTIGUOUS_ITERATOR for detecting/indicating that an iterator points to contiguous storage.
    • thrust::is_trivially_relocatable and THRUST_PROCLAIM_TRIVIALLY_RELOCATABLE for detecting/indicating that a type is memcpyable (based on principles from https://wg21.link/P1144).
    • The new approach reduces buffering, increases performance, and increases correctness.
    • The fast path is now enabled when copying fp16 and CUDA vector types with thrust::async::copy.
  • All Thrust synchronous algorithms for the CUDA backend now actually synchronize. Previously, any algorithm that did not allocate temporary storage (counterexample: thrust::sort) and did not have a computation-dependent result (counterexample: thrust::reduce) would actually be launched asynchronously. Additionally, synchronous algorithms that allocated temporary storage would become asynchronous if a custom allocator was supplied that did not synchronize on allocation/deallocation, unlike cudaMalloc/cudaFree. So, now thrust::for_each, thrust::transform, thrust::sort, etc. are truly synchronous. In some cases this may be a performance regression; if you need asynchrony, use the new asynchronous algorithms.

  • Thrust's allocator framework has been rewritten. It now uses a memory resource system, similar to C++17's std::pmr but supporting static polymorphism. Memory resources are objects that allocate untyped storage and allocators are cheap handles to memory resources in this new model. The new facilities live in <thrust/mr/*>.

    • thrust::mr::memory_resource<Pointer>, the memory resource base class, which takes a (possibly tagged) pointer to void type as a parameter.
    • thrust::mr::allocator<T, MemoryResource>, an allocator backed by a memory resource object.
    • thrust::mr::polymorphic_adaptor_resource<Pointer>, a type-erased memory resource adaptor.
    • thrust::mr::polymorphic_allocator<T>, a C++17-style polymorphic allocator backed by a type-erased memory resource object.
    • New tunable C++17-style caching memory resources, thrust::mr::(disjoint_)?(un)?synchronized_pool_resource, designed to cache both small object allocations and large repetitive temporary allocations. The disjoint variants use separate storage for management of the pool, which is necessary if the memory being allocated cannot be accessed on the host (e.g. device memory).
    • System-specific allocators were rewritten to use the new memory resource framework.
    • New thrust::device_memory_resource for allocating device memory.
    • New thrust::universal_memory_resource for allocating memory that can be accessed from both the host and device (e.g. cudaMallocManaged).
    • New thrust::universal_host_pinned_memory_resource for allocating memory that can be accessed from the host and the device but always resides in host memory (e.g. cudaMallocHost).
    • thrust::get_per_device_resource and thrust::per_device_allocator, which lazily create and retrieve a per-device singleton memory resource.
    • Rebinding mechanisms (rebind_traits and rebind_alloc) for thrust::allocator_traits.
    • thrust::device_make_unique, a factory function for creating a std::unique_ptr to a newly allocated object in device memory.
    • <thrust/detail/memory_algorithms.h>, a C++11 implementation of the C++17 uninitialized memory algorithms.
    • thrust::allocate_unique and friends, based on the proposed C++23 std::allocate_unique (https://wg21.link/P0211).
  • New type traits and metaprogramming facilities. Type traits are slowly being migrated out of thrust::detail:: and <thrust/detail/*>; their new home will be thrust:: and <thrust/type_traits/*>.

    • thrust::is_execution_policy.
    • thrust::is_operator_less_or_greater_function_object, which detects thrust::less, thrust::greater, std::less, and std::greater.
    • thrust::is_operator_plus_function_object, which detects thrust::plus and std::plus.
    • thrust::remove_cvref(_t)?, a C++11 implementation of C++20's std::remove_cvref(_t)?.
    • thrust::void_t, and various other new type traits.
    • thrust::integer_sequence and friends, a C++11 implementation of C++14's std::integer_sequence.
    • thrust::conjunction, thrust::disjunction, and thrust::negation, a C++11 implementation of C++17's logical metafunctions.
    • Some Thrust type traits (such as thrust::is_constructible) have been redefined in terms of C++11's type traits when they are available.
  • <thrust/detail/tuple_algorithms.h>, new std::tuple algorithms:

    • thrust::tuple_transform.
    • thrust::tuple_for_each.
    • thrust::tuple_subset.
  • Miscellaneous new std::-like facilities:

    • thrust::optional, a C++11 implementation of C++17's std::optional.
    • thrust::addressof, an implementation of C++11's std::addressof.
    • thrust::next and thrust::prev, implementations of C++11's std::next and std::prev.
    • thrust::square, a <functional> style unary function object that multiplies its argument by itself.
    • <thrust/limits.h> and thrust::numeric_limits, a customized version of <limits> and std::numeric_limits.
  • <thrust/detail/preprocessor.h>, new general purpose preprocessor facilities:

    • THRUST_PP_CAT[2-5], concatenates two to five tokens.
    • THRUST_PP_EXPAND(_ARGS)?, performs double expansion.
    • THRUST_PP_ARITY and THRUST_PP_DISPATCH, tools for macro overloading.
    • THRUST_PP_BOOL, boolean conversion.
    • THRUST_PP_INC and THRUST_PP_DEC, increment/decrement.
    • THRUST_PP_HEAD, a variadic macro that expands to the first argument.
    • THRUST_PP_TAIL, a variadic macro that expands to all its arguments after the first.
    • THRUST_PP_IIF, bitwise conditional.
    • THRUST_PP_COMMA_IF, and THRUST_PP_HAS_COMMA, facilities for adding and detecting comma tokens.
    • THRUST_PP_IS_VARIADIC_NULLARY, returns true if called with a nullary __VA_ARGS__.
    • THRUST_CURRENT_FUNCTION, expands to the name of the current function.
  • New C++11 compatibility macros:

    • THRUST_NODISCARD, expands to [[nodiscard]] when available and the best equivalent otherwise.
    • THRUST_CONSTEXPR, expands to constexpr when available and the best equivalent otherwise.
    • THRUST_OVERRIDE, expands to override when available and the best equivalent otherwise.
    • THRUST_DEFAULT, expands to = default; when available and the best equivalent otherwise.
    • THRUST_NOEXCEPT, expands to noexcept when available and the best equivalent otherwise.
    • THRUST_FINAL, expands to final when available and the best equivalent otherwise.
    • THRUST_INLINE_CONSTANT, expands to inline constexpr when available and the best equivalent otherwise.
  • <thrust/detail/type_deduction.h>, new C++11-only type deduction helpers:

    • THRUST_DECLTYPE_RETURNS*, expand to function definitions with suitable conditional noexcept qualifiers and trailing return types.
    • THRUST_FWD(x), expands to ::std::forward<decltype(x)>(x).
    • THRUST_MVCAP, expands to a lambda move capture.
    • THRUST_RETOF, expands to a decltype computing the return type of an invocable.
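
The type deduction helpers that close the list above can be illustrated with a small sketch. The macros SKETCH_FWD and SKETCH_RETURNS below are hypothetical stand-ins written for this example; they are not Thrust's definitions, only one plausible way to get the behavior the descriptions imply (forwarding via decltype, and a function definition whose noexcept qualifier and trailing return type are derived from a single expression).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical forwarding macro: forwards x with its deduced value category,
// which is what ::std::forward<decltype(x)>(x) does.
#define SKETCH_FWD(x) ::std::forward<decltype(x)>(x)

// Hypothetical "decltype returns" macro: the expression is written once and
// reused for the noexcept qualifier, the trailing return type, and the body.
#define SKETCH_RETURNS(...)                                   \
    noexcept(noexcept(__VA_ARGS__)) -> decltype(__VA_ARGS__)  \
    { return (__VA_ARGS__); }

// Adds two values; return type and noexcept-ness both follow from a + b.
template <typename A, typename B>
auto add(A&& a, B&& b) SKETCH_RETURNS(SKETCH_FWD(a) + SKETCH_FWD(b))

// Usage: the same template works for arithmetic and class types.
inline std::string example() {
    assert(add(1, 2) == 3);
    return add(std::string("thrust "), std::string("1.9.4"));
}
```

Writing the expression once avoids the usual C++11 problem of repeating it three times and letting the copies drift apart.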

New ...


Thrust 1.9.3 (CUDA Toolkit 10.0)

16 May 10:07

Thrust 1.9.3 unifies and integrates CUDA Thrust and GitHub Thrust.

Bug Fixes

  • #725, #850, #855, #859, #860: Unify the thrust::iter_swap interface and fix thrust::device_reference swapping.
  • NVBug 2004663: Add a data method to thrust::detail::temporary_array and refactor temporary memory allocation in the CUDA backend to be exception and leak safe.
  • #886, #894, #914: Various documentation typo fixes.
  • #724: Provide NVVMIR_LIBRARY_DIR environment variable to NVCC.
  • #878: Optimize thrust::min/max_element to only use thrust::detail::get_iterator_value for non-numeric types.
  • #899: Make thrust::cuda::experimental::pinned_allocator's comparison operators const.
  • NVBug 2092152: Remove all includes of <cuda.h>.
  • #911: Fix default comparator element type for thrust::merge_by_key.

Acknowledgments

  • Thanks to Andrew Corrigan for contributing fixes for swapping interfaces.
  • Thanks to Francisco Facioni for contributing optimizations for thrust::min/max_element.

Thrust 1.9.2 (CUDA Toolkit 9.2)

16 May 10:06

Thrust 1.9.2 brings a variety of performance enhancements, bug fixes and test improvements. CUB 1.7.5 was integrated, enhancing the performance of thrust::sort on small data types and thrust::reduce. Changes were applied to complex to optimize memory access. Thrust now compiles with compiler warnings enabled and treated as errors. Additionally, the unit test suite and framework were enhanced to increase coverage.

Breaking Changes

  • The fallback_allocator example was removed, as it was buggy and difficult to support.

New Features

  • <thrust/detail/alignment.h>, utilities for memory alignment:
    • thrust::aligned_reinterpret_cast.
    • thrust::aligned_storage_size, which computes the amount of storage needed for an object of a particular size and alignment.
    • thrust::alignment_of, a C++03 implementation of C++11's std::alignment_of.
    • thrust::aligned_storage, a C++03 implementation of C++11's std::aligned_storage.
    • thrust::max_align_t, a C++03 implementation of C++11's std::max_align_t.
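
As a rough illustration of what alignment utilities of this kind compute, here is a sketch in portable C++. The names round_up_to_alignment and alignment_of_sketch are hypothetical, and the code shows a common technique rather than Thrust's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the smallest multiple of `alignment` that is >= `size`.
// Assumes alignment > 0.
inline std::size_t round_up_to_alignment(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return ((size + alignment - 1) / alignment) * alignment;
}

// Hypothetical C++03-style alignment query: the padding the compiler inserts
// after a leading char equals the alignment of T, so the size difference
// between the padded struct and T itself recovers that alignment.
template <typename T>
struct alignment_of_sketch {
    struct padded { char c; T t; };
    static const std::size_t value = sizeof(padded) - sizeof(T);
};
```

For example, round_up_to_alignment(13, 8) yields 16, the number of bytes needed to hold a 13-byte object when storage is handed out in 8-byte units.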

Bug Fixes

  • NVBug 200385527, NVBug 200385119, NVBug 200385113, NVBug 200349350, NVBug 2058778: Various compiler warning issues.
  • NVBug 200355591: thrust::reduce performance issues.
  • NVBug 2053727: Fixed an ADL bug that caused user-supplied allocate to be overlooked but deallocate to be called with GCC <= 4.3.
  • NVBug 1777043: Fixed thrust::complex to work with thrust::sequence.

Thrust 1.9.1-2 (CUDA Toolkit 9.1)

18 May 18:12

Thrust 1.9.1 integrates version 1.7.4 of CUB and introduces a new CUDA backend for thrust::reduce based on CUB.

Bug Fixes

  • NVBug 1965743: Remove unnecessary static qualifiers.
  • NVBug 1940974: Fix regression causing a compilation error when using thrust::merge_by_key with thrust::constant_iterators.
  • NVBug 1904217: Allow callables that take non-const refs to be used with thrust::reduce and thrust::*_scan.

Thrust 1.9.0-5 (CUDA Toolkit 9.0)

18 May 18:12

Thrust 1.9.0 replaces the original CUDA backend (bulk) with a new one written using CUB, a high performance CUDA collectives library. This brings a substantial performance improvement to the CUDA backend across the board.

Breaking Changes

  • Any code depending on CUDA backend implementation details will likely be broken.

New Features

  • New CUDA backend based on CUB which delivers substantially higher performance.
  • thrust::transform_output_iterator, a fancy iterator that applies a function to the output before storing the result.

New Examples

  • transform_output_iterator demonstrates use of the new fancy iterator thrust::transform_output_iterator.
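
The idea behind an output iterator that applies a function before storing can be sketched in standard C++. The class transforming_output_iterator, its factory function, and negate_int are hypothetical names for this example; this is an analogue of the concept, not the thrust::transform_output_iterator implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical analogue: every value assigned through this iterator is passed
// through `f` before being written to the wrapped output iterator.
template <typename OutputIt, typename F>
class transforming_output_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;

    transforming_output_iterator(OutputIt out, F f) : out_(out), f_(f) {}

    // Assignment applies the function, stores the result, and advances.
    template <typename T>
    transforming_output_iterator& operator=(const T& value) {
        *out_ = f_(value);
        ++out_;
        return *this;
    }

    // Dereference and increment are no-ops, as in std::back_insert_iterator.
    transforming_output_iterator& operator*() { return *this; }
    transforming_output_iterator& operator++() { return *this; }
    transforming_output_iterator operator++(int) { return *this; }

private:
    OutputIt out_;
    F f_;
};

template <typename OutputIt, typename F>
transforming_output_iterator<OutputIt, F>
make_transforming_output_iterator(OutputIt out, F f) {
    return transforming_output_iterator<OutputIt, F>(out, f);
}

struct negate_int {
    int operator()(int x) const { return -x; }
};

// Usage: copy a range while negating each element on the way out.
inline std::vector<int> negated_copy(const std::vector<int>& in) {
    std::vector<int> out;
    std::copy(in.begin(), in.end(),
              make_transforming_output_iterator(std::back_inserter(out), negate_int()));
    assert(out.size() == in.size());
    return out;
}
```

Placing the transformation on the output side lets an algorithm that only knows how to write values, such as a copy, produce transformed results without a separate pass.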

Other Enhancements

  • When C++11 is enabled, functors do not have to inherit from thrust::(unary|binary)_function anymore to be used with thrust::transform_iterator.
  • Added C++11 only move constructors and move assignment operators for thrust::detail::vector_base-based classes, e.g. thrust::host_vector, thrust::device_vector, and friends.
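
Why C++11 removes the need for a unary_function-style base class can be shown with a short sketch: in C++03 a functor had to advertise its result type through nested typedefs, whereas in C++11 the type can be deduced from the call expression. The names halve_cxx03, halve_cxx11, call_result, and invoke_sketch are hypothetical, and this is not the deduction code Thrust uses.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// C++03 style: the functor advertises its types through nested typedefs,
// which is what inheriting from a unary_function base provided.
struct halve_cxx03 {
    typedef int argument_type;
    typedef double result_type;
    double operator()(int x) const { return x / 2.0; }
};

// C++11 style: a plain functor with no typedefs and no base class.
struct halve_cxx11 {
    double operator()(int x) const { return x / 2.0; }
};

// Hypothetical trait: deduce the type produced by calling an F with an Arg.
template <typename F, typename Arg>
struct call_result {
    typedef decltype(std::declval<F&>()(std::declval<Arg>())) type;
};

// Generic caller that needs to name the result type up front.
template <typename F, typename Arg>
typename call_result<F, Arg>::type invoke_sketch(F f, Arg arg) {
    return f(arg);
}

// Usage: both functor styles work with the deduced result type.
inline double example() {
    double r = invoke_sketch(halve_cxx11(), 3);
    assert(r == invoke_sketch(halve_cxx03(), 3));
    return r;
}
```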

Bug Fixes

  • sin(thrust::complex<double>) no longer has precision loss to float.

Acknowledgments

  • Thanks to Manuel Schiller for contributing a C++11 based enhancement regarding the deduction of functor return types, improving the performance of thrust::unique and implementing thrust::transform_output_iterator.
  • Thanks to Thibault Notargiacomo for the implementation of move semantics for the thrust::detail::vector_base-based classes.
  • Thanks to Duane Merrill for developing CUB and helping to integrate it into Thrust's backend.