Thrust 1.3.0

Released by @brycelelbach on 16 May 09:41

Thrust 1.3.0 adds support for CUDA Toolkit 3.2, along with many feature and performance enhancements. The sort and sort_by_key algorithms are up to 3x faster in certain situations, and stream compaction algorithms such as copy_if are up to 2x faster. CUDA errors are now converted to runtime exceptions using the system_error interface; combined with a debug mode, also new in 1.3, this allows runtime errors to be located with greater precision. Lastly, a few header files have been consolidated or renamed for clarity. See the deprecations section below for details.
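Stream compaction keeps only the elements of a sequence that satisfy a predicate, preserving the relative order of the survivors. The host-only sketch below shows that contract with std::copy_if, whose interface thrust::copy_if mirrors. The functor is_positive and the helper compact_positive are hypothetical names chosen for illustration. With Thrust, the same functor could be passed to thrust::copy_if over device iterators, writing into a preallocated output range.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Predicate functor: a plain struct with a const call operator, the style
// Thrust algorithms accept on both host and device.
struct is_positive {
    bool operator()(int x) const { return x > 0; }
};

// Stream compaction on the host: copy the elements that satisfy the
// predicate, in their original order, into a new vector.
std::vector<int> compact_positive(const std::vector<int>& input) {
    std::vector<int> output;
    output.reserve(input.size());
    std::copy_if(input.begin(), input.end(), std::back_inserter(output), is_positive());
    return output;
}
```

For example, compacting {3, -1, 0, 5, -2} keeps {3, 5}.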

Breaking Changes

  • Promotions
    • thrust::experimental::inclusive_segmented_scan has been renamed thrust::inclusive_scan_by_key and exposes a different interface
    • thrust::experimental::exclusive_segmented_scan has been renamed thrust::exclusive_scan_by_key and exposes a different interface
    • thrust::experimental::partition_copy has been renamed thrust::partition_copy and exposes a different interface
    • thrust::next::gather has been renamed thrust::gather
    • thrust::next::gather_if has been renamed thrust::gather_if
    • thrust::unique_copy_by_key has been renamed thrust::unique_by_key_copy
  • Deprecations
    • thrust::copy_when has been renamed thrust::deprecated::copy_when
    • thrust::absolute_value has been renamed thrust::deprecated::absolute_value
    • The header thrust/set_intersection.h is now deprecated; use thrust/set_operations.h instead
    • The header thrust/utility.h is now deprecated; use thrust/swap.h instead
    • The header thrust/swap_ranges.h is now deprecated; use thrust/swap.h instead
  • Eliminations
    • thrust::deprecated::gather
    • thrust::deprecated::gather_if
    • thrust/experimental/arch.h and the functions therein
    • thrust/sorting/merge_sort.h
    • thrust/sorting/radix_sort.h
  • NVCC 2.3 is no longer supported
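The promoted inclusive_scan_by_key and exclusive_scan_by_key perform a segmented scan: each maximal run of consecutive equal keys forms a segment, and the running sum restarts at the start of every segment. The sequential reference loops below illustrate those semantics, assuming equality on keys and addition on values; the *_ref names are hypothetical, and this is not Thrust's parallel implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive segmented scan: out[i] is the sum of values[j] for every j <= i
// that lies in the same run of equal keys as i.
std::vector<int> inclusive_scan_by_key_ref(const std::vector<int>& keys,
                                           const std::vector<int>& values) {
    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        bool new_segment = (i == 0) || (keys[i] != keys[i - 1]);
        out[i] = new_segment ? values[i] : out[i - 1] + values[i];
    }
    return out;
}

// Exclusive segmented scan: out[i] is `init` plus the sum of the values
// that precede i within its segment; the total resets at each key change.
std::vector<int> exclusive_scan_by_key_ref(const std::vector<int>& keys,
                                           const std::vector<int>& values,
                                           int init = 0) {
    std::vector<int> out(values.size());
    int running = init;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0 && keys[i] != keys[i - 1]) running = init;
        out[i] = running;
        running += values[i];
    }
    return out;
}
```

With keys {0, 0, 0, 1, 1, 2} and values {1, 2, 3, 4, 5, 6}, the inclusive scan yields {1, 3, 6, 4, 9, 6} and the exclusive scan (init 0) yields {0, 1, 3, 0, 4, 0}.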

New Features

  • Algorithms:

    • thrust::exclusive_scan_by_key
    • thrust::find
    • thrust::find_if
    • thrust::find_if_not
    • thrust::inclusive_scan_by_key
    • thrust::is_partitioned
    • thrust::is_sorted_until
    • thrust::mismatch
    • thrust::partition_point
    • thrust::reverse
    • thrust::reverse_copy
    • thrust::stable_partition_copy
  • Types:

    • thrust::system_error and related types.
    • thrust::experimental::cuda::ogl_interop_allocator.
    • thrust::bit_and, thrust::bit_or, and thrust::bit_xor.
  • Device Support:

    • GF104-based GPUs.
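Several of the new algorithms share their names with STL algorithms (find_if_not, is_partitioned, is_sorted_until, mismatch, partition_point, reverse). Assuming the Thrust versions follow the same contracts, as Thrust's STL-style algorithms generally do, the host-only sketch below shows what three of them compute using the std:: counterparts. The helper names and the is_even functor are hypothetical; each helper returns an index rather than an iterator so results are easy to compare.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct is_even {
    bool operator()(int x) const { return x % 2 == 0; }
};

// is_sorted_until: length of the longest sorted (non-descending) prefix.
std::size_t sorted_prefix_length(const std::vector<int>& v) {
    return static_cast<std::size_t>(std::is_sorted_until(v.begin(), v.end()) - v.begin());
}

// partition_point: in a range already partitioned by is_even (all evens
// first), the index of the first element that fails the predicate.
std::size_t first_odd_after_partition(const std::vector<int>& v) {
    return static_cast<std::size_t>(
        std::partition_point(v.begin(), v.end(), is_even()) - v.begin());
}

// mismatch: index of the first position where a and b differ.
// Assumes b is at least as long as a, as the three-iterator form requires.
std::size_t first_difference(const std::vector<int>& a, const std::vector<int>& b) {
    return static_cast<std::size_t>(
        std::mismatch(a.begin(), a.end(), b.begin()).first - a.begin());
}
```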

New Examples

  • opengl_interop.cu
  • repeated_range.cu
  • simple_moving_average.cu
  • sparse_vector.cu
  • strided_range.cu
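A simple moving average can be computed with a prefix sum followed by a difference of two prefix values per window, a structure that maps onto a parallel scan plus a transform. The host-only sketch below illustrates that approach; whether simple_moving_average.cu is structured exactly this way is an assumption, and the function name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Moving average with window width w:
//   avg[i] = (prefix[i + w] - prefix[i]) / w, for i in [0, n - w],
// where prefix[k] is the sum of the first k elements (prefix[0] == 0).
std::vector<double> simple_moving_average(const std::vector<double>& data, std::size_t w) {
    std::vector<double> result;
    if (w == 0 || data.size() < w) return result;  // no complete window
    std::vector<double> prefix(data.size() + 1, 0.0);
    std::partial_sum(data.begin(), data.end(), prefix.begin() + 1);
    result.reserve(data.size() - w + 1);
    for (std::size_t i = 0; i + w <= data.size(); ++i) {
        result.push_back((prefix[i + w] - prefix[i]) / static_cast<double>(w));
    }
    return result;
}
```

For example, a width-2 average of {1, 2, 3, 4, 5} gives {1.5, 2.5, 3.5, 4.5}.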

Other Enhancements

  • Performance of thrust::sort and thrust::sort_by_key is substantially improved for primitive key types
  • Performance of thrust::copy_if is substantially improved
  • Performance of thrust::reduce and related reductions is improved
  • THRUST_DEBUG mode added
  • Callers of Thrust functions may detect error conditions by catching thrust::system_error, which derives from std::runtime_error
  • The number of compiler warnings generated by Thrust has been substantially reduced
  • Comparison sort now works correctly for input sizes > 32M
  • Uses of min and max no longer collide with the min/max macros defined by <windows.h>
  • Compiling against the OpenMP backend no longer requires nvcc
  • Performance of device_vector initialized in .cpp files is substantially improved in common cases
  • Performance of thrust::sort_by_key on the host is substantially improved
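Because thrust::system_error derives from std::runtime_error, callers can catch either the specific type or the standard base class. The sketch below shows that pattern using std::system_error, which also derives from std::runtime_error, so it compiles without CUDA; with Thrust one would wrap the algorithm call and catch thrust::system_error instead. failing_operation is a hypothetical stand-in for a Thrust call that fails at runtime.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <system_error>
#include <type_traits>

// The standard type has the same base class that the release notes
// describe for thrust::system_error.
static_assert(std::is_base_of<std::runtime_error, std::system_error>::value,
              "std::system_error derives from std::runtime_error");

// Hypothetical stand-in for an operation that reports failure by throwing.
void failing_operation() {
    throw std::system_error(std::make_error_code(std::errc::invalid_argument),
                            "simulated failure");
}

// Returns the exception message if the operation threw, else an empty string.
std::string run_and_report() {
    try {
        failing_operation();
    } catch (const std::runtime_error& e) {
        // Catching the base class also catches system_error.
        return e.what();
    }
    return std::string();
}
```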

Bug Fixes

  • Debug device code now compiles correctly
  • thrust::uninitialized_copy and thrust::uninitialized_fill now dispatch constructors on the device rather than the host

Known Issues

  • Issue #212: set_intersection is known to fail for large input sizes
  • partition_point is known to fail for 64-bit types with nvcc 3.2

Acknowledgments

  • Thanks to Duane Merrill for contributing a fast CUDA radix sort implementation
  • Thanks to Erich Elsen for contributing an implementation of find_if
  • Thanks to Andrew Corrigan for contributing changes which allow the OpenMP backend to compile in the absence of nvcc
  • Thanks to Andrew Corrigan, Cliff Woolley, David Coeurjolly, Janick Martinez Esturo, John Bowers, Maxim Naumov, Michael Garland, and Ryuta Suzuki for bug reports
  • Thanks to Cliff Woolley for help with testing