Releases · NVIDIA/thrust

This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

16 May 09:48

brycelelbach

1.5.1

8242bd3

Thrust 1.5.1 (CUDA Toolkit 4.1)

Thrust 1.5.1 is a minor bug fix release.

Bug Fixes

Sorting data referenced by permutation_iterators on CUDA produces invalid results

16 May 09:44

brycelelbach

1.5.0

037b1b7

Thrust 1.5.0

Thrust 1.5.0 introduces new programmer productivity and performance enhancements. New functionality for creating anonymous "lambda" functions has been added. A faster host sort provides 2-10x faster performance for sorting arithmetic types on (single-threaded) CPUs. A new OpenMP sort provides a 2.5x-3.0x speedup over the host sort on a quad-core CPU. When sorting arithmetic types with the OpenMP backend, the combined performance improvement is 5.9x for 32-bit integers and ranges from 3.0x (64-bit types) to 14.2x (8-bit types). A new CUDA reduce_by_key implementation provides 2-3x faster performance.

Breaking Changes

device_ptr<void> no longer unsafely converts to device_ptr<T> without an explicit cast. Use the expression device_pointer_cast(static_cast<int*>(void_ptr.get())) to convert, for example, device_ptr<void> to device_ptr<int>.

New Features

Algorithms:
- Stencil-less thrust::transform_if.
Lambda placeholders
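
As a rough illustration of what "stencil-less" means here: the predicate is evaluated on the input element itself rather than on a separate stencil sequence. The sketch below is a sequential stand-in written with the C++ standard library only, not Thrust code; the names transform_if_model and negate_negatives are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Sequential model (not Thrust code): write op(x) to the output only where
// pred(x) holds for the input element itself; other outputs are untouched.
template <typename In, typename Out, typename Op, typename Pred>
Out transform_if_model(In first, In last, Out result, Op op, Pred pred) {
    for (; first != last; ++first, ++result) {
        if (pred(*first)) {
            *result = op(*first);
        }
    }
    return result;
}

// Negate only the negative entries, in place.
std::vector<int> negate_negatives(std::vector<int> v) {
    transform_if_model(v.begin(), v.end(), v.begin(),
                       [](int x) { return -x; },
                       [](int x) { return x < 0; });
    return v;
}

void example_transform_if() {
    assert((negate_negatives({-2, 3, -5, 8}) == std::vector<int>{2, 3, 5, 8}));
}
```

Calling example_transform_if() from main exercises the model; elements that fail the predicate keep their previous value.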

New Examples

lambda

Other Enhancements

Host sort is 2-10x faster for arithmetic types
OMP sort provides speedup over host sort
reduce_by_key is 2-3x faster
reduce_by_key no longer requires O(N) temporary storage
CUDA scan algorithms are 10-40% faster
host_vector and device_vector are now documented
out-of-memory exceptions now provide detailed information from CUDART
improved histogram example
device_reference now has a specialized swap
reduce_by_key and scan algorithms are compatible with discard_iterator

Bug Fixes

#44 allow host_vector to compile when value_type uses __align__
#198 allow adjacent_difference to permit safe in-situ operation
#303 make thrust thread-safe
#313 avoid race conditions in device_vector::insert
#314 avoid unintended ADL invocation when dispatching copy
#365 fix merge and set operation failures

Known Issues

None

Acknowledgments

Thanks to Manjunath Kudlur for contributing his Carbon library, from which the lambda functionality is derived.
Thanks to Jean-Francois Bastien for suggesting a fix for #303.

16 May 09:42

brycelelbach

1.4.0

cdb0604

Thrust 1.4.0 (CUDA Toolkit 4.0)

Thrust 1.4.0 is the first release of Thrust to be included in the CUDA Toolkit. Additionally, it brings many feature and performance improvements. New set theoretic algorithms operating on sorted sequences have been added. Additionally, a new fancy iterator allows discarding redundant or otherwise unnecessary output from algorithms, conserving memory storage and bandwidth.
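
To make the idea of a discarding iterator concrete, here is a minimal sketch using only the C++ standard library. DiscardOutput is a hypothetical stand-in for thrust::discard_iterator, and run_lengths is a hypothetical sequential helper; neither is Thrust's implementation. The point is that an algorithm with two outputs can be asked for only one of them, so no storage is spent on the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a discarding output iterator: it accepts any
// assignment and stores nothing.
struct DiscardOutput {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    template <typename T>
    DiscardOutput& operator=(const T&) { return *this; }  // drop the value
    DiscardOutput& operator*() { return *this; }
    DiscardOutput& operator++() { return *this; }
    DiscardOutput operator++(int) { return *this; }
};

// Sequential run-length encoding: one key and one count per run.
template <typename In, typename KeyOut, typename CountOut>
void run_lengths(In first, In last, KeyOut keys_out, CountOut counts_out) {
    while (first != last) {
        const auto key = *first;
        In run_end = std::find_if(first, last,
                                  [&key](const decltype(key)& v) { return !(v == key); });
        *keys_out++ = key;
        *counts_out++ = static_cast<int>(std::distance(first, run_end));
        first = run_end;
    }
}

// Only the counts are wanted, so the keys are sent to DiscardOutput.
std::vector<int> counts_only(const std::vector<char>& data) {
    std::vector<int> counts;
    run_lengths(data.begin(), data.end(), DiscardOutput{}, std::back_inserter(counts));
    return counts;
}

void example_discard() {
    assert((counts_only({'a', 'a', 'b', 'c', 'c', 'c'}) == std::vector<int>{2, 1, 3}));
}
```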

Breaking Changes

Eliminations
- thrust/is_sorted.h
- thrust/utility.h
- thrust/set_intersection.h
- thrust/experimental/cuda/ogl_interop_allocator.h and the functionality therein
- thrust::deprecated::copy_when
- thrust::deprecated::absolute_value
- thrust::gather and thrust::scatter from host to device and vice versa are no longer supported.
- Operations which modify the elements of a thrust::device_vector are no longer available from source code compiled without nvcc when the device backend is CUDA. Instead, use the idiom from the cpp_interop example.

New Features

Algorithms:
- thrust::copy_n
- thrust::merge
- thrust::set_difference
- thrust::set_symmetric_difference
- thrust::set_union
Types:
- thrust::discard_iterator
Device Support:
- Compute Capability 2.1 GPUs.
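
The new set algorithms operate on sorted sequences, in the same spirit as their C++ standard library namesakes. The sketch below uses the std:: versions as a stand-in to show the expected results on small inputs; SetResults and sorted_set_ops are hypothetical names, and nothing here is Thrust code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct SetResults {
    std::vector<int> merged, set_union, difference, symmetric_difference;
};

// Both inputs must already be sorted in ascending order.
SetResults sorted_set_ops(const std::vector<int>& a, const std::vector<int>& b) {
    SetResults r;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(r.merged));
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(r.set_union));
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(r.difference));
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(r.symmetric_difference));
    return r;
}

void example_set_ops() {
    const SetResults r = sorted_set_ops({1, 3, 5, 7}, {3, 4, 7, 9});
    assert((r.merged == std::vector<int>{1, 3, 3, 4, 5, 7, 7, 9}));
    assert((r.set_union == std::vector<int>{1, 3, 4, 5, 7, 9}));
    assert((r.difference == std::vector<int>{1, 5}));
    assert((r.symmetric_difference == std::vector<int>{1, 4, 5, 9}));
}
```

Note that merge keeps duplicates from both inputs, while set_union keeps a shared element once.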

New Examples

run_length_decoding

Other Enhancements

Compilation warnings are substantially reduced in various contexts.
The compilation time of thrust::sort, thrust::stable_sort, thrust::sort_by_key, and thrust::stable_sort_by_key is substantially reduced.
A fast sort implementation is used when sorting primitive types with thrust::greater.
The performance of thrust::set_intersection is improved.
The performance of thrust::fill is improved on SM 1.x devices.
A code example is now provided in each algorithm's documentation.
thrust::reverse now operates in-place

Bug Fixes

#212: thrust::set_intersection works correctly for large input sizes.
#275: thrust::counting_iterator and thrust::constant_iterator work correctly with OpenMP as the backend when compiling with optimization.
#256: min and max correctly return their first argument as a tie-breaker
#248: NDEBUG is interpreted correctly.
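
The tie-breaker rule in #256 can be illustrated with the C++ standard library, whose std::min and std::max follow the same convention of returning the first argument when the two compare equivalent. This is a stand-in sketch, not Thrust code; Item is a hypothetical type that compares by key only, so the id field reveals which argument was returned.

```cpp
#include <algorithm>
#include <cassert>

// Two items with the same key are equivalent under operator<,
// but the id tells them apart.
struct Item {
    int key;
    int id;
};

bool operator<(const Item& a, const Item& b) { return a.key < b.key; }

void example_tie_breaker() {
    const Item first{5, 1};
    const Item second{5, 2};
    // On a tie, both functions return the first argument.
    assert(std::min(first, second).id == 1);
    assert(std::max(first, second).id == 1);
}
```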

Known Issues

NVCC may generate code containing warnings when compiling some Thrust algorithms.
When compiling with -arch=sm_1x, some Thrust algorithms may cause NVCC to issue benign pointer advisories.
When compiling with -arch=sm_1x and -G, some Thrust algorithms may fail to execute correctly.
thrust::inclusive_scan, thrust::exclusive_scan, thrust::inclusive_scan_by_key, and thrust::exclusive_scan_by_key are currently incompatible with thrust::discard_iterator.

Acknowledgments

Thanks to David Tarjan for improving the performance of set_intersection.
Thanks to Duane Merrill for continued help with sort.
Thanks to Nathan Whitehead for help with CUDA Toolkit integration.

16 May 09:41

brycelelbach

1.3.0

8b7aac2

Thrust 1.3.0

Thrust 1.3.0 provides support for CUDA Toolkit 3.2 in addition to many feature and performance enhancements. Performance of the sort and sort_by_key algorithms is improved by as much as 3x in certain situations. The performance of stream compaction algorithms, such as copy_if, is improved by as much as 2x. CUDA errors are now converted to runtime exceptions using the system_error interface. Combined with a debug mode, also new in 1.3, runtime errors can be located with greater precision. Lastly, a few header files have been consolidated or renamed for clarity. See the deprecations section below for additional details.

Breaking Changes

Promotions
- thrust::experimental::inclusive_segmented_scan has been renamed thrust::inclusive_scan_by_key and exposes a different interface
- thrust::experimental::exclusive_segmented_scan has been renamed thrust::exclusive_scan_by_key and exposes a different interface
- thrust::experimental::partition_copy has been renamed thrust::partition_copy and exposes a different interface
- thrust::next::gather has been renamed thrust::gather
- thrust::next::gather_if has been renamed thrust::gather_if
- thrust::unique_copy_by_key has been renamed thrust::unique_by_key_copy
Deprecations
- thrust::copy_when has been renamed thrust::deprecated::copy_when
- thrust::absolute_value has been renamed thrust::deprecated::absolute_value
- The header thrust/set_intersection.h is now deprecated; use thrust/set_operations.h instead
- The header thrust/utility.h is now deprecated; use thrust/swap.h instead
- The header thrust/swap_ranges.h is now deprecated; use thrust/swap.h instead
Eliminations
- thrust::deprecated::gather
- thrust::deprecated::gather_if
- thrust/experimental/arch.h and the functions therein
- thrust/sorting/merge_sort.h
- thrust/sorting/radix_sort.h
NVCC 2.3 is no longer supported

New Features

Algorithms:
- thrust::exclusive_scan_by_key
- thrust::find
- thrust::find_if
- thrust::find_if_not
- thrust::inclusive_scan_by_key
- thrust::is_partitioned
- thrust::is_sorted_until
- thrust::mismatch
- thrust::partition_point
- thrust::reverse
- thrust::reverse_copy
- thrust::stable_partition_copy
Types:
- thrust::system_error and related types.
- thrust::experimental::cuda::ogl_interop_allocator.
- thrust::bit_and, thrust::bit_or, and thrust::bit_xor.
Device Support:
- GF104-based GPUs.
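
For readers new to the by-key scans listed above, the following is a sequential model of an inclusive scan by key, assuming addition as the operator and equality as the key comparison: the running sum restarts whenever the key differs from the previous one. It is written with the C++ standard library only; inclusive_scan_by_key_model is a hypothetical name and does not reflect how Thrust implements the algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model (not Thrust code). Assumes keys and values have the
// same length. Consecutive equal keys form one segment.
std::vector<int> inclusive_scan_by_key_model(const std::vector<int>& keys,
                                             const std::vector<int>& values) {
    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        const bool continues_segment = i > 0 && keys[i] == keys[i - 1];
        out[i] = continues_segment ? out[i - 1] + values[i] : values[i];
    }
    return out;
}

void example_scan_by_key() {
    const std::vector<int> keys = {0, 0, 0, 1, 1, 2};
    const std::vector<int> values = {1, 2, 3, 4, 5, 6};
    assert((inclusive_scan_by_key_model(keys, values) ==
            std::vector<int>{1, 3, 6, 4, 9, 6}));
}
```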

New Examples

opengl_interop.cu
repeated_range.cu
simple_moving_average.cu
sparse_vector.cu
strided_range.cu

Other Enhancements

Performance of thrust::sort and thrust::sort_by_key is substantially improved for primitive key types
Performance of thrust::copy_if is substantially improved
Performance of thrust::reduce and related reductions is improved
THRUST_DEBUG mode added
Callers of Thrust functions may detect error conditions by catching thrust::system_error, which derives from std::runtime_error
The number of compiler warnings generated by Thrust has been substantially reduced
Comparison sort now works correctly for input sizes > 32M
min & max usage no longer collides with <windows.h> definitions
Compiling against the OpenMP backend no longer requires nvcc
Performance of device_vector initialized in .cpp files is substantially improved in common cases
Performance of thrust::sort_by_key on the host is substantially improved

Bug Fixes

Debug device code now compiles correctly
thrust::uninitialized_copy and thrust::uninitialized_fill now dispatch constructors on the device rather than the host

Known Issues

#212 set_intersection is known to fail for large input sizes
partition_point is known to fail for 64-bit types with nvcc 3.2

Acknowledgments

Thanks to Duane Merrill for contributing a fast CUDA radix sort implementation
Thanks to Erich Elsen for contributing an implementation of find_if
Thanks to Andrew Corrigan for contributing changes which allow the OpenMP backend to compile in the absence of nvcc
Thanks to Andrew Corrigan, Cliff Woolley, David Coeurjolly, Janick Martinez Esturo, John Bowers, Maxim Naumov, Michael Garland, and Ryuta Suzuki for bug reports
Thanks to Cliff Woolley for help with testing

16 May 09:32

brycelelbach

1.2.1

42fc4c9

Thrust 1.2.1

Thrust 1.2.1 is a small bug fix release that is compatible with the CUDA Toolkit 3.1 release.

Known Issues

thrust::inclusive_scan and thrust::exclusive_scan may fail with very large types.
MSVC may fail to compile code using both sort and binary search algorithms.
thrust::uninitialized_fill and thrust::uninitialized_copy dispatch constructors on the host rather than the device.
#109: Some algorithms may exhibit poor performance with the OpenMP backend with large numbers (>= 6) of CPU threads.
thrust::default_random_engine::discard is not accelerated with NVCC 2.3
NVCC 3.1 may fail to compile code using types derived from thrust::subtract_with_carry_engine, such as thrust::ranlux24 and thrust::ranlux48.

16 May 09:17

brycelelbach

1.2.0

3c71298

Thrust 1.2.0

Thrust 1.2.0 introduces support for compilation to multicore CPUs and the Ocelot virtual machine, and several new facilities for pseudo-random number generation. New algorithms such as set intersection and segmented reduction have also been added. Lastly, improvements to the robustness of the CUDA backend ensure correctness across a broad set of (uncommon) use cases.

Breaking Changes

thrust::gather's interface was incorrect and has been removed. The old interface is deprecated but will be preserved for Thrust version 1.2 at thrust::deprecated::gather & thrust::deprecated::gather_if. The new interface is provided at thrust::next::gather & thrust::next::gather_if. The new interface will be promoted to thrust:: in Thrust version 1.3. For more details, please refer to this thread.
The thrust::sorting namespace has been deprecated in favor of the top-level sorting functions, such as thrust::sort and thrust::sort_by_key.
Removed support for thrust::equal between host & device sequences.
Removed support for thrust::scatter between host & device sequences.

New Features

Algorithms:
- thrust::reduce_by_key
- thrust::set_intersection
- thrust::unique_copy
- thrust::unique_by_key
- thrust::unique_copy_by_key
Types
Random Number Generation:
- thrust::discard_block_engine
- thrust::default_random_engine
- thrust::linear_congruential_engine
- thrust::linear_feedback_shift_engine
- thrust::subtract_with_carry_engine
- thrust::xor_combine_engine
- thrust::minstd_rand
- thrust::minstd_rand0
- thrust::ranlux24
- thrust::ranlux48
- thrust::ranlux24_base
- thrust::ranlux48_base
- thrust::taus88
- thrust::uniform_int_distribution
- thrust::uniform_real_distribution
- thrust::normal_distribution (experimental)
Function Objects:
- thrust::project1st
- thrust::project2nd
thrust::tie
Fancy Iterators:
- thrust::permutation_iterator
- thrust::reverse_iterator
Vector Functions:
- operator!=
- rbegin
- crbegin
- rend
- crend
- data
- shrink_to_fit
Device Support:
- Multicore CPUs via OpenMP.
- Fermi-class GPUs.
- Ocelot virtual machines.
Support for NVCC 3.0.
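
As an illustration of the segmented reduction added in this release, here is a sequential model of reduce_by_key with addition as the reduction. Each run of consecutive equal keys produces one output key and one reduced value; equal keys that are not adjacent form separate runs. The code uses the C++ standard library only, and Reduced and reduce_by_key_model are hypothetical names rather than Thrust's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Reduced {
    std::vector<int> keys;
    std::vector<int> sums;
};

// Sequential model (not Thrust code). Assumes keys and values have the
// same length.
Reduced reduce_by_key_model(const std::vector<int>& keys,
                            const std::vector<int>& values) {
    Reduced r;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0 && keys[i] == keys[i - 1]) {
            r.sums.back() += values[i];  // same run: accumulate
        } else {
            r.keys.push_back(keys[i]);   // new run: start a new output slot
            r.sums.push_back(values[i]);
        }
    }
    return r;
}

void example_reduce_by_key() {
    const Reduced r = reduce_by_key_model({1, 1, 2, 2, 2, 1}, {10, 20, 1, 2, 3, 7});
    assert((r.keys == std::vector<int>{1, 2, 1}));
    assert((r.sums == std::vector<int>{30, 6, 7}));
}
```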

New Examples

cpp_integration
histogram
mode
monte_carlo
monte_carlo_disjoint_sequences
padded_grid_reduction
permutation_iterator
row_sum
run_length_encoding
segmented_scan
stream_compaction
summary_statistics
transform_iterator
word_count

Other Enhancements

Integer sorting performance is improved when max is large but (max - min) is small and when min is negative
Performance of thrust::inclusive_scan and thrust::exclusive_scan is improved by 20-25% for primitive types.

Bug Fixes

#8 cause a compiler error if the required compiler is not found rather than a mysterious error at link time
#42 device_ptr & device_reference are classes rather than structs, eliminating warnings on certain platforms
#46 gather & scatter handle any space iterators correctly
#51 thrust::experimental::arch functions gracefully handle unrecognized GPUs
#52 avoid collisions with common user macros such as BLOCK_SIZE
#62 provide better documentation for device_reference
#68 allow built-in CUDA vector types to work with device_vector in pure C++ mode
#102 eliminated a race condition in device_vector::erase
various compilation warnings eliminated

Known Issues

inclusive_scan & exclusive_scan may fail with very large types
the Microsoft compiler may fail to compile code using both sort and binary search algorithms
uninitialized_fill & uninitialized_copy dispatch constructors on the host rather than the device
#109 some algorithms may exhibit poor performance with the OpenMP backend with large numbers (>= 6) of CPU threads
default_random_engine::discard is not accelerated with nvcc 2.3

Acknowledgments

Thanks to Gregory Diamos for contributing a CUDA implementation of set_intersection
Thanks to Ryuta Suzuki & Gregory Diamos for rigorously testing Thrust's unit tests and examples against Ocelot
Thanks to Tom Bradley for contributing an implementation of normal_distribution
Thanks to Joseph Rhoads for contributing the example summary_statistics

16 May 09:15

brycelelbach

1.1.1

655664e

Thrust 1.1.1

Thrust 1.1.1 is a small bug fix release that is compatible with the CUDA Toolkit 2.3a release and Mac OS X Snow Leopard.

16 May 09:14

brycelelbach

1.1.0

fde5b70

Thrust 1.1.0

Thrust 1.1.0 introduces fancy iterators, binary search functions, and several specialized reduction functions. Experimental support for segmented scans has also been added.

Breaking Changes

thrust::counting_iterator has been moved into the thrust namespace (previously thrust::experimental).

New Features

Algorithms:
- thrust::copy_if
- thrust::lower_bound
- thrust::upper_bound
- vectorized thrust::lower_bound
- vectorized thrust::upper_bound
- thrust::equal_range
- thrust::binary_search
- vectorized thrust::binary_search
- thrust::all_of
- thrust::any_of
- thrust::none_of
- thrust::minmax_element
- thrust::advance
- thrust::inclusive_segmented_scan (experimental)
- thrust::exclusive_segmented_scan (experimental)
Types:
- thrust::pair
- thrust::tuple
- thrust::device_malloc_allocator
Fancy Iterators:
- thrust::constant_iterator
- thrust::counting_iterator
- thrust::transform_iterator
- thrust::zip_iterator
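
The "vectorized" searches listed above take a whole range of search values and produce one result per value. A sequential model of the vectorized lower_bound, which reports the position of each lower bound within the sorted input, might look like the sketch below. It uses std::lower_bound in a loop as a stand-in; lower_bound_each is a hypothetical name, not part of Thrust.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model (not Thrust code): for each query, the index of the
// first element in `sorted` that is not less than the query.
std::vector<std::ptrdiff_t> lower_bound_each(const std::vector<int>& sorted,
                                             const std::vector<int>& queries) {
    std::vector<std::ptrdiff_t> positions;
    positions.reserve(queries.size());
    for (int q : queries) {
        positions.push_back(
            std::lower_bound(sorted.begin(), sorted.end(), q) - sorted.begin());
    }
    return positions;
}

void example_vectorized_search() {
    const std::vector<int> sorted = {0, 2, 5, 7, 8};
    const std::vector<int> queries = {0, 1, 2, 3, 8, 9};
    assert((lower_bound_each(sorted, queries) ==
            std::vector<std::ptrdiff_t>{0, 1, 1, 2, 4, 5}));
}
```

A query larger than every element maps to the size of the sorted range, as the last entry shows.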

New Examples

Computing the maximum absolute difference between vectors.
Computing the bounding box of a two-dimensional point set.
Sorting multiple arrays together (lexicographical sorting).
Constructing a summed area table.
Using thrust::zip_iterator to mimic an array of structs.
Using thrust::constant_iterator to increment array values.

Other Enhancements

Added pinned memory allocator (experimental).
Added more methods to host_vector & device_vector (issue #4).
Added variant of remove_if with a stencil argument (issue #29).
Scan and reduce use cudaFuncGetAttributes to determine grid size.
Exceptions are reported when temporary device arrays cannot be allocated.

Bug Fixes

#5: Make vector work for larger data types
#9: stable_partition_copy doesn't respect OutputIterator concept semantics
#10: scans should return OutputIterator
#16: make algorithms work for larger data types
#27: Dispatch radix_sort even when comp=less is explicitly provided

16 May 09:11

brycelelbach

1.0.0

3157456

Thrust 1.0.0

First production release of Thrust.

Breaking Changes

Rename top level namespace komrade to thrust.
Move thrust::partition_copy & thrust::stable_partition_copy into thrust::experimental namespace until we can easily provide the standard interface.
Rename thrust::range to thrust::sequence to avoid collision with Boost.Range.
Rename thrust::copy_if to thrust::copy_when due to semantic differences with C++0x copy_if.

New Features

Add C++0x style cbegin & cend methods to thrust::host_vector and thrust::device_vector.
Add thrust::transform_if function.
Add stencil versions of thrust::replace_if & thrust::replace_copy_if.
Allow counting_iterator to work with thrust::for_each.
Allow types with constructors in comparison-based thrust::sort and in thrust::reduce.

Other Enhancements

thrust::merge_sort and thrust::stable_merge_sort are now 2x to 5x faster when executed on the parallel device.

Bug Fixes

Komrade 6: Work around an issue where an incremented iterator causes NVCC to crash.
Komrade 7: Fix an issue where const_iterators could not be passed to thrust::transform.
