Documentation for rocThrust is available at https://rocm.docs.amd.com/projects/rocThrust/en/latest/.
- Merged changes from upstream CCCL/thrust 2.3.2
  - Only the NVIDIA backend uses `tuple` and `pair` types from libcu++; other backends continue to use the original Thrust implementations and hence do not require libcu++ (CCCL) as a dependency.
- Added the `thrust::hip::par_det` execution policy to enable bitwise reproducibility on algorithms that are not bitwise reproducible by default.
- Fixed tests failing when compiling with `-D_GLIBCXX_ASSERTIONS=ON`.
- Enabled the upstream (thrust) test suite for execution by default. It can still be disabled by CMake option `-DENABLE_UPSTREAM_TESTS=OFF`.
- Fixed the HIP backend not passing `TestCopyIfNonTrivial` from the upstream (thrust) test suite.
- Merged changes from upstream CCCL/thrust 2.2.0
  - Updated the contents of `system/hip` and `test` with the upstream changes to `system/cuda` and `testing`
- Updated internal calls to `rocprim::detail::invoke_result` to use the public API `rocprim::invoke_result`.
- Used `rocprim::device_adjacent_difference` for the `adjacent_difference` API call.
- Updated internal use of custom iterator in `thrust::detail::unique_by_key` to use rocPRIM's `rocprim::unique_by_key`.
- Updated `adjacent_difference` to use `rocprim::adjacent_difference` when the iterators are comparable and not equal; otherwise, it uses `rocprim::adjacent_difference_inplace`.
- Fixed incorrect implementation of `thrust::optional<T&>::emplace()`.
- `thrust::reduce_by_key` outputs are not bit-wise reproducible, as run-to-run results for pseudo-associative reduction operators (e.g. floating-point arithmetic operators) are not deterministic on the same device.
- Note that currently, rocThrust memory allocation is performed in such a way that most algorithmic API functions cannot be called from within hipGraphs.
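The `thrust::reduce_by_key` note above concerns reduction operators that are only approximately associative. As a rough, host-only illustration (plain C++, not rocThrust code; `sum_left` and `sum_right` are hypothetical helper names), the sketch below adds the same three `float` values in two groupings. A parallel reduction may combine partial results in a different order from run to run, which is one way such differences can surface.

```cpp
// Floating-point addition is not associative: the grouping of the
// operands decides where rounding happens.

// Computes (a + b) + c.
float sum_left(float a, float b, float c)
{
    // volatile forces the intermediate to be rounded to float precision
    volatile float partial = a + b;
    return partial + c;
}

// Computes a + (b + c).
float sum_right(float a, float b, float c)
{
    // volatile forces the intermediate to be rounded to float precision
    volatile float partial = b + c;
    return a + partial;
}
```

With `a = 1e8f`, `b = -1e8f`, and `c = 1.0f`, `sum_left` returns `1.0f`, because `a + b` is exactly zero. `sum_right` returns `0.0f`, because `b + c` rounds back to `-1e8f` before `a` is added.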
- Updated to match upstream Thrust 2.0.1
- NV_IF_TARGET macro from libcu++ for NVIDIA backend and HIP implementation for HIP backend.
- The CMake build system now accepts `GPU_TARGETS` in addition to `AMDGPU_TARGETS` for setting the targeted GPU architectures. `GPU_TARGETS=all` will compile for all supported architectures. `AMDGPU_TARGETS` is only provided for backwards compatibility; `GPU_TARGETS` should be preferred.
- Removed cub symlink from the root of the repository.
- Removed support for deprecated macros (THRUST_DEVICE_BACKEND and THRUST_HOST_BACKEND).
- Fixed a segmentation fault when binary search / upper bound / lower bound / equal range was invoked with `hip_rocprim::execute_on_stream_base` policy.
- The `THRUST_HAS_CUDART` macro, which is no longer used in Thrust (it's provided only for legacy support), is replaced with `NV_IF_TARGET` and `THRUST_RDC_ENABLED` in the NVIDIA backend. The HIP backend doesn't have a `THRUST_RDC_ENABLED` macro, so some branches in Thrust code may be unreachable in the HIP backend.
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
- Fixed issue where `transform_iterator` would not compile with `__device__`-only operators.
- Updated `docs` directory structure to match the standard of rocm-docs-core.
- Removed references to and workarounds for deprecated hcc
- Updates to match upstream Thrust 1.17.2
- `partition_copy` now uses `rocprim::partition_two_way` for increased performance
- `set_difference` and `set_intersection` no longer hang if the number of items is above `UINT_MAX` (the unit tests for `set_difference` and `set_intersection` previously failed `TestSetDifferenceWithBigIndexes`)
- Updates to match upstream Thrust 1.16.0
- rocThrust functionality dependent on device malloc is functional (ROCm 5.2 re-enabled device malloc); you can now use device-launched `thrust::sort` and `thrust::sort_by_key`
- Packages for tests and benchmark executables on all supported operating systems using CPack
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.15.0
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.13.0
- Updates to match upstream Thrust 1.14.0
- Added async scan
- Scan algorithms: `inclusive_scan` now uses the `input-type` as `accumulator-type`; `exclusive_scan` uses `initial-value-type`
  - This changes the behavior of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)
- Initial HIP on Windows support
- Packaging has changed to a development package (called `rocthrust-dev` for `.deb` packages and `rocthrust-devel` for `.rpm` packages). Because rocThrust is a header-only library, there is no runtime package. To aid in the transition, the development package sets the `provides` field to `rocthrust`, so that existing packages that are dependent on rocThrust can continue to work. This `provides` feature is introduced as a deprecated feature because it will be removed in a future ROCm release.
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Mixed-type exclusive scan algorithm is not using the initial value type for the results type
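The scan accumulator-type change listed for this release can be hard to picture from the description alone. The following is a minimal sequential sketch in plain C++, not the rocThrust implementation: `inclusive_scan_with_acc` is a hypothetical helper that makes the accumulator type an explicit template parameter, so the effect of accumulating in the input type (`float`) versus the output type (`double`) can be compared directly.

```cpp
#include <cstddef>
#include <vector>

// Sequential stand-in for an inclusive scan. The running sum is held in
// Acc, and every partial result is converted to Out when it is stored.
// Hypothetical helper for illustration only.
template <typename Acc, typename Out, typename In>
std::vector<Out> inclusive_scan_with_acc(const std::vector<In>& input)
{
    std::vector<Out> output;
    output.reserve(input.size());
    if (input.empty())
    {
        return output;
    }

    Acc running = static_cast<Acc>(input[0]);
    output.push_back(static_cast<Out>(running));
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        // The sum is rounded to Acc before it is widened to Out.
        running = static_cast<Acc>(running + input[i]);
        output.push_back(static_cast<Out>(running));
    }
    return output;
}
```

For the `float` input `{16777216.0f, 1.0f, 1.0f}`, calling `inclusive_scan_with_acc<float, double>` yields `16777216` in every position, because `16777217` is not representable as a `float`. Calling `inclusive_scan_with_acc<double, double>` on the same input yields `16777216`, `16777217`, `16777218`.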
- gfx1030 support
- AddressSanitizer build option
- async_transform unit test failure
- Updates to match upstream Thrust 1.11
- gfx90a support
- gfx803 support re-enabled
- Updates to match upstream Thrust 1.10
- rocThrust now requires CMake version 3.10.2 or greater
- Size zero inputs are now properly handled with newer ROCm builds, which no longer allow zero-size kernel grid/block dimensions
- Warning of unused results
- There are no changes with this release
- Updated to upstream Thrust 1.10.0
- Implemented runtime error for unsupported algorithms and disabled respective tests
- Updated CMake to use downloaded rocPRIM
- `copy_if` on device test case
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.8
- New test cases for device-side algorithms
- Bug for binary search
- Implemented workarounds for `hipStreamDefault` hang
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.4
- Package dependency has changed to rocPRIM only
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Improved tests with fixed and random seeds for test data
- CMake searches for rocThrust locally first; if it isn't found, CMake downloads it from GitHub
- HCC build has been deprecated