Release notes for 2022.7.0 (#1862)
---------
Co-authored-by: Dmitriy Sobolev <[email protected]>
Co-authored-by: Adam Fidel <[email protected]>
Co-authored-by: Matthew Michel <[email protected]>
Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Ruslan Arutyunyan <[email protected]>
timmiesmith committed Nov 15, 2024
1 parent d52a56d commit 02e472a
Showing 1 changed file with 101 additions and 0 deletions.
101 changes: 101 additions & 0 deletions documentation/release_notes.rst
@@ -8,6 +8,107 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

New in 2022.7.0
===============

New Features
------------
- Improved performance of the ``adjacent_find``, ``all_of``, ``any_of``, ``copy_if``, ``exclusive_scan``, ``equal``,
``find``, ``find_if``, ``find_end``, ``find_first_of``, ``find_if_not``, ``inclusive_scan``, ``includes``,
``is_heap``, ``is_heap_until``, ``is_partitioned``, ``is_sorted``, ``is_sorted_until``, ``lexicographical_compare``,
``max_element``, ``min_element``, ``minmax_element``, ``mismatch``, ``none_of``, ``partition``, ``partition_copy``,
``reduce``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``, ``search``, ``search_n``,
``stable_partition``, ``transform_exclusive_scan``, ``transform_inclusive_scan``, ``unique``, and ``unique_copy``
algorithms with device policies.
- Improved performance of ``sort``, ``stable_sort`` and ``sort_by_key`` algorithms with device policies when using Merge
sort [#fnote1]_.
- Added ``stable_sort_by_key`` algorithm in ``namespace oneapi::dpl``.
- Added parallel range algorithms in ``namespace oneapi::dpl::ranges``: ``all_of``, ``any_of``,
``none_of``, ``for_each``, ``find``, ``find_if``, ``find_if_not``, ``adjacent_find``, ``search``, ``search_n``,
``transform``, ``sort``, ``stable_sort``, ``is_sorted``, ``merge``, ``count``, ``count_if``, ``equal``, ``copy``,
``copy_if``, ``min_element``, ``max_element``. These algorithms operate with C++20 random access ranges
and views while also taking an execution policy similarly to other oneDPL algorithms.
- Added support for operators ``==``, ``!=``, ``<<`` and ``>>`` for RNG engines and distributions.
- Added experimental support for the Philox RNG engine in ``namespace oneapi::dpl::experimental``.
- Added the ``<oneapi/dpl/version>`` header containing oneDPL version macros and new feature testing macros.
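
As a minimal sketch of how the new version header might be consumed, the snippet below includes ``<oneapi/dpl/version>`` only when the compiler can find it and reports the library version. It assumes the documented ``ONEDPL_VERSION_MAJOR``, ``ONEDPL_VERSION_MINOR`` and ``ONEDPL_VERSION_PATCH`` macros; the helper name ``onedpl_version_string`` is hypothetical and is not part of oneDPL.

```cpp
#include <cassert>
#include <string>

// Include the version header only if it is available, so the code
// also builds against older oneDPL releases or without oneDPL at all.
#if defined(__has_include)
#  if __has_include(<oneapi/dpl/version>)
#    include <oneapi/dpl/version>
#  endif
#endif

// Hypothetical helper (not a oneDPL API): returns "major.minor.patch"
// when the oneDPL version macros are visible, or "unknown" otherwise.
inline std::string onedpl_version_string()
{
#if defined(ONEDPL_VERSION_MAJOR) && defined(ONEDPL_VERSION_MINOR) && defined(ONEDPL_VERSION_PATCH)
    return std::to_string(ONEDPL_VERSION_MAJOR) + "." +
           std::to_string(ONEDPL_VERSION_MINOR) + "." +
           std::to_string(ONEDPL_VERSION_PATCH);
#else
    return "unknown";
#endif
}

// Usage example: the helper always yields a non-empty string.
inline void demo_version_string()
{
    assert(!onedpl_version_string().empty());
}
```

Guarding the include with ``__has_include`` keeps one code path for both old and new releases; the feature testing macros from the same header can be checked with ``#if defined(...)`` in the same way.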

Fixed Issues
------------
- Fixed unused variable and unused type warnings.
- Fixed memory leaks when using ``sort`` and ``stable_sort`` algorithms with the oneTBB backend.
- Fixed a build error for ``oneapi::dpl::begin`` and ``oneapi::dpl::end`` functions used with
the Microsoft* Visual C++ standard library and with C++20.
- Reordered template parameters of the ``histogram`` algorithm to match its function parameter order.
For affected ``histogram`` calls, we recommend removing the explicit specification of template parameters
and instead adding explicit type conversions of the function arguments as necessary.
- ``gpu::esimd::radix_sort`` and ``gpu::esimd::radix_sort_by_key`` kernel templates now throw ``std::bad_alloc``
if they fail to allocate global memory.
- Fixed a potential hang occurring with ``gpu::esimd::radix_sort`` and
``gpu::esimd::radix_sort_by_key`` kernel templates.
- Fixed documentation for ``sort_by_key`` algorithm, which used to be mistakenly described as stable, despite being
possibly unstable for some execution policies. If stability is required, use ``stable_sort_by_key`` instead.
- Fixed an error when calling ``sort`` with device execution policies on CUDA devices.
- Enabled passing C++20 random access iterators to oneDPL algorithms.
- Fixed issues caused by initialization of SYCL queues in the predefined device execution policies.
These policies have been updated to be immutable (``const``) objects.
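
To illustrate what the stability guarantee mentioned for ``stable_sort_by_key`` means, here is a host-only reference sketch in standard C++. It is not oneDPL's implementation and does not call oneDPL; the function name ``stable_sort_by_key_reference`` is hypothetical. It only shows the expected ordering: values whose keys compare equal keep their original relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical host-side reference (not a oneDPL API): sorts keys and
// reorders values in lockstep, keeping equal keys in input order.
template <typename Key, typename Value>
void stable_sort_by_key_reference(std::vector<Key>& keys, std::vector<Value>& values)
{
    assert(keys.size() == values.size());

    // Stable-sort a permutation of indices by key.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both sequences.
    std::vector<Key> sorted_keys;
    std::vector<Value> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order)
    {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}

// Usage example: 'b' stays before 'd' and 'a' stays before 'c'.
inline void demo_stable_sort_by_key()
{
    std::vector<int> keys{2, 1, 2, 1};
    std::vector<char> values{'a', 'b', 'c', 'd'};
    stable_sort_by_key_reference(keys, values);
    assert((keys == std::vector<int>{1, 1, 2, 2}));
    assert((values == std::vector<char>{'b', 'd', 'a', 'c'}));
}
```

An unstable sort by key would be free to return ``'d', 'b'`` or ``'c', 'a'`` for the equal keys, which is why code that relies on the original order should not use ``sort_by_key``.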

Known Issues and Limitations
----------------------------
New in This Release
^^^^^^^^^^^^^^^^^^^
- ``histogram`` may provide incorrect results with device policies in a program built with the ``-O0`` option.
- Inclusion of ``<oneapi/dpl/dynamic_selection>`` prior to ``<oneapi/dpl/random>`` may result in compilation errors.
Include ``<oneapi/dpl/random>`` first as a workaround.
- Incorrect results may occur when using ``oneapi::dpl::experimental::philox_engine`` with no predefined template
parameters and with ``word_size`` values other than 64 and 32.
- Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built
with the ``-O0`` option and executed on a GPU device: ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
``transform_inclusive_scan``, ``copy_if``, ``remove``, ``remove_copy``, ``remove_copy_if``, ``remove_if``,
``partition``, ``partition_copy``, ``stable_partition``, ``unique``, ``unique_copy``, and ``sort``.
- The value type of the input sequence should be convertible to the type of the initial element for the following
algorithms with device execution policies: ``transform_inclusive_scan``, ``transform_exclusive_scan``,
``inclusive_scan``, and ``exclusive_scan``.
- The following algorithms with device execution policies may exceed the C++ standard requirements on the number
of applications of user-provided predicates or equality operators: ``copy_if``, ``remove``, ``remove_copy``,
``remove_copy_if``, ``remove_if``, ``partition_copy``, ``unique``, and ``unique_copy``. In all cases,
the predicate or equality operator is applied ``O(n)`` times.
- The ``adjacent_find``, ``all_of``, ``any_of``, ``equal``, ``find``, ``find_if``, ``find_end``, ``find_first_of``,
``find_if_not``, ``includes``, ``is_heap``, ``is_heap_until``, ``is_sorted``, ``is_sorted_until``, ``mismatch``,
``none_of``, ``search``, and ``search_n`` algorithms may cause a segmentation fault when used with a device execution
policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and ``-O0 -g`` compiler options.
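
The convertibility requirement for scans noted in the list above can be pictured with the serial standard-library ``std::exclusive_scan``. This is only a sketch under that substitution: it runs on the host without an execution policy, and the helper name ``prefix_sums`` is hypothetical. The input value type (``int``) is convertible to the type of the initial element (``double``), which is the arrangement the limitation asks for.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: exclusive prefix sum whose accumulator type
// is the type of the initial element (double).
inline std::vector<double> prefix_sums(const std::vector<int>& input, double init)
{
    std::vector<double> result(input.size());
    // Each int element is implicitly converted to double when it is
    // added to the running sum.
    std::exclusive_scan(input.begin(), input.end(), result.begin(), init);
    return result;
}

// Usage example with exactly representable values.
inline void demo_prefix_sums()
{
    const std::vector<double> sums = prefix_sums({1, 2, 3}, 0.5);
    assert((sums == std::vector<double>{0.5, 1.5, 3.5}));
}
```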

Existing Issues
^^^^^^^^^^^^^^^
See the oneDPL Guide for other `restrictions and known limitations`_.

- The ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
when used with an FPGA policy.
- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- The ``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``, and ``partial_sort_copy`` algorithms
may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device,
and built on Linux with Intel® oneAPI DPC++/C++ Compiler and ``-O0 -g`` compiler options.
To avoid the issue, pass the ``-fsycl-device-code-split=per_kernel`` option to the compiler.
- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, ``reduce_by_segment``
with ``unseq`` or ``par_unseq`` policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, ``-qopenmp-simd`` options on Linux.
To avoid the issue, pass the ``-fopenmp`` or ``-fopenmp-simd`` option instead.
- Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
and executed on a GPU device. For a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION``
macro to ``1`` before including oneDPL header files.
- ``std::tuple`` and ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
- ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
in the Microsoft* Visual C++ standard library.
- The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with ``-fno-sycl-unnamed-lambda`` option.
- STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
the Microsoft* Visual C++ standard library.
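
The in-place condition for ``exclusive_scan`` listed above (input and destination iterators are equality comparable and compare equal) can be sketched with the serial ``std::exclusive_scan``. This is an assumption-laden illustration: it runs on the host without ``unseq`` or ``par_unseq``, so it shows the shape of a conforming call, not the oneDPL behavior itself. The helper name ``exclusive_scan_in_place`` is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: in-place exclusive prefix sum. The same iterator
// is used as input begin and destination, so the two compare equal.
inline void exclusive_scan_in_place(std::vector<int>& data, int init)
{
    auto first = data.begin();
    auto result = data.begin();
    assert(first == result); // equality comparable, and equal
    std::exclusive_scan(first, data.end(), result, init);
}

// Usage example: each output is the sum of all preceding inputs.
inline void demo_in_place_scan()
{
    std::vector<int> data{1, 2, 3, 4};
    exclusive_scan_in_place(data, 0);
    assert((data == std::vector<int>{0, 1, 3, 6}));
}
```

Passing a destination that aliases the input through a different iterator type (for example, a wrapping iterator over the same storage) would not satisfy the stated condition, since the two iterators could not be compared for equality.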

New in 2022.6.0
===============
News