Commit

Documentation sync for 2021.6 (#446)
* Adding link to Find More table (#406)

* Updated for Open Source Rules

I updated a number of files and deleted the Intel specific Notices/Disclaimers file.

Signed-off-by: Dylan Benito <[email protected]>

* resolving merge conflicts

(cherry picked from commit 5926596)

* Adding Visual Studio 2022 Support (#410)

Updated tested_standard_cpp_api with VS22 support.

Signed-off-by: Dylan Benito <[email protected]>
(cherry picked from commit 327f262)

* Update release notes and library guide for oneDPL 2021.6 release (#412)

* Update release notes for oneDPL 2021.6 release

* Align format

* Moved several issues to Library Guide

* Fix format

* Remove note about hangs

Co-authored-by: Dmitriy Sobolev <[email protected]>

* Attempt to fix cross-page link

* Attempt to fix a link

* Address review feedback

* Add reduce_by_segment to the list of range based api

* Address review feedback

* Fix format issue

* Remove redundant parentheses

Co-authored-by: Dmitriy Sobolev <[email protected]>

* Note for device USM allocations

Signed-off-by: Sobolev, Dmitriy <[email protected]>

* Improve note for device USM allocations

Signed-off-by: Sobolev, Dmitriy <[email protected]>

* Address review feedback

* Fix links

* Fix link

* Fix link one more time

* Add note about C++17

* Address review feedback

* Add information about OpenMP backend to documentation (#421)

* Add information about OpenMP backend to documentation

* Fix different typos

Co-authored-by: Valentina Kats <[email protected]>

* Rewrite the documentation part with backends

* More review suggestions applied

* Add rendering compiler options as code

* Change macros page to refer to par and par_unseq twice

* More review comments applied

* Apply suggestions for backend macros

Co-authored-by: Valentina Kats <[email protected]>

* Add note about calling the API

Co-authored-by: Dmitriy Sobolev <[email protected]>
Co-authored-by: Ruslan Arutyunyan <[email protected]>
(cherry picked from commit 6fc4ffc)

* Update CHANGES and align mentions of oneDPL Guide in Release Notes (#445)

* Aligned mentions of the Library Guide

* Fix a typo

* Update CHANGES with 2021.6 changes

* Address review feedback

* Fix typos

(cherry picked from commit 2fc2879)

Co-authored-by: Dylan <[email protected]>
ValentinaKats and dcbenito authored Dec 15, 2021
1 parent bf208d3 commit 4eb72e3
Showing 11 changed files with 186 additions and 44 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -19,7 +19,7 @@ your change directly to the repository:

# Coding Conventions

clang-format is required, except the [test folder](https://github.com/oneapi-src/oneDPL/tree/main/test).
Running clang-format is required, except in the [test folder](https://github.com/oneapi-src/oneDPL/tree/main/test).

# License

3 changes: 2 additions & 1 deletion README.md
@@ -28,6 +28,7 @@ You can also view the [Security Policy](SECURITY.md).
See [CONTRIBUTING.md](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/CONTRIBUTING.md) for details.

## Documentation

See the full documentation set for [oneDPL](https://oneapi-src.github.io/oneDPL).

## Samples
@@ -39,4 +40,4 @@ Please report issues and suggestions via [GitHub issues](https://github.com/onea
------------------------------------------------------------------------
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

\* Other names and brands may be claimed as the property of others.
35 changes: 34 additions & 1 deletion documentation/CHANGES.rst
@@ -6,6 +6,36 @@ Overview

The list of the most significant changes made over time in oneDPL.

New in 2021.6
=============

New Features
------------
- Added a new implementation for ``par`` and ``par_unseq`` execution policies based on OpenMP* 4.5 pragmas.
It can be enabled with the ``ONEDPL_USE_OPENMP_BACKEND`` macro.
For more details, see the `Macros`_ page in the oneDPL Guide.
- Added the range-based version of the ``reduce_by_segment`` algorithm and improved performance of
the iterator-based ``reduce_by_segment`` APIs.
Please note that the use of the ``reduce_by_segment`` algorithm requires C++17.
- Added the following algorithms (serial versions) to `Tested Standard C++ API`_: ``for_each_n``, ``copy``,
``copy_backward``, ``copy_if``, ``copy_n``, ``is_permutation``, ``fill``, ``fill_n``, ``move``, ``move_backward``.
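
To illustrate what a segmented reduction computes, here is a minimal serial sketch in standard C++. It is not the oneDPL implementation or signature: ``reduce_by_segment_reference`` is a hypothetical helper that assumes ``int`` keys and values, equality as the key predicate, and addition as the reduction. Consecutive equal keys form one segment, and each segment yields one output key and one reduced value.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial reference for a segmented reduction.
// Returns the unique consecutive keys and the per-segment sums.
inline std::pair<std::vector<int>, std::vector<int>>
reduce_by_segment_reference(const std::vector<int>& keys, const std::vector<int>& values) {
    assert(keys.size() == values.size());
    std::vector<int> out_keys;
    std::vector<int> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            // A new segment starts whenever the key differs from the previous one.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        } else {
            // Same segment: fold the value into the running sum.
            out_values.back() += values[i];
        }
    }
    return {out_keys, out_values};
}
```

Note that a key that reappears after a different key starts a new segment; segments are defined by adjacency, not by global key equality.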

Changes affecting backward compatibility
----------------------------------------
- Fixed ``param_type`` API of random number distributions to satisfy C++ standard requirements.
The new definitions of ``param_type`` are not compatible with incorrect definitions in previous library versions.
Recompilation is recommended for all code that might use ``param_type``.

Fixed Issues
------------
- Fixed hangs and errors when oneDPL is used together with oneAPI Math Kernel Library (oneMKL) in DPC++ programs.
- Fixed possible data races in the following algorithms used with DPC++ execution
policies: ``sort``, ``stable_sort``, ``partial_sort``, ``nth_element``.

Known Issues and Limitations
----------------------------
- No new issues in this release.

New in 2021.5
=============

@@ -15,7 +45,7 @@ New Features
``geometric_distribution``, ``lognormal_distribution``, ``weibull_distribution``, ``cauchy_distribution``, ``extreme_value_distribution``.
- Added the serial-based versions of the following algorithms: ``all_of``, ``any_of``,
``none_of``, ``count``, ``count_if``, ``for_each``, ``find``, ``find_if``, ``find_if_not``.
For the detailed list, please refer to `Tested Standard C++ API Reference`_.
For the detailed list, please refer to `Tested Standard C++ API`_.
- Improved performance of ``search`` and ``find_end`` algorithms on GPU devices.

Fixed Issues
@@ -514,3 +544,6 @@ Known Issues and Limitations
``std::less`` or ``std::greater``, otherwise Merge sort.
.. _`the oneDPL Specification`: https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html
.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-threading-building-blocks-release-notes.html
.. _`oneDPL Guide`: https://oneapi-src.github.io/oneDPL/index.html
.. _`Tested Standard C++ API`: https://oneapi-src.github.io/oneDPL/api_for_dpcpp_kernels/tested_standard_cpp_api.html#tested-standard-c-api-reference
.. _`Macros`: https://oneapi-src.github.io/oneDPL/macros.html
@@ -367,6 +367,14 @@ libstdc++(GNU) Provided with GCC*-7.5.0, GCC*-9.3
--------------------------------------------- ---------------------------------------------
libc++(LLVM) Provided with Clang*-11.0
--------------------------------------------- ---------------------------------------------
Microsoft Visual C++* (MSVC) Standard Library Provided with Microsoft Visual Studio* 2017,
and Microsoft Visual Studio 2019
Microsoft Visual C++* (MSVC) Standard Library Provided with Microsoft Visual Studio* 2017;
Microsoft Visual Studio 2019; and Microsoft
Visual Studio 2022, version 17.0, preview 4.1.

.. Note::

Support for Microsoft Visual Studio 2017 is
deprecated as of the Intel® oneAPI 2022.1
release, and will be removed in a future
release.
============================================= =============================================
33 changes: 25 additions & 8 deletions documentation/library_guide/macros.rst
@@ -62,13 +62,30 @@ Macro Description
Using this macro may have the same effect on the implementation of parallel
algorithms in the C++ standard libraries of GCC and LLVM.
---------------------------------- ------------------------------
``ONEDPL_USE_TBB_BACKEND`` This macro controls the use of |onetbb_long| or
|tbb_long| for parallel policies.
When the macro is set to 0, algorithms with the ``par`` and ``par_unseq`` policies are only
executed by the calling thread. This is recommended for code that should not depend on the
presence of the |onetbb_short| or |tbb_short| library. When the macro is not defined (by default)
or evaluates to a non-zero value,
parallel policies are executed using the |onetbb_short| or |tbb_short| library.
``ONEDPL_USE_TBB_BACKEND`` This macro controls the use of |onetbb_long| or |tbb_long| for parallel
execution policies (``par`` and ``par_unseq``).

When the macro evaluates to a non-zero value, or when it is not defined (by default)
and no other parallel backends are explicitly chosen, algorithms with parallel policies
are executed using the |onetbb_short| or |tbb_short| library.
Setting the macro to 0 disables use of the TBB API for parallel execution and is recommended
for code that should not depend on the presence of the |onetbb_short| or |tbb_short| library.

If all parallel backends are disabled by setting the respective macros to 0, algorithms
with parallel policies are executed sequentially by the calling thread.
---------------------------------- ------------------------------
``ONEDPL_USE_OPENMP_BACKEND`` This macro controls the use of OpenMP* for parallel execution policies (``par`` and ``par_unseq``).

When the macro evaluates to a non-zero value, algorithms with parallel policies are executed
using OpenMP unless the TBB backend is explicitly enabled (that is, the TBB backend takes
precedence over the OpenMP backend).
When the macro is not defined (by default) and no other parallel backends are chosen,
a dedicated compiler option to enable OpenMP (such as ``-fopenmp``) also enables its use
for algorithms with parallel policies.
Setting the macro to 0 disables use of OpenMP for parallel execution.

If all parallel backends are disabled by setting the respective macros to 0, algorithms
with parallel policies are executed sequentially by the calling thread.
---------------------------------- ------------------------------
``ONEDPL_USE_DPCPP_BACKEND`` This macro enables the use of the |dpcpp_short| policies.
When the macro is not defined (by default)
@@ -84,7 +101,7 @@ Macro Description
without arguments, when ``make_device_policy()``,
``make_fpga_policy()``, are not available.
---------------------------------- ------------------------------
``ONEDPL_ALLOW_DEFERRED_WAITING`` This macro allows waiting for completion of certain algorithms executed with
|dpcpp_short| policies to be deferred. (Disabled by default.)
---------------------------------- ------------------------------
``ONEDPL_FPGA_DEVICE`` Use this macro to build your code containing |onedpl_short| parallel
27 changes: 13 additions & 14 deletions documentation/library_guide/onedpl_gsg.rst
@@ -47,8 +47,8 @@ Usage Examples
`oneAPI GitHub samples repository <https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneDPL>`_.
Each sample includes a readme with build instructions.

oneapi/dpl/random Usage Example
-------------------------------
\<oneapi/dpl/random\> Header Usage Example
------------------------------------------

This example illustrates |onedpl_short| Random Number Generators (RNGs) usage.
The sample below shows you how to create an RNG engine object (the source of pseudo-randomness),
@@ -63,22 +63,19 @@ This example performs its computations on your default DPC++ device. You can set
    template<int VecSize>
    void random_fill(float* usmptr, std::size_t n) {
        auto zero = oneapi::dpl::counting_iterator<std::size_t>(0);
        std::for_each(oneapi::dpl::execution::dpcpp_default,
                      zero, zero + n/VecSize,
                      [usmptr](std::size_t i) {
                          auto offset = i * VecSize;
                          oneapi::dpl::minstd_rand_vec<VecSize> engine(seed, offset);
                          oneapi::dpl::uniform_real_distribution<sycl::vec<float, VecSize>> distr;
                          auto res = distr(engine);
                          res.store(i, sycl::global_ptr<float>(usmptr));
                      });
    }
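
The per-item ``offset`` in the sample above can be understood with a host-only analogue in standard C++. The sketch below assumes that constructing an engine with ``(seed, offset)`` behaves like seeding and then skipping ``offset`` values, which is modeled here with ``std::minstd_rand::discard``. The function names are hypothetical and no oneDPL or SYCL API is used. Under that assumption, independently seeded chunks reproduce one sequential stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using raw_t = std::minstd_rand::result_type;

// Reference: one engine producing n values in order.
inline std::vector<raw_t> generate_sequential(std::uint32_t seed, std::size_t n) {
    std::minstd_rand engine(seed);
    std::vector<raw_t> out(n);
    for (auto& x : out) x = engine();
    return out;
}

// Each "work item" i builds its own engine and skips ahead to its offset,
// mirroring offset = i * VecSize in the sample (chunk plays the role of VecSize).
inline std::vector<raw_t> generate_chunked(std::uint32_t seed, std::size_t n, std::size_t chunk) {
    assert(chunk > 0 && n % chunk == 0);
    std::vector<raw_t> out(n);
    for (std::size_t i = 0; i < n / chunk; ++i) {
        const std::size_t offset = i * chunk;
        std::minstd_rand engine(seed);
        engine.discard(offset);  // stand-in for engine(seed, offset)
        for (std::size_t j = 0; j < chunk; ++j) out[offset + j] = engine();
    }
    return out;
}
```

Because every chunk depends only on the seed and its own offset, the outer loop has no cross-iteration state, which is what makes the pattern suitable for a parallel ``for_each``.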
oneDPL RNG Pi Benchmark Usage Example
@@ -144,4 +141,6 @@ Find More
* - `Intel® oneAPI DPC++ Library (oneDPL) Release Notes <https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-library-release-notes.html>`_
- Refer to release notes to learn about new updates in the latest release.
* - `oneDPL Samples <https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneDPL>`_
- Learn how to use |onedpl_short| with samples.
* - `Layers for Yocto* Project <https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-iot-linux/top/adding-oneapi-components-to-yocto-project-builds.html>`_
- Add oneAPI components to a Yocto project build using the meta-intel layers.
42 changes: 35 additions & 7 deletions documentation/library_guide/overview.rst
@@ -37,12 +37,19 @@ and use the ``std`` namespace.
Prerequisites
=============

C++11 is the minimal version of the C++ standard that |onedpl_short| requires. That means, any use of |onedpl_short|
requires at least a C++11 compiler. Some APIs of the library may require a higher version of C++.
Since |onedpl_short| 2021.6, C++17 is the minimal supported version of the C++ standard.
That means any use of |onedpl_short| may require a C++17 compiler.
While some APIs of the library may happen to work with earlier versions of the C++ standard, this is no longer guaranteed.

To call Parallel API with the C++ standard policies, you need to install the following software:

* A C++ compiler with support for OpenMP* 4.0 (or higher) SIMD constructs
* |onetbb_long| or |tbb_long| 2019 and later
* Depending on which parallel backend you want to use, install either:

* |onetbb_long| or |tbb_long| 2019 and later
* A C++ compiler with support for OpenMP 4.5 (or higher)

For more information about parallel backends, see :doc:`Execution Policies <parallel_api/execution_policies>`.

To use Parallel API with the |dpcpp_short| execution policies, you need to install the following software:

@@ -57,13 +64,33 @@ does (see the |dpcpp_short| specification and the SYCL specification for details
* Adding buffers to a lambda capture list is not allowed for lambdas passed to an algorithm.
* Passing data types, which are not trivially copyable, is only allowed via USM,
but not via buffers or host-allocated containers.
* The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros
that make it different for the host and the device. Otherwise, the behavior is undefined.
* When used within DPC++ kernels or transferred to/from a device, a container class can only hold objects
whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
* Calling an API that throws exceptions is not allowed within callable objects passed to an algorithm.
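
The trivially-copyable restriction above can be checked at compile time with standard type traits. The following is a small sketch, not part of oneDPL: ``Point``, ``Named``, and ``ok_for_buffer`` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <type_traits>

struct Point { float x; float y; };          // hypothetical: plain data, trivially copyable
struct Named { std::string name; int id; };  // hypothetical: std::string member makes it non-trivial

// True when T could be passed via buffers or host-allocated containers
// under the restriction listed above.
template <typename T>
constexpr bool ok_for_buffer() {
    return std::is_trivially_copyable<T>::value;
}

static_assert(ok_for_buffer<Point>(), "Point can be passed via buffers");
static_assert(!ok_for_buffer<Named>(), "Named would have to be passed via USM");
```

A ``static_assert`` like this next to the data type turns a hard-to-diagnose device-side problem into an immediate build error.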

Known Limitations
=================

For ``transform_exclusive_scan``, ``transform_inclusive_scan`` algorithms, the result of the unary operation should be
convertible to the type of the initial value if (one is provided), otherwise it is convertible to the type of values
in the processed data sequence: (``std::iterator_traits<IteratorType>::value_type``).
* For the ``transform_exclusive_scan`` and ``transform_inclusive_scan`` algorithms, the result of the unary operation should be
convertible to the type of the initial value if one is provided; otherwise, it should be convertible to the type of values
in the processed data sequence: ``std::iterator_traits<IteratorType>::value_type``.
* The ``exclusive_scan`` and ``transform_exclusive_scan`` algorithms may provide wrong results with
vector execution policies when building a program with GCC 10 and using the ``-O0`` option.
* The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to
compilation errors (caused by oneTBB API changes).
To overcome these issues, include oneDPL header files before the standard C++ header files,
or disable parallel algorithms support in the standard library.
For more information, please see `Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`_.
* The ``using namespace oneapi;`` directive in oneDPL program code may result in compilation errors
with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use
the ``oneapi::dpl`` namespace, or create a namespace alias.
* The ``std::array::at`` member function cannot be used in kernels because it may throw an exception;
use ``std::array::operator[]`` instead.
* Due to specifics of Microsoft* Visual C++, some standard floating-point math functions
(including ``std::ldexp``, ``std::frexp``, ``std::sqrt(std::complex<float>)``) require device support
for double precision.
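
The type requirement in the first limitation can be seen with the serial ``std::transform_exclusive_scan`` from the C++17 standard library. This sketch uses the standard overload without an execution policy rather than the oneDPL one, and ``half_prefix_sums`` is a hypothetical helper. The unary operation returns ``float``, which is convertible to the ``double`` type of the initial value.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Exclusive prefix sums of x * 0.5 over the input.
inline std::vector<double> half_prefix_sums(const std::vector<int>& in) {
    std::vector<double> out(in.size());
    std::transform_exclusive_scan(
        in.begin(), in.end(), out.begin(),
        0.0,                              // initial value: double
        std::plus<double>{},              // reduction over double
        [](int x) { return 0.5f * x; });  // unary op returns float, convertible to double
    return out;
}
```

If the unary operation returned a type with no conversion to the initial value's type, the call would fail to compile, which is the situation the limitation describes.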

Build Your Code with |onedpl_short|
===================================
@@ -81,5 +108,6 @@ Below is an example of a command line used to compile code that contains

.. code:: cpp
dpcpp [-fsycl-unnamed-lambda] test.cpp [-ltbb] -o test
dpcpp [-fsycl-unnamed-lambda] test.cpp [-ltbb|-fopenmp] -o test
.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-threading-building-blocks-release-notes.html
9 changes: 6 additions & 3 deletions documentation/library_guide/parallel_api/buffers_and_usm.rst
@@ -17,7 +17,7 @@ Use oneapi::dpl::begin and oneapi::dpl::end Functions
allow you to pass SYCL* buffers to parallel algorithms. These functions accept
a SYCL buffer and return an object of an unspecified type that provides the following API:

* It satisfies ``CopyConstructible`` and ``CopyAssignable`` C++ named requirements and is comparable with
``operator==`` and ``operator!=``.
* It gives the following valid expressions: ``a + n``, ``a - n``, and ``a - b``, where ``a`` and ``b``
are objects of the type, and ``n`` is an integer value. The effect of those operations is the same as for the type
@@ -98,6 +98,9 @@ Alternatively, use ``std::vector`` with a USM allocator. For example:
return 0;
}
When using device USM, such as memory allocated by ``malloc_device``, manually copy data to this memory
before calling oneDPL algorithms, and copy it back once the algorithms have finished execution.

Use Host-side std::vector
-----------------------------

@@ -114,8 +117,8 @@ For example:
#include <oneapi/dpl/algorithm>
#include <vector>
int main(){
std::vector<int> v( 1000 );
std::fill(oneapi::dpl::execution::dpcpp_default, v.begin(), v.end(), 42);
std::vector<int> vec( 1000 );
std::fill(oneapi::dpl::execution::dpcpp_default, vec.begin(), vec.end(), 42);
// each element of vec is equal to 42
return 0;
}
11 changes: 9 additions & 2 deletions documentation/library_guide/parallel_api/execution_policies.rst
@@ -30,6 +30,13 @@ Execution Policy Value Description
The implementation is based on Parallel STL from the
`LLVM Project <https://github.com/llvm/llvm-project/tree/main/pstl>`_.

|onedpl_short| supports two parallel backends for execution with ``par`` and ``par_unseq`` policies:

#. The TBB backend (enabled by default) uses |onetbb_long| or |tbb_long| for parallel execution.

#. The OpenMP backend uses OpenMP* pragmas for parallel execution. See
:doc:`Macros <../macros>` for information on how to enable the OpenMP backend.
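
As a rough sketch of how backend selection might look in user code, the snippet below sets ``ONEDPL_USE_TBB_BACKEND`` to 0 so that the build does not depend on TBB. According to the Macros page, compiling with an option such as ``-fopenmp`` would then enable the OpenMP backend; without it, ``par`` algorithms presumably run sequentially. Defining the macro ahead of the oneDPL includes is an assumption, ``sort_values`` and ``SKETCH_HAVE_ONEDPL`` are hypothetical names, and the ``__has_include`` guard exists only so the sketch still compiles where oneDPL is not installed.

```cpp
// Assumption: configuration macros must be visible before the first oneDPL header.
#define ONEDPL_USE_TBB_BACKEND 0  // do not use oneTBB/TBB for par and par_unseq

#if defined(__has_include)
#  if __has_include(<oneapi/dpl/execution>) && __has_include(<oneapi/dpl/algorithm>)
#    include <oneapi/dpl/execution>
#    include <oneapi/dpl/algorithm>
#    define SKETCH_HAVE_ONEDPL 1  // hypothetical marker used only in this sketch
#  endif
#endif

#include <algorithm>
#include <cassert>
#include <vector>

// Sorts v with the oneDPL par policy when oneDPL is available,
// and with the plain serial std::sort otherwise.
inline void sort_values(std::vector<int>& v) {
#ifdef SKETCH_HAVE_ONEDPL
    oneapi::dpl::sort(oneapi::dpl::execution::par, v.begin(), v.end());
#else
    std::sort(v.begin(), v.end());
#endif
    assert(std::is_sorted(v.begin(), v.end()));
}
```

The same macro could instead be passed on the compiler command line, for example with ``-DONEDPL_USE_TBB_BACKEND=0``, which keeps the source free of configuration details.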

Follow these steps to add Parallel API to your application:

#. Add ``#include <oneapi/dpl/execution>`` to your code.
@@ -47,8 +54,8 @@ Follow these steps to add Parallel API to your application:
namespace, to a parallel algorithm.
#. Use the C++ Standard Execution Policies:

#. Compile the code with options that enable OpenMP* vectorization pragmas.
#. Link with the |onetbb_long| or |tbb_long| dynamic library for parallelism.
#. Compile the code with options that enable OpenMP parallelism and/or vectorization pragmas.
#. Link with the |onetbb_long| or |tbb_long| dynamic library for TBB-based parallelism.

#. Use the |dpcpp_short| Execution Policies:

5 changes: 5 additions & 0 deletions documentation/library_guide/parallel_api/range_based_api.rst
@@ -1,5 +1,9 @@
Range-based API Algorithms
##########################
.. Note::

The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher)
or Clang 7 (or higher).

C++20 introduces the Ranges library. The C++20 standard splits ranges into two categories: factories and adaptors.
A range factory does not have underlying data. An element is generated on access by an index or by dereferencing an iterator.
@@ -49,6 +53,7 @@ The following algorithms are available to use with the ranges:
* ``move``
* ``none_of``
* ``reduce``
* ``reduce_by_segment``
* ``remove``
* ``remove_if``
* ``remove_copy``