Documentation sync for 2021.6 (#446)

* Adding link to Find More table (#406)
* Updated for Open Source Rules
  I updated a number of files and deleted the Intel specific Notices/Disclaimers file.
  Signed-off-by: Dylan Benito <[email protected]>
* resolving merge conflicts
  (cherry picked from commit 5926596)
* Adding Visual Studio 2022 Support (#410)
  Updated tested_standard_cpp_api with VS22 support.
  Signed-off-by: Dylan Benito <[email protected]>
  (cherry picked from commit 327f262)
* Update release notes and library guide for oneDPL 2021.6 release (#412)
* Update release notes for oneDPL 2021.6 release
* Align format
* Moved several issues to Library Guide
* Fix format
* Remove note about hangs
  Co-authored-by: Dmitriy Sobolev <[email protected]>
* Attempt to fix cross-page link
* Attempt to fix a link
* Address review feedback
* Add reduce_by_segment to the list of range based api
* Address review feedback
* Fix format issue
* Remove redundant parentheses
  Co-authored-by: Dmitriy Sobolev <[email protected]>
* Note for device USM allocations
  Signed-off-by: Sobolev, Dmitriy <[email protected]>
* Improve note for device USM allocations
  Signed-off-by: Sobolev, Dmitriy <[email protected]>
* Address review feedback
* Fix links
* Fix link
* Fix link one more time
* Add note about C++17
* Address review feedback
* Add information about OpenMP backend to documentation (#421)
* Add information about OpenMP backend to documentation
* Fix different typos
  Co-authored-by: Valentina Kats <[email protected]>
* Rewrite the documentation part with backends
* More review suggestions applied
* Add rendering compiler options as code
* Change macros page to refer to par and par_unseq twice
* More review comments applied
* Apply suggestions for backend macros
  Co-authored-by: Valentina Kats <[email protected]>
* Add note about calling the API
  Co-authored-by: Dmitriy Sobolev <[email protected]>
  Co-authored-by: Ruslan Arutyunyan <[email protected]>
  (cherry picked from commit 6fc4ffc)
* Update CHANGES and align mentions of oneDPL Guide in Release Notes (#445)
* Aligned mentions of the Library Guide
* Fix a typo
* Update CHANGES with 2021.6 changes
* Address review feedback
* Fix typos
  (cherry picked from commit 2fc2879)
  Co-authored-by: Dylan <[email protected]>
uxlfoundation · Dec 15, 2021 · 4eb72e3
1 parent bf208d3
commit 4eb72e3
Showing 11 changed files with 186 additions and 44 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -19,7 +19,7 @@ your change directly to the repository:
 
 # Coding Conventions
 
-clang-format is required, except the [test folder](https://github.com/oneapi-src/oneDPL/tree/main/test).
+Running clang-format is required, except in the [test folder](https://github.com/oneapi-src/oneDPL/tree/main/test).
 
 # License
 

diff --git a/README.md b/README.md
@@ -28,6 +28,7 @@ You can also view the [Security Policy](SECURITY.md).
 See [CONTRIBUTING.md](https://github.com/oneapi-src/oneDPL/blob/release_oneDPL/CONTRIBUTING.md) for details.
 
 ## Documentation
+
 See the full documentation set for [oneDPL](https://oneapi-src.github.io/oneDPL).
 
 ## Samples
@@ -39,4 +40,4 @@ Please report issues and suggestions via [GitHub issues](https://github.com/onea
 ------------------------------------------------------------------------
 Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
 
-\* Other names and brands may be claimed as the property of others.
+\* Other names and brands may be claimed as the property of others.
diff --git a/documentation/CHANGES.rst b/documentation/CHANGES.rst
@@ -6,6 +6,36 @@ Overview
 
 The list of the most significant changes made over time in oneDPL.
 
+New in 2021.6
+=============
+
+New Features
+------------
+- Added a new implementation for ``par`` and ``par_unseq`` execution policies based on OpenMP* 4.5 pragmas.
+  It can be enabled with the ``ONEDPL_USE_OPENMP_BACKEND`` macro.
+  For more details, see the `Macros`_ page in the oneDPL Guide.
+- Added the range-based version of the ``reduce_by_segment`` algorithm and improved performance of
+  the iterator-based ``reduce_by_segment`` APIs. 
+  Please note that the use of the ``reduce_by_segment`` algorithm requires C++17.
+- Added the following algorithms (serial versions) to `Tested Standard C++ API`_: ``for_each_n``, ``copy``,
+  ``copy_backward``, ``copy_if``, ``copy_n``, ``is_permutation``, ``fill``, ``fill_n``, ``move``, ``move_backward``.
+
+Changes affecting backward compatibility
+----------------------------------------
+- Fixed ``param_type`` API of random number distributions to satisfy C++ standard requirements.
+  The new definitions of ``param_type`` are not compatible with incorrect definitions in previous library versions.
+  Recompilation is recommended for all code that might use ``param_type``.
+
+Fixed Issues
+------------
+- Fixed hangs and errors when oneDPL is used together with oneAPI Math Kernel Library (oneMKL) in DPC++ programs.
+- Fixed possible data races in the following algorithms used with DPC++ execution
+  policies: ``sort``, ``stable_sort``, ``partial_sort``, ``nth_element``.
+
+Known Issues and Limitations
+----------------------------
+- No new issues in this release.
+
 New in 2021.5
 =============
 
@@ -15,7 +45,7 @@ New Features
   ``geometric_distribution``, ``lognormal_distribution``, ``weibull_distribution``, ``cauchy_distribution``, ``extreme_value_distribution``.
 - Added the serial-based versions of the following algorithms: ``all_of``, ``any_of``, 
   ``none_of``, ``count``, ``count_if``, ``for_each``, ``find``, ``find_if``, ``find_if_not``.
-  For the detailed list, please refer to `Tested Standard C++ API Reference`_. 
+  For the detailed list, please refer to `Tested Standard C++ API`_. 
 - Improved performance of ``search`` and ``find_end`` algorithms on GPU devices.
 
 Fixed Issues
@@ -514,3 +544,6 @@ Known Issues and Limitations
    ``std::less`` or ``std::greater``, otherwise Merge sort.
 .. _`the oneDPL Specification`: https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html
 .. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-threading-building-blocks-release-notes.html
+.. _`oneDPL Guide`: https://oneapi-src.github.io/oneDPL/index.html
+.. _`Tested Standard C++ API`: https://oneapi-src.github.io/oneDPL/api_for_dpcpp_kernels/tested_standard_cpp_api.html#tested-standard-c-api-reference
+.. _`Macros`: https://oneapi-src.github.io/oneDPL/macros.html
diff --git a/documentation/library_guide/api_for_dpcpp_kernels/tested_standard_cpp_api.rst b/documentation/library_guide/api_for_dpcpp_kernels/tested_standard_cpp_api.rst
@@ -367,6 +367,14 @@ libstdc++(GNU)                                Provided with GCC*-7.5.0, GCC*-9.3
 --------------------------------------------- ---------------------------------------------
 libc++(LLVM)                                  Provided with Clang*-11.0
 --------------------------------------------- ---------------------------------------------
-Microsoft Visual C++* (MSVC) Standard Library Provided with Microsoft Visual Studio* 2017,
-                                              and Microsoft Visual Studio 2019
+Microsoft Visual C++* (MSVC) Standard Library Provided with Microsoft Visual Studio* 2017;
+                                              Microsoft Visual Studio 2019; and Microsoft 
+                                              Visual Studio 2022, version 17.0, preview 4.1.
+
+                                              .. Note::
+
+                                                 Support for Microsoft Visual Studio 2017 is
+                                                 deprecated as of the Intel® oneAPI 2022.1
+                                                 release, and will be removed in a future
+                                                 release.
 ============================================= =============================================
diff --git a/documentation/library_guide/macros.rst b/documentation/library_guide/macros.rst
@@ -62,13 +62,30 @@ Macro                              Description
                                    Using this macro may have the same effect on the implementation of parallel
                                    algorithms in the C++ standard libraries of GCC and LLVM.
 ---------------------------------- ------------------------------
-``ONEDPL_USE_TBB_BACKEND``         This macro controls the use of |onetbb_long| or
-                                   |tbb_long| for parallel policies.
-                                   When the macro is set to 0, algorithms with the ``par`` and ``par_unseq`` policies are only
-                                   executed by the calling thread. This is recommended for code that should not depend on the
-                                   presence of the |onetbb_short| or |tbb_short| library. When the macro is not defined (by default)
-                                   or evaluates to a non-zero value,
-                                   parallel policies are executed using the |onetbb_short| or |tbb_short| library.
+``ONEDPL_USE_TBB_BACKEND``         This macro controls the use of |onetbb_long| or |tbb_long| for parallel
+                                   execution policies (``par`` and ``par_unseq``).
+
+                                   When the macro evaluates to a non-zero value, or when it is not defined (by default)
+                                   and no other parallel backends are explicitly chosen, algorithms with parallel policies
+                                   are executed using the |onetbb_short| or |tbb_short| library.
+                                   Setting the macro to 0 disables use of the TBB API for parallel execution and is recommended
+                                   for code that should not depend on the presence of the |onetbb_short| or |tbb_short| library.
+
+                                   If all parallel backends are disabled by setting the respective macros to 0, algorithms
+                                   with parallel policies are executed sequentially by the calling thread.
+---------------------------------- ------------------------------
+``ONEDPL_USE_OPENMP_BACKEND``      This macro controls the use of OpenMP* for parallel execution policies (``par`` and ``par_unseq``).
+
+                                   When the macro evaluates to a non-zero value, algorithms with parallel policies are executed
+                                   using OpenMP unless the TBB backend is explicitly enabled (that is, the TBB backend takes
+                                   precedence over the OpenMP backend).
+                                   When the macro is not defined (by default) and no other parallel backends are chosen,
+                                   a dedicated compiler option to enable OpenMP (such as ``-fopenmp``) also enables its use
+                                   for algorithms with parallel policies.
+                                   Setting the macro to 0 disables use of OpenMP for parallel execution.
+
+                                   If all parallel backends are disabled by setting the respective macros to 0, algorithms
+                                   with parallel policies are executed sequentially by the calling thread.
 ---------------------------------- ------------------------------
 ``ONEDPL_USE_DPCPP_BACKEND``       This macro enables the use of the |dpcpp_short| policies.
                                    When the macro is not defined (by default)
@@ -84,7 +101,7 @@ Macro                              Description
                                    without arguments, when ``make_device_policy()``,
                                    ``make_fpga_policy()``, are not available.
 ---------------------------------- ------------------------------
-``ONEDPL_ALLOW_DEFERRED_WAITING``  This macro allows waiting for completion of certain algorithms executed with 
+``ONEDPL_ALLOW_DEFERRED_WAITING``  This macro allows waiting for completion of certain algorithms executed with
                                    |dpcpp_short| policies to be deferred. (Disabled by default.)
 ---------------------------------- ------------------------------
 ``ONEDPL_FPGA_DEVICE``             Use this macro to build your code containing |onedpl_short| parallel
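
The precedence rules in the ``ONEDPL_USE_TBB_BACKEND`` and ``ONEDPL_USE_OPENMP_BACKEND`` descriptions above can be summarized as a small decision function. This is only an illustrative model in standard C++, not the library's configuration code. `Backend`, `select_backend`, and `openmp_compiler_option` are hypothetical names. Choosing TBB when neither macro is defined is an assumption, based on the statement that the TBB backend is enabled by default.

```cpp
#include <optional>

// Hypothetical model of the documented rules; none of these names exist in oneDPL.
enum class Backend { TBB, OpenMP, Serial };

// Each macro is modeled as: std::nullopt = not defined, false = set to 0,
// true = evaluates to a non-zero value.
// openmp_compiler_option models a compiler option such as -fopenmp being passed.
inline Backend select_backend(std::optional<bool> use_tbb,
                              std::optional<bool> use_openmp,
                              bool openmp_compiler_option) {
    // An explicitly enabled TBB backend takes precedence over OpenMP.
    if (use_tbb.value_or(false)) return Backend::TBB;
    // An explicitly enabled OpenMP backend is used otherwise.
    if (use_openmp.value_or(false)) return Backend::OpenMP;
    // ASSUMPTION: with the TBB macro undefined and nothing else explicitly
    // chosen, TBB remains the default backend.
    if (!use_tbb.has_value()) return Backend::TBB;
    // TBB is set to 0 here; an undefined OpenMP macro follows the compiler option.
    if (!use_openmp.has_value() && openmp_compiler_option) return Backend::OpenMP;
    // All parallel backends are disabled: the calling thread runs sequentially.
    return Backend::Serial;
}
```

For instance, under this model `select_backend(false, std::nullopt, true)` corresponds to building with `-DONEDPL_USE_TBB_BACKEND=0 -fopenmp` and selects the OpenMP backend.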

diff --git a/documentation/library_guide/onedpl_gsg.rst b/documentation/library_guide/onedpl_gsg.rst
@@ -47,8 +47,8 @@ Usage Examples
 `oneAPI GitHub samples repository <https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneDPL>`_.
 Each sample includes a readme with build instructions.
 
-oneapi/dpl/random Usage Example
--------------------------------
+\<oneapi/dpl/random\> Header Usage Example
+------------------------------------------
 
 This example illustrates |onedpl_short| Random Number Generators (RNGs) usage.
 The sample below shows you how to create an RNG engine object (the source of pseudo-randomness),
@@ -63,22 +63,19 @@ This example performs its computations on your default DPC++ device. You can set
 
     template<int VecSize>
     void random_fill(float* usmptr, std::size_t n) {
-
         auto zero = oneapi::dpl::counting_iterator<std::size_t>(0);
 
         std::for_each(oneapi::dpl::execution::dpcpp_default,
-    zero, zero + n/VecSize,
-          [usmptr](std::size_t i){
-
-            auto offset = i * VecSize;
-
-            oneapi::dpl::minstd_rand_vec<VecSize> engine(seed, offset);
-            oneapi::dpl::uniform_real_distribution<sycl::vec<float, VecSize>> distr;
+            zero, zero + n/VecSize,
+            [usmptr](std::size_t i) {
+                auto offset = i * VecSize;
 
-            auto res = distr(engine);
-            res.store(i, sycl::global_ptr<float>(usmptr));
+                oneapi::dpl::minstd_rand_vec<VecSize> engine(seed, offset);
+                oneapi::dpl::uniform_real_distribution<sycl::vec<float, VecSize>> distr;
 
-           });
+                auto res = distr(engine);
+                res.store(i, sycl::global_ptr<float>(usmptr));
+            });
     }
 
 oneDPL RNG Pi Benchmark Usage Example
@@ -144,4 +141,6 @@ Find More
    * - `Intel® oneAPI DPC++ Library (oneDPL) Release Notes <https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-library-release-notes.html>`_
      - Refer to release notes to learn about new updates in the latest release.
    * - `oneDPL Samples <https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneDPL>`_
-     - Learn how to use |onedpl_short| with samples.
+     - Learn how to use |onedpl_short| with samples.
+   * - `Layers for Yocto* Project <https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-iot-linux/top/adding-oneapi-components-to-yocto-project-builds.html>`_
+     - Add oneAPI components to a Yocto project build using the meta-intel layers.
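
The ``random_fill`` sample in this file gives every work item its own engine, constructed from the same seed and an offset of ``i * VecSize``. The sketch below illustrates that idea on the host with the standard ``std::minstd_rand`` only. It assumes the offset argument behaves like skipping ahead in the sequence, which is modeled here with ``discard``. `fill_sequential` and `fill_chunked` are hypothetical helper names, not oneDPL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using rng_value = std::minstd_rand::result_type;

// Reference: one engine generates all n values in order.
inline std::vector<rng_value> fill_sequential(rng_value seed, std::size_t n) {
    std::minstd_rand engine(seed);
    std::vector<rng_value> out(n);
    for (auto& x : out) x = engine();
    return out;
}

// Chunked: each chunk builds its own engine and skips ahead to its offset,
// so chunks are independent of each other and could run in any order.
inline std::vector<rng_value> fill_chunked(rng_value seed, std::size_t n, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<rng_value> out(n);
    // As in the sample, only n / chunk full chunks are processed.
    for (std::size_t i = 0; i < n / chunk; ++i) {
        const std::size_t offset = i * chunk;
        std::minstd_rand engine(seed);
        engine.discard(offset);  // stand-in for the (seed, offset) constructor
        for (std::size_t j = 0; j < chunk; ++j) out[offset + j] = engine();
    }
    return out;
}
```

When ``n`` is a multiple of ``chunk``, both functions produce the same sequence, which is why the per-item offset keeps the parallel result reproducible.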
diff --git a/documentation/library_guide/overview.rst b/documentation/library_guide/overview.rst
@@ -37,12 +37,19 @@ and use the ``std`` namespace.
 Prerequisites
 =============
 
-C++11 is the minimal version of the C++ standard that |onedpl_short| requires. That means, any use of |onedpl_short|
-requires at least a C++11 compiler. Some APIs of the library may require a higher version of C++.
+Since |onedpl_short| 2021.6, C++17 is the minimal supported version of the C++ standard.
+This means that any use of |onedpl_short| may require a C++17 compiler.
+While some APIs of the library may happen to work with earlier versions of the C++ standard, this is no longer guaranteed.
+
 To call Parallel API with the C++ standard policies, you need to install the following software:
 
 * A C++ compiler with support for OpenMP* 4.0 (or higher) SIMD constructs
-* |onetbb_long| or |tbb_long| 2019 and later
+* Depending on which parallel backend you want to use, install either:
+
+  * |onetbb_long| or |tbb_long| 2019 and later
+  * A C++ compiler with support for OpenMP 4.5 (or higher)
+
+For more information about parallel backends, see :doc:`Execution Policies <parallel_api/execution_policies>`.
 
 To use Parallel API with the |dpcpp_short| execution policies, you need to install the following software:
 
@@ -57,13 +64,33 @@ does (see the |dpcpp_short| specification and the SYCL specification for details
 * Adding buffers to a lambda capture list is not allowed for lambdas passed to an algorithm.
 * Passing data types, which are not trivially copyable, is only allowed via USM,
   but not via buffers or host-allocated containers.
+* The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros
+  that make it different for the host and the device. Otherwise, the behavior is undefined.
+* When used within DPC++ kernels or transferred to/from a device, a container class can only hold objects
+  whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
+* Calling an API that throws exceptions is not allowed within callable objects passed to an algorithm.
 
 Known Limitations
 =================
 
-For ``transform_exclusive_scan``, ``transform_inclusive_scan`` algorithms, the result of the unary operation should be
-convertible to the type of the initial value if (one is provided), otherwise it is convertible to the type of values
-in the processed data sequence: (``std::iterator_traits<IteratorType>::value_type``).
+* For the ``transform_exclusive_scan`` and ``transform_inclusive_scan`` algorithms, the result of the unary operation should be
+  convertible to the type of the initial value if one is provided; otherwise it should be convertible to the type of values
+  in the processed data sequence: ``std::iterator_traits<IteratorType>::value_type``.
+* The ``exclusive_scan`` and ``transform_exclusive_scan`` algorithms may provide wrong results with
+  vector execution policies when building a program with GCC 10 and using the ``-O0`` option.
+* The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to
+  compilation errors (caused by oneTBB API changes). 
+  To overcome these issues, include oneDPL header files before the standard C++ header files,
+  or disable parallel algorithms support in the standard library. 
+  For more information, please see `Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`_.
+* The ``using namespace oneapi;`` directive in |onedpl_short| program code may result in compilation errors
+  with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use
+  the ``oneapi::dpl`` namespace, or create a namespace alias.
+* ``std::array::at`` member function cannot be used in kernels because it may throw an exception;
+  use ``std::array::operator[]`` instead.
+* Due to specifics of Microsoft* Visual C++, some standard floating-point math functions
+  (including ``std::ldexp``, ``std::frexp``, ``std::sqrt(std::complex<float>)``) require device support
+  for double precision. 
 
 Build Your Code with |onedpl_short|
 ===================================
@@ -81,5 +108,6 @@ Below is an example of a command line used to compile code that contains
 
 .. code:: cpp
 
-  dpcpp [-fsycl-unnamed-lambda] test.cpp [-ltbb] -o test
+  dpcpp [-fsycl-unnamed-lambda] test.cpp [-ltbb|-fopenmp] -o test
 
+.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-threading-building-blocks-release-notes.html
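
The type requirement for ``transform_exclusive_scan`` listed under Known Limitations can be seen with the serial overload from the C++17 standard library, used here instead of the oneDPL parallel version so that the sketch needs no extra dependencies. `half_prefix_sums` is a hypothetical helper name. The unary operation returns ``double``, which converts to the type of the initial value ``0.0``.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Exclusive prefix sums of x * 0.5 over integer input.
// The unary operation yields double, matching the type of the initial value.
inline std::vector<double> half_prefix_sums(const std::vector<int>& data) {
    std::vector<double> out(data.size());
    std::transform_exclusive_scan(data.begin(), data.end(), out.begin(),
                                  0.0,                           // initial value (double)
                                  std::plus<double>{},           // binary reduction
                                  [](int x) { return x * 0.5; }  // unary operation -> double
    );
    return out;
}
```

For the input ``{1, 2, 3, 4}`` the function returns ``{0.0, 0.5, 1.5, 3.0}``: each element is the sum of the transformed values that precede it.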
diff --git a/documentation/library_guide/parallel_api/buffers_and_usm.rst b/documentation/library_guide/parallel_api/buffers_and_usm.rst
@@ -17,7 +17,7 @@ Use oneapi::dpl::begin and oneapi::dpl::end Functions
 allow you to pass SYCL* buffers to parallel algorithms. These functions accept
 a SYCL buffer and return an object of an unspecified type that provides the following API:
 
-* It satisfies ``CopyConstructible`` and ``CopyAssignable`` C++ named requirements and comparable with 
+* It satisfies ``CopyConstructible`` and ``CopyAssignable`` C++ named requirements and is comparable with
   ``operator==`` and ``operator!=``.
 * It gives the following valid expressions: ``a + n``, ``a - n``, and ``a - b``, where ``a`` and ``b``
   are objects of the type, and ``n`` is an integer value. The effect of those operations is the same as for the type
@@ -98,6 +98,9 @@ Alternatively, use ``std::vector`` with a USM allocator. For example:
     return 0;
   }
 
+When using device USM, such as memory allocated by ``malloc_device``, manually copy data to this memory
+before calling oneDPL algorithms, and copy it back once the algorithms have finished execution.
+
 Use Host-side std::vector
 -----------------------------
 
@@ -114,8 +117,8 @@ For example:
   #include <oneapi/dpl/algorithm>
   #include <vector>
   int main(){
-    std::vector<int> v( 1000 );
-    std::fill(oneapi::dpl::execution::dpcpp_default, v.begin(), v.end(), 42);
+    std::vector<int> vec( 1000 );
+    std::fill(oneapi::dpl::execution::dpcpp_default, vec.begin(), vec.end(), 42);
     // each element of vec equals 42
     return 0;
   }
diff --git a/documentation/library_guide/parallel_api/execution_policies.rst b/documentation/library_guide/parallel_api/execution_policies.rst
@@ -30,6 +30,13 @@ Execution Policy Value            Description
 The implementation is based on Parallel STL from the
 `LLVM Project <https://github.com/llvm/llvm-project/tree/main/pstl>`_.
 
+|onedpl_short| supports two parallel backends for execution with ``par`` and ``par_unseq`` policies:
+
+#. TBB backend (enabled by default) uses |onetbb_long| or |tbb_long| for parallel execution.
+
+#. OpenMP backend uses OpenMP* pragmas for parallel execution. See
+   :doc:`Macros <../macros>` for information on how to enable the OpenMP backend.
+
 Follow these steps to add Parallel API to your application:
 
 #. Add ``#include <oneapi/dpl/execution>`` to your code.
@@ -47,8 +54,8 @@ Follow these steps to add Parallel API to your application:
    namespace, to a parallel algorithm.
 #. Use the C++ Standard Execution Policies:
 
-   #. Compile the code with options that enable OpenMP* vectorization pragmas.
-   #. Link with the |onetbb_long| or |tbb_long| dynamic library for parallelism.
+   #. Compile the code with options that enable OpenMP parallelism and/or vectorization pragmas.
+   #. Link with the |onetbb_long| or |tbb_long| dynamic library for TBB-based parallelism.
 
 #. Use the |dpcpp_short| Execution Policies:
 

diff --git a/documentation/library_guide/parallel_api/range_based_api.rst b/documentation/library_guide/parallel_api/range_based_api.rst
@@ -1,5 +1,9 @@
 Range-based API Algorithms
 ##########################
+.. Note::
+
+  The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher)
+  or Clang 7 (or higher).
 
 C++20 introduces the Ranges library. C++20 standard splits ranges into two categories: factories and adaptors.
 A range factory does not have underlying data. An element is generated on success by an index or by dereferencing an iterator.
@@ -49,6 +53,7 @@ The following algorithms are available to use with the ranges:
 * ``move``
 * ``none_of``
 * ``reduce``
+* ``reduce_by_segment``
 * ``remove``
 * ``remove_if``
 * ``remove_copy``
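
To make the newly listed ``reduce_by_segment`` concrete, here is a serial reference sketch of what a segmented reduction computes. It assumes the usual semantics: consecutive equal keys form a segment, and the values of each segment are combined with addition. `reduce_by_segment_reference` is a hypothetical helper written in plain C++. It is not the oneDPL API and does not reflect its signature or implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns one key and one summed value per run of consecutive equal keys.
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_segment_reference(const std::vector<Key>& keys, const std::vector<Value>& values) {
    assert(keys.size() == values.size());
    std::vector<Key> out_keys;
    std::vector<Value> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0 && keys[i] == keys[i - 1]) {
            // Same segment as the previous element: accumulate.
            out_values.back() = out_values.back() + values[i];
        } else {
            // A new segment starts here.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}
```

With keys ``{1, 1, 2, 2, 2, 1}`` and values ``{1, 2, 3, 4, 5, 6}``, the sketch yields keys ``{1, 2, 1}`` and values ``{3, 12, 6}``. The trailing ``1`` starts its own segment because only adjacent equal keys are grouped.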