Merge branch 'microsoft:gh-pages' into gh-pages

MaanavD · Oct 9, 2024 · 1275346
2 parents c7e55b5 + c27ec3e
commit 1275346

Showing 62 changed files with 1,293 additions and 600 deletions.
diff --git a/docs/build/eps.md b/docs/build/eps.md
@@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
 ### Prerequisites
 {: .no_toc }
 
-1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™<sup>TM</sup> Toolkit **Release 2024.1** for the appropriate OS and target hardware:
-   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
-   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
+   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
+   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
 
    Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.
 
-  *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.*
+  *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is the minimum required OpenVINO™ version.*
 
 2. Configure the target hardware with specific follow on instructions:
    * To configure Intel<sup>®</sup> Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux)
@@ -396,75 +396,24 @@ The DirectML execution provider supports building for both x64 and x86 architect
 
 ---
 
-## ARM Compute Library
+## Arm Compute Library
 See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md).
 
-### Prerequisites
-{: .no_toc }
-
-* Supported backend: i.MX8QM Armv8 CPUs
-* Supported BSP: i.MX8QM BSP
-  * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh`
-* Set up the build environment
-```
-source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
-alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
-```
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
-
 ### Build Instructions
 {: .no_toc }
 
-1. Configure ONNX Runtime with ACL support:
-```
-cmake ../onnxruntime-arm-upstream/cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON -Donnxruntime_DEV_MODE=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -Donnxruntime_USE_CUDA=OFF -Donnxruntime_USE_NSYNC=OFF -Donnxruntime_CUDNN_HOME= -Donnxruntime_USE_JEMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_EIGEN_FOR_BLAS=ON -Donnxruntime_USE_OPENBLAS=OFF -Donnxruntime_USE_ACL=ON -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_MKLML=OFF -Donnxruntime_USE_OPENMP=ON -Donnxruntime_USE_TVM=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_BRAINSLICE=OFF -Donnxruntime_USE_EIGEN_THREADPOOL=OFF -Donnxruntime_BUILD_UNIT_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
-```
-The ```-Donnxruntime_USE_ACL=ON``` option will use, by default, the 19.05 version of the Arm Compute Library. To set the right version you can use:
-```-Donnxruntime_USE_ACL_1902=ON```, ```-Donnxruntime_USE_ACL_1905=ON```, ```-Donnxruntime_USE_ACL_1908=ON``` or ```-Donnxruntime_USE_ACL_2002=ON```;
-
-To use a library outside the normal environment you can set a custom path by using ```-Donnxruntime_ACL_HOME``` and ```-Donnxruntime_ACL_LIBS``` tags that defines the path to the ComputeLibrary directory and the build directory respectively.
+You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary).
+See [here](inferencing.md#arm) for information on building for Arm®-based devices.
 
-```-Donnxruntime_ACL_HOME=/path/to/ComputeLibrary```, ```-Donnxruntime_ACL_LIBS=/path/to/build```
+Add the following options to `build.sh` to enable the ACL Execution Provider:
 
-
-2. Build ONNX Runtime library, test and performance application:
-```
-make -j 6
-```
-
-3. Deploy ONNX runtime on the i.MX 8QM board
 ```
-libonnxruntime.so.0.5.0
-onnxruntime_perf_test
-onnxruntime_test_all
+--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build
 ```
 
-### Native Build Instructions 
-{: .no_toc }
-
-*Validated on Jetson Nano and Jetson Xavier*
-
-1. Build ACL Library (skip if already built)
-
-    ```bash
-    cd ~
-    git clone -b v20.02 https://github.com/Arm-software/ComputeLibrary.git
-    cd ComputeLibrary
-    sudo apt-get install -y scons g++-arm-linux-gnueabihf
-    scons -j8 arch=arm64-v8a  Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native
-    ```
-
-1. Cmake is needed to build ONNX Runtime. Because the minimum required version is 3.13,
-   it is necessary to build CMake from source. Download Unix/Linux sources from https://cmake.org/download/
-   and follow https://cmake.org/install/ to build from source. Version 3.17.5 and 3.18.4 have been tested on Jetson.
-
-1. Build onnxruntime with --use_acl flag with one of the supported ACL version flags. (ACL_1902 | ACL_1905 | ACL_1908 | ACL_2002)
-
----
-
-## ArmNN
+## Arm NN
 
-See more information on the ArmNN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).
+See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).
 
 ### Prerequisites
 {: .no_toc }
@@ -480,7 +429,7 @@ source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
 alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
 ```
 
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
+* See [here](inferencing.md#arm) for information on building for Arm-based devices
 
 ### Build Instructions
 {: .no_toc }
@@ -490,20 +439,20 @@ alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/sh
 ./build.sh --use_armnn
 ```
 
-The Relu operator is set by default to use the CPU execution provider for better performance. To use the ArmNN implementation build with --armnn_relu flag
+The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation, build with the `--armnn_relu` flag:
 
 ```bash
 ./build.sh --use_armnn --armnn_relu
 ```
 
-The Batch Normalization operator is set by default to use the CPU execution provider. To use the ArmNN implementation build with --armnn_bn flag
+The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation, build with the `--armnn_bn` flag:
 
 ```bash
 ./build.sh --use_armnn --armnn_bn
 ```
 
-To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the ArmNN home directory and build directory respectively. 
-The ARM Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.
+To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. 
+The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.
 
 ```bash
 ./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build  --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build
@@ -519,7 +468,7 @@ See more information on the RKNPU Execution Provider [here](../execution-provide
 
 
 * Supported platform: RK1808 Linux
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
+* See [here](inferencing.md#arm) for information on building for Arm-based devices
 * Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake:
   
 ```

diff --git a/docs/build/inferencing.md b/docs/build/inferencing.md
@@ -88,7 +88,8 @@ If you would like to use [Xcode](https://developer.apple.com/xcode/) to build th
 
 Without this flag, the cmake build generator will be Unix makefile by default.
 
-Today, Mac computers are either Intel-Based or Apple silicon(aka. ARM) based. By default, ONNX Runtime's build script only generate bits for the CPU ARCH that the build machine has. If you want to do cross-compiling: generate ARM binaries on a Intel-Based Mac computer, or generate x86 binaries on a Mac ARM computer, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:
+Today, Mac computers are either Intel-based or Apple silicon-based. By default, ONNX Runtime's build script only generates binaries for the CPU architecture of the build machine. To cross-compile, that is, to generate arm64 binaries on an Intel-based Mac or x86_64 binaries on a Mac
+system with Apple silicon, set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:
 
 Build for Intel CPUs:
 ```bash
@@ -107,6 +108,61 @@ The last command will generate a fat-binary for both CPU architectures.
 
 Note: unit tests will be skipped due to the incompatible CPU instruction set when doing cross-compiling.
 
+#### AIX
+On AIX, you can build ONNX Runtime for 64-bit using either of the following:
+
+* IBM Open XL compiler tool chain.
+  The minimum required AIX OS version is 7.2. Compiler version 17.1.2 PTF5 (17.1.2.5) is required.
+* GNU GCC compiler tool chain.
+  The minimum required AIX OS version is 7.3. GCC version 10.3+ is required.
+
+For IBM Open XL, export the following environment settings.
+```bash
+ulimit -m unlimited
+ulimit -d unlimited
+ulimit -n 2000
+ulimit -f unlimited
+export OBJECT_MODE=64
+export BUILD_TYPE="Release"
+export CC="/opt/IBM/openxlC/17.1.2/bin/ibm-clang" 
+export CXX="/opt/IBM/openxlC/17.1.2/bin/ibm-clang++_r"
+export CFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all  -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx  -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare"
+export CXXFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx  -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare"
+export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -lpthread"
+export LIBPATH="$PWD/build/Linux/$BUILD_TYPE/"
+```
+For GCC, export the following environment settings.
+```bash
+ulimit -m unlimited
+ulimit -d unlimited
+ulimit -n 2000
+ulimit -f unlimited
+export OBJECT_MODE=64
+export BUILD_TYPE="Release"
+export CC="gcc" 
+export CXX="g++"
+export CFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx   -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok "
+export CXXFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx  -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok "
+export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -Wl,-bbigtoc -lpython3.9"
+export LIBPATH="$PWD/build/Linux/$BUILD_TYPE"
+```
+To start the build, run the following command:
+```bash
+./build.sh \
+  --config $BUILD_TYPE \
+  --build_shared_lib \
+  --skip_submodule_sync \
+  --cmake_extra_defines CMAKE_INSTALL_PREFIX=$PWD/install \
+  --parallel  
+```
+
+* To install the package in a custom directory, set CMAKE_INSTALL_PREFIX to that directory.
+* With the IBM Open XL compiler tool chain on AIX 7.2, some of the runtime libraries that onnxruntime needs, such as libunwind.a, may be missing. To fix this, install the relevant file-sets.
+* The --parallel build option:
+  As the name suggests, this option enables parallel building and is resource intensive. If your system does not have enough memory for each CPU core, you can omit it.
+* --allow_running_as_root is needed if the root user runs the build.
+
+
 #### Notes
 
 * Please note that these instructions build the debug build, which may have performance tradeoffs. The "--config" parameter has four valid values: Debug, Release, RelWithDebInfo and MinSizeRel. Compared to "Release", "RelWithDebInfo" not only has debug info, it also disables some inlines to make the binary easier to debug. Thus RelWithDebInfo is slower than Release.
@@ -131,13 +187,14 @@ Note: unit tests will be skipped due to the incompatible CPU instruction set whe
 ### Architectures
 {: .no_toc }
 
-|           | x86_32       | x86_64       | ARM32v7      | ARM64        | PPC64LE | RISCV64 |
-|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:|
-|Windows    | YES          | YES          |  YES         | YES          | NO      | NO      |
-|Linux      | YES          | YES          |  YES         | YES          | YES     | YES     |
-|macOS      | NO           | YES          |  NO          | NO           | NO      | NO      |
-|Android      | NO           | NO          |  YES          | YES           | NO      | NO      |
-|iOS      | NO           | NO          |  NO          | YES           | NO      | NO      |
+|           | x86_32       | x86_64       | ARM32v7      | ARM64        | PPC64LE | RISCV64 | PPC64BE |
+|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:| :------:|
+|Windows    | YES          | YES          |  YES         | YES          | NO      | NO      | NO      |
+|Linux      | YES          | YES          |  YES         | YES          | YES     | YES     | NO      |
+|macOS      | NO           | YES          |  NO          | NO           | NO      | NO      | NO      |
+|Android      | NO           | NO          |  YES          | YES           | NO      | NO      | NO     |
+|iOS      | NO           | NO          |  NO          | YES           | NO      | NO      |  NO     |
+|AIX        | NO           | NO          |  NO          | NO           | NO      | NO      |  YES     |
 
 ### Build Environments(Host)
 {: .no_toc }
@@ -311,21 +368,21 @@ ORT_DEBUG_NODE_IO_DUMP_DATA_TO_FILES=1
     ```
 
 
-### ARM
+### Arm
 
-There are a few options for building ONNX Runtime for ARM. 
+There are a few options for building ONNX Runtime for Arm®-based devices. 
 
-First, you may do it on a real ARM device, or on a x86_64 device with an emulator(like qemu), or on a x86_64 device with a docker container with an emulator(you can run an ARM container on a x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, it wouldn't work if your the CPU you are targeting is not 64-bit since the build process needs more than 2GB memory.  
+First, you may do it on a real Arm-based device, or on an x86_64 device with an emulator (like qemu), or on an x86_64 device with a docker container with an emulator (you can run an Arm-based container on an x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, this won't work if the CPU you are targeting is not 64-bit, since the build process needs more than 2GB of memory.
 
-* [Cross compiling for ARM with simulation (Linux/Windows)](#cross-compiling-for-arm-with-simulation-linuxwindows) - **Recommended**;  Easy, slow, ARM64 only(no support for ARM32)
+* [Cross compiling for Arm-based devices with simulation (Linux/Windows)](#cross-compiling-for-arm-based-devices-with-simulation-linuxwindows) - **Recommended**;  Easy, slow, ARM64 only(no support for ARM32)
 * [Cross compiling on Linux](#cross-compiling-on-linux) - Difficult, fast
 * [Cross compiling on Windows](#cross-compiling-on-windows)
 
-#### Cross compiling for ARM with simulation (Linux/Windows)
+#### Cross compiling for Arm-based devices with simulation (Linux/Windows)
 
 *EASY, SLOW, RECOMMENDED*
 
-This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on x86 CPU and translate every ARM instruction to x86. This is much faster than compiling natively on a low-end ARM device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts. The build process can take hours, and may run of memory if the target CPU is 32-bit.
+This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on an x86 CPU and translate every Arm architecture instruction to x86. This is potentially much faster than compiling natively on a low-end device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an Arm-based device where it can be invoked in Python 3 scripts. The build process can take hours, and may run out of memory if the target CPU is 32-bit.
 
 #### Cross compiling on Linux
 
@@ -364,12 +421,12 @@ This option is very fast and allows the package to be built in minutes, but is c
 
     You must also know what kind of flags your target hardware need, which can differ greatly. For example, if you just get the normal ARMv7 compiler and use it for Raspberry Pi V1 directly, it won't work because Raspberry Pi only has ARMv6. Generally every hardware vendor will provide a toolchain; check how that one was built.
 
-    A target env is identifed by:
+    A target env is identified by:
 
     * Arch: x86_32, x86_64, armv6,armv7,arvm7l,aarch64,...
     * OS: bare-metal or linux.
     * Libc: gnu libc/ulibc/musl/...
-    * ABI: ARM has mutilple ABIs like eabi, eabihf...
+    * ABI: Arm has multiple ABIs like eabi, eabihf...
 
     You can get all these information from the previous output, please be sure they are all correct.
    
@@ -528,8 +585,8 @@ This option is very fast and allows the package to be built in minutes, but is c
 
 **Using Visual C++ compilers**
 
-1. Download and install Visual C++ compilers and libraries for ARM(64).
-   If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding ARM(64) compilers and libraries.
+1. Download and install Visual C++ compilers and libraries for Arm(64).
+   If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding Arm(64) compilers and libraries.
 
 2. Use `.\build.bat` and specify `--arm` or `--arm64` as the build option to start building. Preferably use `Developer Command Prompt for VS` or make sure all the installed cross-compilers are findable from the command prompt being used to build using the PATH environmant variable.