diff --git a/docs/build/eps.md b/docs/build/eps.md
index 12fc4d3235bb3..40bf99be46bff 100644
--- a/docs/build/eps.md
+++ b/docs/build/eps.md
@@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
### Prerequisites
{: .no_toc }
-1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.1** for the appropriate OS and target hardware:
- * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
- * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
+ * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
+ * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.
- *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.*
+ *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is the minimal OpenVINO™ version required.*
2. Configure the target hardware with specific follow on instructions:
* To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux)
@@ -396,75 +396,24 @@ The DirectML execution provider supports building for both x64 and x86 architect
---
-## ARM Compute Library
+## Arm Compute Library
See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md).
-### Prerequisites
-{: .no_toc }
-
-* Supported backend: i.MX8QM Armv8 CPUs
-* Supported BSP: i.MX8QM BSP
- * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh`
-* Set up the build environment
-```
-source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
-alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
-```
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
-
### Build Instructions
{: .no_toc }
-1. Configure ONNX Runtime with ACL support:
-```
-cmake ../onnxruntime-arm-upstream/cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON -Donnxruntime_DEV_MODE=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -Donnxruntime_USE_CUDA=OFF -Donnxruntime_USE_NSYNC=OFF -Donnxruntime_CUDNN_HOME= -Donnxruntime_USE_JEMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_EIGEN_FOR_BLAS=ON -Donnxruntime_USE_OPENBLAS=OFF -Donnxruntime_USE_ACL=ON -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_MKLML=OFF -Donnxruntime_USE_OPENMP=ON -Donnxruntime_USE_TVM=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_BRAINSLICE=OFF -Donnxruntime_USE_EIGEN_THREADPOOL=OFF -Donnxruntime_BUILD_UNIT_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
-```
-The ```-Donnxruntime_USE_ACL=ON``` option will use, by default, the 19.05 version of the Arm Compute Library. To set the right version you can use:
-```-Donnxruntime_USE_ACL_1902=ON```, ```-Donnxruntime_USE_ACL_1905=ON```, ```-Donnxruntime_USE_ACL_1908=ON``` or ```-Donnxruntime_USE_ACL_2002=ON```;
-
-To use a library outside the normal environment you can set a custom path by using ```-Donnxruntime_ACL_HOME``` and ```-Donnxruntime_ACL_LIBS``` tags that defines the path to the ComputeLibrary directory and the build directory respectively.
+You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary).
+See [here](inferencing.md#arm) for information on building for Arm®-based devices.
-```-Donnxruntime_ACL_HOME=/path/to/ComputeLibrary```, ```-Donnxruntime_ACL_LIBS=/path/to/build```
+Add the following options to `build.sh` to enable the ACL Execution Provider:
-
-2. Build ONNX Runtime library, test and performance application:
-```
-make -j 6
-```
-
-3. Deploy ONNX runtime on the i.MX 8QM board
```
-libonnxruntime.so.0.5.0
-onnxruntime_perf_test
-onnxruntime_test_all
+--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build
```
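+
+For example, assuming the Arm Compute Library was built under `$HOME/ComputeLibrary` (the paths and the extra flags below are illustrative, not required values), a full build might look like:
+
+```bash
+./build.sh --config Release --parallel --build_shared_lib \
+    --use_acl --acl_home=$HOME/ComputeLibrary --acl_libs=$HOME/ComputeLibrary/build
+```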
-### Native Build Instructions
-{: .no_toc }
-
-*Validated on Jetson Nano and Jetson Xavier*
-
-1. Build ACL Library (skip if already built)
-
- ```bash
- cd ~
- git clone -b v20.02 https://github.com/Arm-software/ComputeLibrary.git
- cd ComputeLibrary
- sudo apt-get install -y scons g++-arm-linux-gnueabihf
- scons -j8 arch=arm64-v8a Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native
- ```
-
-1. Cmake is needed to build ONNX Runtime. Because the minimum required version is 3.13,
- it is necessary to build CMake from source. Download Unix/Linux sources from https://cmake.org/download/
- and follow https://cmake.org/install/ to build from source. Version 3.17.5 and 3.18.4 have been tested on Jetson.
-
-1. Build onnxruntime with --use_acl flag with one of the supported ACL version flags. (ACL_1902 | ACL_1905 | ACL_1908 | ACL_2002)
-
----
-
-## ArmNN
+## Arm NN
-See more information on the ArmNN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).
+See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).
### Prerequisites
{: .no_toc }
@@ -480,7 +429,7 @@ source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
```
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
+* See [here](inferencing.md#arm) for information on building for Arm-based devices
### Build Instructions
{: .no_toc }
@@ -490,20 +439,20 @@ alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/sh
./build.sh --use_armnn
```
-The Relu operator is set by default to use the CPU execution provider for better performance. To use the ArmNN implementation build with --armnn_relu flag
+By default the Relu operator uses the CPU execution provider for better performance. To use the Arm NN implementation, build with the --armnn_relu flag
```bash
./build.sh --use_armnn --armnn_relu
```
-The Batch Normalization operator is set by default to use the CPU execution provider. To use the ArmNN implementation build with --armnn_bn flag
+By default the Batch Normalization operator uses the CPU execution provider. To use the Arm NN implementation, build with the --armnn_bn flag
```bash
./build.sh --use_armnn --armnn_bn
```
-To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the ArmNN home directory and build directory respectively.
-The ARM Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.
+To use a library outside the normal environment, you can set a custom path by providing the --armnn_home and --armnn_libs parameters, which define the paths to the Arm NN home directory and build directory respectively.
+The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.
```bash
./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build
@@ -519,7 +468,7 @@ See more information on the RKNPU Execution Provider [here](../execution-provide
* Supported platform: RK1808 Linux
-* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
+* See [here](inferencing.md#arm) for information on building for Arm-based devices
* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake:
```
diff --git a/docs/build/inferencing.md b/docs/build/inferencing.md
index 4f9886913d078..125623ef28399 100644
--- a/docs/build/inferencing.md
+++ b/docs/build/inferencing.md
@@ -88,7 +88,8 @@ If you would like to use [Xcode](https://developer.apple.com/xcode/) to build th
Without this flag, the cmake build generator will be Unix makefile by default.
-Today, Mac computers are either Intel-Based or Apple silicon(aka. ARM) based. By default, ONNX Runtime's build script only generate bits for the CPU ARCH that the build machine has. If you want to do cross-compiling: generate ARM binaries on a Intel-Based Mac computer, or generate x86 binaries on a Mac ARM computer, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:
+Today, Mac computers are either Intel-based or Apple silicon-based. By default, ONNX Runtime's build script only generates binaries for the CPU architecture of the build machine. If you want to cross-compile (generate arm64 binaries on an Intel-based Mac, or generate x86 binaries on a Mac
+system with Apple silicon), you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:
Build for Intel CPUs:
```bash
@@ -107,6 +108,61 @@ The last command will generate a fat-binary for both CPU architectures.
Note: unit tests will be skipped due to the incompatible CPU instruction set when doing cross-compiling.
+#### AIX
+On AIX, you can build ONNX Runtime for 64-bit using
+
+* IBM Open XL compiler tool chain.
+  The minimum required AIX OS version is 7.2. The 17.1.2 compiler with PTF5 (17.1.2.5) is required.
+* GNU GCC compiler tool chain.
+  The minimum required AIX OS version is 7.3. GCC version 10.3 or later is required.
+
+For IBM Open XL, export the environment settings below.
+```bash
+ulimit -m unlimited
+ulimit -d unlimited
+ulimit -n 2000
+ulimit -f unlimited
+export OBJECT_MODE=64
+export BUILD_TYPE="Release"
+export CC="/opt/IBM/openxlC/17.1.2/bin/ibm-clang"
+export CXX="/opt/IBM/openxlC/17.1.2/bin/ibm-clang++_r"
+export CFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare"
+export CXXFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare"
+export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -lpthread"
+export LIBPATH="$PWD/build/Linux/$BUILD_TYPE/"
+```
+For GCC, export the environment settings below.
+```bash
+ulimit -m unlimited
+ulimit -d unlimited
+ulimit -n 2000
+ulimit -f unlimited
+export OBJECT_MODE=64
+export BUILD_TYPE="Release"
+export CC="gcc"
+export CXX="g++"
+export CFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok "
+export CXXFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok "
+export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -Wl,-bbigtoc -lpython3.9"
+export LIBPATH="$PWD/build/Linux/$BUILD_TYPE"
+```
+To initiate the build, run the command below.
+```bash
+./build.sh \
+ --config $BUILD_TYPE \
+ --build_shared_lib \
+ --skip_submodule_sync \
+ --cmake_extra_defines CMAKE_INSTALL_PREFIX=$PWD/install \
+ --parallel
+```
+
+* If you want to install the package into a custom directory, specify that directory as the value of CMAKE_INSTALL_PREFIX.
+* With the IBM Open XL compiler tool chain on AIX 7.2, some runtime libraries needed by onnxruntime (such as libunwind.a) may be missing. To fix this, install the relevant file-sets.
+* The --parallel option enables parallel building, which is resource intensive. If your system does not have enough memory for each CPU core, skip this option.
+* --allow_running_as_root is needed if the build is triggered by the root user.
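+
+For example, a Release build that installs into a custom prefix and is triggered as root could look like this (the install prefix is just an illustration):
+
+```bash
+./build.sh \
+  --config Release \
+  --build_shared_lib \
+  --skip_submodule_sync \
+  --allow_running_as_root \
+  --cmake_extra_defines CMAKE_INSTALL_PREFIX=/opt/onnxruntime \
+  --parallel
+```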
+
+
#### Notes
* Please note that these instructions build the debug build, which may have performance tradeoffs. The "--config" parameter has four valid values: Debug, Release, RelWithDebInfo and MinSizeRel. Compared to "Release", "RelWithDebInfo" not only has debug info, it also disables some inlines to make the binary easier to debug. Thus RelWithDebInfo is slower than Release.
@@ -131,13 +187,14 @@ Note: unit tests will be skipped due to the incompatible CPU instruction set whe
### Architectures
{: .no_toc }
-| | x86_32 | x86_64 | ARM32v7 | ARM64 | PPC64LE | RISCV64 |
-|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:|
-|Windows | YES | YES | YES | YES | NO | NO |
-|Linux | YES | YES | YES | YES | YES | YES |
-|macOS | NO | YES | NO | NO | NO | NO |
-|Android | NO | NO | YES | YES | NO | NO |
-|iOS | NO | NO | NO | YES | NO | NO |
+| | x86_32 | x86_64 | ARM32v7 | ARM64 | PPC64LE | RISCV64 | PPC64BE |
+|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:| :------:|
+|Windows | YES | YES | YES | YES | NO | NO | NO |
+|Linux | YES | YES | YES | YES | YES | YES | NO |
+|macOS | NO | YES | NO | NO | NO | NO | NO |
+|Android | NO | NO | YES | YES | NO | NO | NO |
+|iOS | NO | NO | NO | YES | NO | NO | NO |
+|AIX | NO | NO | NO | NO | NO | NO | YES |
### Build Environments(Host)
{: .no_toc }
@@ -311,21 +368,21 @@ ORT_DEBUG_NODE_IO_DUMP_DATA_TO_FILES=1
```
-### ARM
+### Arm
-There are a few options for building ONNX Runtime for ARM.
+There are a few options for building ONNX Runtime for Arm®-based devices.
-First, you may do it on a real ARM device, or on a x86_64 device with an emulator(like qemu), or on a x86_64 device with a docker container with an emulator(you can run an ARM container on a x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, it wouldn't work if your the CPU you are targeting is not 64-bit since the build process needs more than 2GB memory.
+First, you may do it on a real Arm-based device, on an x86_64 device with an emulator (like qemu), or on an x86_64 device with a docker container and an emulator (you can run an Arm-based container on an x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, this won't work if the CPU you are targeting is not 64-bit, since the build process needs more than 2GB of memory.
-* [Cross compiling for ARM with simulation (Linux/Windows)](#cross-compiling-for-arm-with-simulation-linuxwindows) - **Recommended**; Easy, slow, ARM64 only(no support for ARM32)
+* [Cross compiling for Arm-based devices with simulation (Linux/Windows)](#cross-compiling-for-arm-based-devices-with-simulation-linuxwindows) - **Recommended**; Easy, slow, ARM64 only (no support for ARM32)
* [Cross compiling on Linux](#cross-compiling-on-linux) - Difficult, fast
* [Cross compiling on Windows](#cross-compiling-on-windows)
-#### Cross compiling for ARM with simulation (Linux/Windows)
+#### Cross compiling for Arm-based devices with simulation (Linux/Windows)
*EASY, SLOW, RECOMMENDED*
-This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on x86 CPU and translate every ARM instruction to x86. This is much faster than compiling natively on a low-end ARM device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts. The build process can take hours, and may run of memory if the target CPU is 32-bit.
+This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on an x86 CPU and translate every Arm architecture instruction to x86. This is potentially much faster than compiling natively on a low-end device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an Arm-based device where it can be invoked in Python 3 scripts. The build process can take hours, and may run out of memory if the target CPU is 32-bit.
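+
+A minimal sketch of this setup with Docker, assuming an x86_64 Linux host (the image names, tags, and mount paths below are assumptions, not part of the official instructions):
+
+```bash
+# One-time on the x86_64 host: register qemu user-mode emulators for foreign binaries.
+docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+
+# Start an ARM64 container with the onnxruntime checkout mounted, then follow the
+# normal Linux build instructions inside it (add --allow_running_as_root when building as root).
+docker run --rm -it -v "$PWD":/onnxruntime -w /onnxruntime arm64v8/ubuntu:22.04 bash
+```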
#### Cross compiling on Linux
@@ -364,12 +421,12 @@ This option is very fast and allows the package to be built in minutes, but is c
You must also know what kind of flags your target hardware need, which can differ greatly. For example, if you just get the normal ARMv7 compiler and use it for Raspberry Pi V1 directly, it won't work because Raspberry Pi only has ARMv6. Generally every hardware vendor will provide a toolchain; check how that one was built.
- A target env is identifed by:
+ A target env is identified by:
* Arch: x86_32, x86_64, armv6,armv7,arvm7l,aarch64,...
* OS: bare-metal or linux.
* Libc: gnu libc/ulibc/musl/...
- * ABI: ARM has mutilple ABIs like eabi, eabihf...
+ * ABI: Arm has multiple ABIs like eabi, eabihf...
You can get all these information from the previous output, please be sure they are all correct.
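+
+For example, you can ask a cross-compiler directly what it targets (the toolchain prefix below is just an example):
+
+```bash
+# Prints the target triple, e.g. aarch64-linux-gnu
+aarch64-linux-gnu-gcc -dumpmachine
+# Prints how the toolchain was configured (default arch, ABI, libc, ...)
+aarch64-linux-gnu-gcc -v
+```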
@@ -528,8 +585,8 @@ This option is very fast and allows the package to be built in minutes, but is c
**Using Visual C++ compilers**
-1. Download and install Visual C++ compilers and libraries for ARM(64).
- If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding ARM(64) compilers and libraries.
+1. Download and install Visual C++ compilers and libraries for Arm(64).
+ If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding Arm(64) compilers and libraries.
2. Use `.\build.bat` and specify `--arm` or `--arm64` as the build option to start building. Preferably use `Developer Command Prompt for VS` or make sure all the installed cross-compilers are findable from the command prompt being used to build using the PATH environmant variable.
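+
+For example, from a Developer Command Prompt (flags other than `--arm64` are illustrative):
+
+```
+.\build.bat --arm64 --config Release --build_shared_lib --parallel
+```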
diff --git a/docs/execution-providers/CUDA-ExecutionProvider.md b/docs/execution-providers/CUDA-ExecutionProvider.md
index 97374ff6e096d..81c0c4d270de3 100644
--- a/docs/execution-providers/CUDA-ExecutionProvider.md
+++ b/docs/execution-providers/CUDA-ExecutionProvider.md
@@ -35,12 +35,13 @@ Because of [Nvidia CUDA Minor Version Compatibility](https://docs.nvidia.com/dep
ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa. You can choose the package based on CUDA and cuDNN major versions that match your runtime environment (For example, PyTorch 2.3 uses cuDNN 8.x, while PyTorch 2.4 or later used cuDNN 9.x).
-### CUDA 12.x
+Note: starting with ORT 1.19, **CUDA 12.x** is the default version for ONNX Runtime GPU packages distributed on PyPI.
-To install CUDA 12 package, please look at [Install ORT](../install).
+### CUDA 12.x
| ONNX Runtime | CUDA | cuDNN | Notes |
|---------------|--------|-------|----------------------------------------------------------------------|
+| 1.19.x        | 12.x   | 9.x   | Available in PyPI. Compatible with PyTorch >= 2.4.0 for CUDA 12.x.    |
| 1.18.1 | 12.x | 9.x | cuDNN 9 is required. No Java package. |
| 1.18.0 | 12.x | 8.x | Java package is added. |
| 1.17.x | 12.x | 8.x | Only C++/C# Nuget and Python packages are released. No Java package. |
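+
+For example, with ORT 1.19 or later the default GPU package on PyPI targets CUDA 12.x (package name as currently published; see [Install ORT](../install) for the CUDA 11.8 alternative):
+
+```bash
+# Installs the CUDA 12.x build of ONNX Runtime GPU (default since ORT 1.19)
+pip install onnxruntime-gpu
+# CUDA 11.8 builds of 1.19+ are not on PyPI; install them from the feed listed on the Install ORT page.
+```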
@@ -49,7 +50,8 @@ To install CUDA 12 package, please look at [Install ORT](../install).
| ONNX Runtime | CUDA | cuDNN | Notes |
|----------------------|--------|-----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
-| 1.18.x | 11.8 | 8.x | |
+| 1.19.x               | 11.8   | 8.x                                      | Not available in PyPI. See [Install ORT](../install) for details. Compatible with PyTorch <= 2.3.1 for CUDA 11.8.                             |
+| 1.18.x               | 11.8   | 8.x                                      | Available in PyPI.                                                                                                                            |
| 1.17<br/>1.16<br/>1.15 | 11.8 | 8.2.4 (Linux)<br/>8.5.0.96 (Windows) | Tested with CUDA versions from 11.6 up to 11.8, and cuDNN from 8.2 up to 8.9 |
| 1.14<br/>1.13 | 11.6 | 8.2.4 (Linux)<br/>8.5.0.96 (Windows) | libcudart 11.4.43<br/>libcufft 10.5.2.100<br/>libcurand 10.2.5.120<br/>libcublasLt 11.6.5.2<br/>libcublas 11.6.5.2<br/>libcudnn 8.2.4 |
| 1.12<br/>1.11 | 11.4 | 8.2.4 (Linux)<br/>8.2.2.26 (Windows) | libcudart 11.4.43<br/>libcufft 10.5.2.100<br/>libcurand 10.2.5.120<br/>libcublasLt 11.6.5.2<br/>libcublas 11.6.5.2<br/>libcudnn 8.2.4 |
diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index af752b1a85e7e..6ffa77edc60b5 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
@@ -128,10 +128,12 @@ Operators that are supported by the CoreML Execution Provider when a NeuralNetwo
|ai.onnx.ReduceSum||
|ai.onnx:Relu||
|ai.onnx:Reshape||
-|ai.onnx:Resize||
+|ai.onnx:Resize|4D input.<br/>`coordinate_transformation_mode` == `asymmetric`.<br/>`mode` == `linear` or `nearest`.<br/>`nearest_mode` == `floor`.<br/>`exclude_outside` == false<br/>`scales` or `sizes` must be constant.|
|ai.onnx:Shape|Attribute `start` with non-default value is not supported.<br/>Attribute `end` is not supported.|
|ai.onnx:Sigmoid||
|ai.onnx:Slice|Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.|
+|ai.onnx:Softmax||
+|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Squeeze||
|ai.onnx:Sqrt||
|ai.onnx:Sub||
@@ -147,15 +149,26 @@ Operators that are supported by the CoreML Execution Provider when a MLProgram m
|ai.onnx:Add||
|ai.onnx:AveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Clip||
+|ai.onnx:Concat||
|ai.onnx:Conv|Only 1D/2D Conv is supported.<br/>Bias if provided must be constant.|
+|ai.onnx:ConvTranspose|Weight and bias must be constant.<br/>padding_type of SAME_UPPER/SAME_LOWER is not supported.<br/>kernel_shape must have default values.<br/>output_shape is not supported.<br/>output_padding must have default values.|
+|ai.onnx.DepthToSpace|If 'mode' is 'CRD' the input must have a fixed shape.|
|ai.onnx:Div||
|ai.onnx:Gemm|Input B must be constant.|
|ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
+|ai.onnx:GridSample|4D input.<br/>'mode' of 'linear' or 'zeros'.<br/>(mode==linear && padding_mode==reflection && align_corners==0) is not supported.|
+|ai.onnx.LeakyRelu||
|ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.|
|ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Mul||
|ai.onnx:Pow|Only supports cases when both inputs are fp32.|
|ai.onnx:Relu||
|ai.onnx:Reshape||
+|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.|
+|ai.onnx.Slice|starts/ends/axes/steps must be constant initializers.|
+|ai.onnx.Split|If provided, `splits` must be constant.|
|ai.onnx:Sub||
+|ai.onnx:Sigmoid||
+|ai.onnx:Tanh||
+|ai.onnx:Transpose||
diff --git a/docs/execution-providers/EP-Context-Design.md b/docs/execution-providers/EP-Context-Design.md
new file mode 100644
index 0000000000000..8e5ffcbb962dd
--- /dev/null
+++ b/docs/execution-providers/EP-Context-Design.md
@@ -0,0 +1,82 @@
+---
+title: EP context design
+description: ONNX Runtime EP context cache feature design
+parent: Execution Providers
+nav_order: 16
+redirect_from: /docs/reference/execution-providers/EP-Context-Design
+---
+
+# OnnxRuntime EP context cache feature design
+{: .no_toc }
+
+## Contents
+{: .no_toc }
+
+* TOC placeholder
+{:toc}
+
+## Background
+
+OnnxRuntime Execution Providers enable users to run inference on Onnx models on different kinds of hardware accelerators, empowered by backend SDKs (like QNN, OpenVINO, Vitis AI, etc.). The Execution Provider converts the Onnx model into the graph format required by the backend SDK and compiles it into the format required by the hardware. In the NPU world specifically, this converting and compiling process takes a long time to complete, especially for LLM models. In some cases session creation takes tens of minutes, which badly impacts the user experience.
+To avoid the converting and compiling cost, most backend SDKs provide a way to dump the pre-compiled model into a binary file. The pre-compiled model can be loaded directly by the backend SDK and executed on the target device, which greatly improves session creation time. To support this, OnnxRuntime defines a contrib Op called EPContext in the com.microsoft domain.
+
+## EPContext Op Schema
+
+Op domain: com.microsoft
+Node inputs & outputs: variadic
+Attributes:
+
+|Attributes |Data type|Description |
+|---------------------|---------|----------------------------------------------------------------------------------------------------------|
+|main_context |int64 |1 (default): The node points to EP context content that contains the graph referred to by this node.<br/>0: The node does not point to any EP context content; the graph is expected to come from a node that has this field set to 1.<br/>Some EPs support a single context that contains multiple graphs. The EPContext node with main_context=1 refers to the real context, and that context contains the graphs referred to by other nodes with main_context=0.|
+|ep_cache_context |string |Payload of the EP context if embed_mode=1, or path to the context binary file if embed_mode=0.<br/>The path is relative to the Onnx model file. It can be a file name, or subfolder/filename.|
+|embed_mode |int64 |1 (default): ep_cache_context contains the payload of the context content.<br/>0: ep_cache_context is the path to the context binary file.|
+|ep_sdk_version |string |Optional. SDK version used to generate the node. |
+|onnx_model_filename |string |Optional. Original Onnx model file name. |
+|hardware_architecture|string |Optional. Hardware architecture.|
+|partition_name |string |Optional. OnnxRuntime partitioned graph name.|
+|source |string |Optional. The source used to generate the node. Should be a key identified by the EP so that OnnxRuntime can support multiple EPContext nodes running with different EPs. For example, the QNN EP only accepts nodes with source=QNN or QnnExecutionProvider, and the OpenVINO EP only accepts nodes with source=OpenVINOExecutionProvider.|
+|notes |string |Optional. Additional information required by a specific EP. |
+
+
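+As a sketch of how a pre-compiled EP context model is typically produced: create a session with the target EP and enable the context-cache session options, for example through `onnxruntime_perf_test`. The session configuration keys and the `-C` flag syntax below are illustrative assumptions; the authoritative options are documented by each EP.
+
+```bash
+# Dump an EPContext model next to the source model using the QNN EP (illustrative flags).
+# embed_mode 0 writes the compiled blob to a separate file referenced by ep_cache_context.
+./onnxruntime_perf_test -e qnn -r 1 -I \
+  -C "ep.context_enable|1 ep.context_embed_mode|0 ep.context_file_path|./model_ctx.onnx" \
+  model.onnx
+```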
[Blog page hunks: the byline becomes "By: Natalie Kershaw and Prasanth Pulavarthi", and plain-text mentions of ONNX Runtime, torch.onnx, the 130,000 supported models, ONNX Runtime extensions, ExecutionProviderTarget, the build tutorial, transformers.js, the Olive framework, the example Android mobile app, the Speaker Verification tutorial, and the application source are turned into links.]
diff --git a/src/routes/components/customers.svelte b/src/routes/components/customers.svelte
index a5da8146bea27..6c6c7dce06171
--- a/src/routes/components/customers.svelte
+++ b/src/routes/components/customers.svelte
@@ -8,32 +8,36 @@
   import antgroupLogo from '../../images/logos/antgroup-logo.png';
   import algoriddimLogo from '../../images/logos/algoriddim-logo.png';
   import ATLASLogo from '../../images/logos/ATLAS-logo.png';
+  import autodeskLogo from '../../images/logos/autodesk-logo.png';
   import bazaarvoiceLogo from '../../images/logos/bazaarvoice-logo.png';
   import camoLogo from '../../images/logos/camo-logo.png';
   import cephableLogo from '../../images/logos/cephable-logo.png';
   import clearbladeLogo from '../../images/logos/clearblade-logo.png';
   import deezerLogo from '../../images/logos/deezer-logo.png';
+  import goodnotesLogo from '../../images/logos/goodnotes-logo.png';
+  import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
   import hypefactorsLogo from '../../images/logos/hypefactors-logo.png';
   import infarmLogo from '../../images/logos/infarm-logo.png';
   import intelLogo from '../../images/logos/intel-logo.png';
   import intelligenzaEticaLogo from '../../images/logos/intelligenza-etica-logo.png';
-  import navitaireAmadeusLogo from '../../images/logos/navitaire-amadeus-logo.png';
-  import PeakSpeedLogo from '../../images/logos/PeakSpeed_logo.png';
+  import navitaireLogo from '../../images/logos/navitaire-amadeus-logo.png';
+  import nvidiaLogo from '../../images/logos/nvidia.png';
+  import opennlpLogo from '../../images/logos/opennlp-logo.png';
+  import oracleLogo from '../../images/logos/oracle-logo.png';
+  import peakspeedLogo from '../../images/logos/PeakSpeed_logo.png';
   import piecesLogo from '../../images/logos/pieces-logo.png';
+  import ptwLogo from '../../images/logos/ptw-logo.png';
   import redisLogo from '../../images/logos/redis-logo.png';
-  import RockchipLogo from '../../images/logos/Rockchip-logo.png';
+  import rockchipLogo from '../../images/logos/Rockchip-logo.png';
   import samtecLogo from '../../images/logos/samtec-logo.png';
   import sasLogo from '../../images/logos/sas-logo.png';
   import teradataLogo from '../../images/logos/teradata-logo.png';
   import topazlabsLogo from '../../images/logos/topazlabs-logo.png';
-  import ueLogo from '../../images/logos/ue-logo.png';
+  import unrealengineLogo from '../../images/logos/ue-logo.png';
   import usdaLogo from '../../images/logos/usda-logo.png';
   import vespaLogo from '../../images/logos/vespa-logo.png';
   import writerLogo from '../../images/logos/writer-logo.png';
   import xilinxLogo from '../../images/logos/xilinx-logo.png';
-  import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
-  import nvidiaLogo from '../../images/logos/nvidia.png';
-  import oracleLogo from '../../images/logos/oracle-logo.png';

   const testimonials = [
     {
@@ -61,6 +65,11 @@
       src: ATLASLogo,
       alt: 'ATLAS'
     },
+    {
+      href: './testimonials#Autodesk',
+      src: autodeskLogo,
+      alt: 'Autodesk'
+    },
     {
       href: './testimonials#Bazaarvoice',
       src: bazaarvoiceLogo,
@@ -86,6 +95,11 @@
       src: deezerLogo,
       alt: 'Deezer'
     },
+    {
+      href: './testimonials#Goodnotes',
+      src: goodnotesLogo,
+      alt: 'GoodNotes'
+    },
     {
       href: './testimonials#Hugging%20Face',
       src: huggingfaceLogo,
@@ -113,7 +127,7 @@
     },
     {
       href: './testimonials#Navitaire',
-      src: navitaireAmadeusLogo,
+      src: navitaireLogo,
       alt: 'Navitaire'
     },
     {
@@ -121,6 +135,11 @@
       src: nvidiaLogo,
       alt: 'NVIDIA'
     },
+    {
+      href: './testimonials#Apache%20OpenNLP',
+      src: opennlpLogo,
+      alt: 'Apache OpenNLP'
+    },
     {
       href: './testimonials#Oracle',
       src: oracleLogo,
@@ -128,7 +147,7 @@
     },
     {
       href: './testimonials#Peakspeed',
-      src: PeakSpeedLogo,
+      src: peakspeedLogo,
       alt: 'Peakspeed'
     },
     {
@@ -136,6 +155,11 @@
       src: piecesLogo,
       alt: 'Pieces'
     },
+    {
+      href: './testimonials#PTW%20Dosimetry',
+      src: ptwLogo,
+      alt: 'PTW Dosimetry'
+    },
     {
       href: './testimonials#Redis',
       src: redisLogo,
@@ -143,7 +167,7 @@
     },
     {
       href: './testimonials#Rockchip',
-      src: RockchipLogo,
+      src: rockchipLogo,
       alt: 'Rockchip'
     },
     {
@@ -168,7 +192,7 @@
     },
     {
       href: './testimonials#Unreal%20Engine',
-      src: ueLogo,
+      src: unrealengineLogo,
       alt: 'Unreal Engine'
     },
     {
diff --git a/src/routes/components/footer.svelte b/src/routes/components/footer.svelte
index b030524976742..e6b855d0ca129
--- a/src/routes/components/footer.svelte
+++ b/src/routes/components/footer.svelte
@@ -9,7 +9,7 @@