diff --git a/docs/build/eps.md b/docs/build/eps.md index 12fc4d3235bb3..40bf99be46bff 100644 --- a/docs/build/eps.md +++ b/docs/build/eps.md @@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p ### Prerequisites {: .no_toc } -1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.1** for the appropriate OS and target hardware: - * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). - * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) +1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.3** for the appropriate OS and target hardware: + * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). + * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions. - *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.* + *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is minimal OpenVINO™ version requirement.* 2. Configure the target hardware with specific follow on instructions: * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux) @@ -396,75 +396,24 @@ The DirectML execution provider supports building for both x64 and x86 architect --- -## ARM Compute Library +## Arm Compute Library See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md). -### Prerequisites -{: .no_toc } - -* Supported backend: i.MX8QM Armv8 CPUs -* Supported BSP: i.MX8QM BSP - * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh` -* Set up the build environment -``` -source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux -alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" -``` -* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices - ### Build Instructions {: .no_toc } -1. 
Configure ONNX Runtime with ACL support: -``` -cmake ../onnxruntime-arm-upstream/cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON -Donnxruntime_DEV_MODE=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -Donnxruntime_USE_CUDA=OFF -Donnxruntime_USE_NSYNC=OFF -Donnxruntime_CUDNN_HOME= -Donnxruntime_USE_JEMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_EIGEN_FOR_BLAS=ON -Donnxruntime_USE_OPENBLAS=OFF -Donnxruntime_USE_ACL=ON -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_MKLML=OFF -Donnxruntime_USE_OPENMP=ON -Donnxruntime_USE_TVM=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_BRAINSLICE=OFF -Donnxruntime_USE_EIGEN_THREADPOOL=OFF -Donnxruntime_BUILD_UNIT_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -``` -The ```-Donnxruntime_USE_ACL=ON``` option will use, by default, the 19.05 version of the Arm Compute Library. To set the right version you can use: -```-Donnxruntime_USE_ACL_1902=ON```, ```-Donnxruntime_USE_ACL_1905=ON```, ```-Donnxruntime_USE_ACL_1908=ON``` or ```-Donnxruntime_USE_ACL_2002=ON```; - -To use a library outside the normal environment you can set a custom path by using ```-Donnxruntime_ACL_HOME``` and ```-Donnxruntime_ACL_LIBS``` tags that defines the path to the ComputeLibrary directory and the build directory respectively. +You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary). +See [here](inferencing.md#arm) for information on building for Arm®-based devices. -```-Donnxruntime_ACL_HOME=/path/to/ComputeLibrary```, ```-Donnxruntime_ACL_LIBS=/path/to/build``` +Add the following options to `build.sh` to enable the ACL Execution Provider: - -2. Build ONNX Runtime library, test and performance application: -``` -make -j 6 -``` - -3. Deploy ONNX runtime on the i.MX 8QM board ``` -libonnxruntime.so.0.5.0 -onnxruntime_perf_test -onnxruntime_test_all +--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build ``` -### Native Build Instructions -{: .no_toc } - -*Validated on Jetson Nano and Jetson Xavier* - -1. Build ACL Library (skip if already built) - - ```bash - cd ~ - git clone -b v20.02 https://github.com/Arm-software/ComputeLibrary.git - cd ComputeLibrary - sudo apt-get install -y scons g++-arm-linux-gnueabihf - scons -j8 arch=arm64-v8a Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native - ``` - -1. Cmake is needed to build ONNX Runtime. Because the minimum required version is 3.13, - it is necessary to build CMake from source. Download Unix/Linux sources from https://cmake.org/download/ - and follow https://cmake.org/install/ to build from source. Version 3.17.5 and 3.18.4 have been tested on Jetson. - -1. Build onnxruntime with --use_acl flag with one of the supported ACL version flags. (ACL_1902 | ACL_1905 | ACL_1908 | ACL_2002) - ---- - -## ArmNN +## Arm NN -See more information on the ArmNN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). +See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). 
### Prerequisites {: .no_toc } @@ -480,7 +429,7 @@ source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" ``` -* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices +* See [here](inferencing.md#arm) for information on building for Arm-based devices ### Build Instructions {: .no_toc } @@ -490,20 +439,20 @@ alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/sh ./build.sh --use_armnn ``` -The Relu operator is set by default to use the CPU execution provider for better performance. To use the ArmNN implementation build with --armnn_relu flag +The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation build with --armnn_relu flag ```bash ./build.sh --use_armnn --armnn_relu ``` -The Batch Normalization operator is set by default to use the CPU execution provider. To use the ArmNN implementation build with --armnn_bn flag +The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation build with --armnn_bn flag ```bash ./build.sh --use_armnn --armnn_bn ``` -To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the ArmNN home directory and build directory respectively. -The ARM Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. +To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. +The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. ```bash ./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build @@ -519,7 +468,7 @@ See more information on the RKNPU Execution Provider [here](../execution-provide * Supported platform: RK1808 Linux -* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices +* See [here](inferencing.md#arm) for information on building for Arm-based devices * Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake: ``` diff --git a/docs/build/inferencing.md b/docs/build/inferencing.md index 4f9886913d078..125623ef28399 100644 --- a/docs/build/inferencing.md +++ b/docs/build/inferencing.md @@ -88,7 +88,8 @@ If you would like to use [Xcode](https://developer.apple.com/xcode/) to build th Without this flag, the cmake build generator will be Unix makefile by default. -Today, Mac computers are either Intel-Based or Apple silicon(aka. ARM) based. By default, ONNX Runtime's build script only generate bits for the CPU ARCH that the build machine has. If you want to do cross-compiling: generate ARM binaries on a Intel-Based Mac computer, or generate x86 binaries on a Mac ARM computer, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example: +Today, Mac computers are either Intel-Based or Apple silicon-based. 
By default, ONNX Runtime's build script only generate bits for the CPU ARCH that the build machine has. If you want to do cross-compiling: generate arm64 binaries on a Intel-Based Mac computer, or generate x86 binaries on a Mac +system with Apple silicon, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example: Build for Intel CPUs: ```bash @@ -107,6 +108,61 @@ The last command will generate a fat-binary for both CPU architectures. Note: unit tests will be skipped due to the incompatible CPU instruction set when doing cross-compiling. +#### AIX +In AIX, you can build ONNX Runtime for 64bit using + +* IBM Open XL compiler tool chain. + Minimum required AIX OS version is 7.2. You need to have 17.1.2 compiler PTF5 (17.1.2.5) version. +* GNU GCC compiler tool chain. + Minimum required AIX OS version is 7.3. GCC version 10.3+ is required. + +For IBM Open XL, export below environment settings. +```bash +ulimit -m unlimited +ulimit -d unlimited +ulimit -n 2000 +ulimit -f unlimited +export OBJECT_MODE=64 +export BUILD_TYPE="Release" +export CC="/opt/IBM/openxlC/17.1.2/bin/ibm-clang" +export CXX="/opt/IBM/openxlC/17.1.2/bin/ibm-clang++_r" +export CFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare" +export CXXFLAGS="-pthread -m64 -D_ALL_SOURCE -mcmodel=large -Wno-deprecate-lax-vec-conv-all -Wno-unused-but-set-variable -Wno-unused-command-line-argument -maltivec -mvsx -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare" +export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -lpthread" +export LIBPATH="$PWD/build/Linux/$BUILD_TYPE/" +``` +For GCC, export below environment settings. +```bash +ulimit -m unlimited +ulimit -d unlimited +ulimit -n 2000 +ulimit -f unlimited +export OBJECT_MODE=64 +export BUILD_TYPE="Release" +export CC="gcc" +export CXX="g++" +export CFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok " +export CXXFLAGS="-maix64 -pthread -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -maltivec -mvsx -Wno-unused-function -Wno-unused-variable -Wno-unused-parameter -Wno-sign-compare -fno-extern-tls-init -Wl,-berok " +export LDFLAGS="-L$PWD/build/Linux/$BUILD_TYPE/ -Wl,-bbigtoc -lpython3.9" +export LIBPATH="$PWD/build/Linux/$BUILD_TYPE" +``` +To initiate build, run the below command +```bash +./build.sh \ +--config $BUILD_TYPE\ + --build_shared_lib \ + --skip_submodule_sync \ + --cmake_extra_defines CMAKE_INSTALL_PREFIX=$PWD/install \ + --parallel +``` + +* If you want to install the package in a custom directory, then mention the directory location as value of CMAKE_INSTALL_PREFIX. +* In case of IBM Open XL compiler tool chain, It is possible that in AIX 7.2 some of the runtime libraries like libunwind.a needed for onnxruntime, will be missing. To fix this, you can install the relevant file-sets. +* --parallel option in build option. + As name suggest, this option is for parallel building and resource intensive option. So, if your system is not having good amount of memory for each CPU core, then this option can be skipped. +* --allow_running_as_root is needed if root user is triggering the build. + + #### Notes * Please note that these instructions build the debug build, which may have performance tradeoffs. 
The "--config" parameter has four valid values: Debug, Release, RelWithDebInfo and MinSizeRel. Compared to "Release", "RelWithDebInfo" not only has debug info, it also disables some inlines to make the binary easier to debug. Thus RelWithDebInfo is slower than Release. @@ -131,13 +187,14 @@ Note: unit tests will be skipped due to the incompatible CPU instruction set whe ### Architectures {: .no_toc } -| | x86_32 | x86_64 | ARM32v7 | ARM64 | PPC64LE | RISCV64 | -|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:| -|Windows | YES | YES | YES | YES | NO | NO | -|Linux | YES | YES | YES | YES | YES | YES | -|macOS | NO | YES | NO | NO | NO | NO | -|Android | NO | NO | YES | YES | NO | NO | -|iOS | NO | NO | NO | YES | NO | NO | +| | x86_32 | x86_64 | ARM32v7 | ARM64 | PPC64LE | RISCV64 | PPC64BE | +|-----------|:------------:|:------------:|:------------:|:------------:|:-------:|:-------:| :------:| +|Windows | YES | YES | YES | YES | NO | NO | NO | +|Linux | YES | YES | YES | YES | YES | YES | NO | +|macOS | NO | YES | NO | NO | NO | NO | NO | +|Android | NO | NO | YES | YES | NO | NO | NO | +|iOS | NO | NO | NO | YES | NO | NO | NO | +|AIX | NO | NO | NO | NO | NO | NO | YES | ### Build Environments(Host) {: .no_toc } @@ -311,21 +368,21 @@ ORT_DEBUG_NODE_IO_DUMP_DATA_TO_FILES=1 ``` -### ARM +### Arm -There are a few options for building ONNX Runtime for ARM. +There are a few options for building ONNX Runtime for Arm®-based devices. -First, you may do it on a real ARM device, or on a x86_64 device with an emulator(like qemu), or on a x86_64 device with a docker container with an emulator(you can run an ARM container on a x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, it wouldn't work if your the CPU you are targeting is not 64-bit since the build process needs more than 2GB memory. +First, you may do it on a real Arm-based device, or on a x86_64 device with an emulator(like qemu), or on a x86_64 device with a docker container with an emulator(you can run an Arm-based container on a x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, it wouldn't work if your the CPU you are targeting is not 64-bit since the build process needs more than 2GB memory. -* [Cross compiling for ARM with simulation (Linux/Windows)](#cross-compiling-for-arm-with-simulation-linuxwindows) - **Recommended**; Easy, slow, ARM64 only(no support for ARM32) +* [Cross compiling for Arm-based devices with simulation (Linux/Windows)](#cross-compiling-for-arm-based-devices-with-simulation-linuxwindows) - **Recommended**; Easy, slow, ARM64 only(no support for ARM32) * [Cross compiling on Linux](#cross-compiling-on-linux) - Difficult, fast * [Cross compiling on Windows](#cross-compiling-on-windows) -#### Cross compiling for ARM with simulation (Linux/Windows) +#### Cross compiling for Arm-based devices with simulation (Linux/Windows) *EASY, SLOW, RECOMMENDED* -This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on x86 CPU and translate every ARM instruction to x86. This is much faster than compiling natively on a low-end ARM device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts. The build process can take hours, and may run of memory if the target CPU is 32-bit. 
+This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on x86 CPU and translate every Arm architecture instruction to x86. This is potentially much faster than compiling natively on a low-end device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an Arm-based device where it can be invoked in Python 3 scripts. The build process can take hours, and may run of memory if the target CPU is 32-bit. #### Cross compiling on Linux @@ -364,12 +421,12 @@ This option is very fast and allows the package to be built in minutes, but is c You must also know what kind of flags your target hardware need, which can differ greatly. For example, if you just get the normal ARMv7 compiler and use it for Raspberry Pi V1 directly, it won't work because Raspberry Pi only has ARMv6. Generally every hardware vendor will provide a toolchain; check how that one was built. - A target env is identifed by: + A target env is identified by: * Arch: x86_32, x86_64, armv6,armv7,arvm7l,aarch64,... * OS: bare-metal or linux. * Libc: gnu libc/ulibc/musl/... - * ABI: ARM has mutilple ABIs like eabi, eabihf... + * ABI: Arm has multiple ABIs like eabi, eabihf... You can get all these information from the previous output, please be sure they are all correct. @@ -528,8 +585,8 @@ This option is very fast and allows the package to be built in minutes, but is c **Using Visual C++ compilers** -1. Download and install Visual C++ compilers and libraries for ARM(64). - If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding ARM(64) compilers and libraries. +1. Download and install Visual C++ compilers and libraries for Arm(64). + If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding Arm(64) compilers and libraries. 2. Use `.\build.bat` and specify `--arm` or `--arm64` as the build option to start building. Preferably use `Developer Command Prompt for VS` or make sure all the installed cross-compilers are findable from the command prompt being used to build using the PATH environmant variable. diff --git a/docs/execution-providers/CUDA-ExecutionProvider.md b/docs/execution-providers/CUDA-ExecutionProvider.md index 97374ff6e096d..81c0c4d270de3 100644 --- a/docs/execution-providers/CUDA-ExecutionProvider.md +++ b/docs/execution-providers/CUDA-ExecutionProvider.md @@ -35,12 +35,13 @@ Because of [Nvidia CUDA Minor Version Compatibility](https://docs.nvidia.com/dep ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa. You can choose the package based on CUDA and cuDNN major versions that match your runtime environment (For example, PyTorch 2.3 uses cuDNN 8.x, while PyTorch 2.4 or later used cuDNN 9.x). -### CUDA 12.x +Note: starting ORT 1.19, **CUDA 12.x** becomes default version when distributing ONNX Runtime GPU packages in pypi. -To install CUDA 12 package, please look at [Install ORT](../install). +### CUDA 12.x | ONNX Runtime | CUDA | cuDNN | Notes | |---------------|--------|-------|----------------------------------------------------------------------| +| 1.19.x | 12.x | 9.x | Avaiable in pypi. Compatible with PyTorch >= 2.4.0 for cuda 12.x. 
| | 1.18.1 | 12.x | 9.x | cuDNN 9 is required. No Java package. | | 1.18.0 | 12.x | 8.x | Java package is added. | | 1.17.x | 12.x | 8.x | Only C++/C# Nuget and Python packages are released. No Java package. | @@ -49,7 +50,8 @@ To install CUDA 12 package, please look at [Install ORT](../install). | ONNX Runtime | CUDA | cuDNN | Notes | |----------------------|--------|-----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------| -| 1.18.x | 11.8 | 8.x | | +| 1.19.x | 11.8 | 8.x | Not available in pypi. See [Install ORT](../install) for detail. Compatible with PyTorch <= 2.3.1 for CUDA 11.8. | +| 1.18.x | 11.8 | 8.x | Available in pypi | | 1.17
1.16
1.15 | 11.8 | 8.2.4 (Linux)
8.5.0.96 (Windows) | Tested with CUDA versions from 11.6 up to 11.8, and cuDNN from 8.2 up to 8.9 | | 1.14
1.13 | 11.6 | 8.2.4 (Linux)
8.5.0.96 (Windows) | libcudart 11.4.43
libcufft 10.5.2.100
libcurand 10.2.5.120
libcublasLt 11.6.5.2
libcublas 11.6.5.2
libcudnn 8.2.4 | | 1.12
1.11 | 11.4 | 8.2.4 (Linux)
8.2.2.26 (Windows) | libcudart 11.4.43
libcufft 10.5.2.100
libcurand 10.2.5.120
libcublasLt 11.6.5.2
libcublas 11.6.5.2
libcudnn 8.2.4 | diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md index af752b1a85e7e..6ffa77edc60b5 100644 --- a/docs/execution-providers/CoreML-ExecutionProvider.md +++ b/docs/execution-providers/CoreML-ExecutionProvider.md @@ -128,10 +128,12 @@ Operators that are supported by the CoreML Execution Provider when a NeuralNetwo |ai.onnx.ReduceSum|| |ai.onnx:Relu|| |ai.onnx:Reshape|| -|ai.onnx:Resize|| +|ai.onnx:Resize|4D input.
`coordinate_transformation_mode` == `asymmetric`.
`mode` == `linear` or `nearest`.
`nearest_mode` == `floor`.
`exclude_outside` == false.
`scales` or `sizes` must be constant.| |ai.onnx:Shape|Attribute `start` with non-default value is not supported.
Attribute `end` is not supported.| |ai.onnx:Sigmoid|| |ai.onnx:Slice|Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.| +|ai.onnx:Softmax|| +|ai.onnx:Split|If provided, `splits` must be constant.| |ai.onnx:Squeeze|| |ai.onnx:Sqrt|| |ai.onnx:Sub|| @@ -147,15 +149,26 @@ Operators that are supported by the CoreML Execution Provider when a MLProgram m |ai.onnx:Add|| |ai.onnx:AveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| |ai.onnx:Clip|| +|ai.onnx:Concat|| |ai.onnx:Conv|Only 1D/2D Conv is supported.
Bias if provided must be constant.| +|ai.onnx:ConvTranspose|Weight and bias must be constant.
padding_type of SAME_UPPER/SAME_LOWER is not supported.
kernel_shape must have default values.
output_shape is not supported.
output_padding must have default values.| +|ai.onnx.DepthToSpace|If 'mode' is 'CRD' the input must have a fixed shape.| |ai.onnx:Div|| |ai.onnx:Gemm|Input B must be constant.| |ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| |ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| +|ai.onnx:GridSample|4D input.
'mode' of 'linear' or 'zeros'.
(mode==linear && padding_mode==reflection && align_corners==0) is not supported.| +|ai.onnx.LeakyRelu|| |ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.| |ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| |ai.onnx:Mul|| |ai.onnx:Pow|Only supports cases when both inputs are fp32.| |ai.onnx:Relu|| |ai.onnx:Reshape|| +|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.| +|ai.onnx.Slice|starts/ends/axes/steps must be constant initializers.| +|ai.onnx.Split|If provided, `splits` must be constant.| |ai.onnx:Sub|| +|ai.onnx:Sigmoid|| +|ai.onnx:Tanh|| +|ai.onnx:Transpose|| diff --git a/docs/execution-providers/EP-Context-Design.md b/docs/execution-providers/EP-Context-Design.md new file mode 100644 index 0000000000000..8e5ffcbb962dd --- /dev/null +++ b/docs/execution-providers/EP-Context-Design.md @@ -0,0 +1,82 @@ +--- +title: EP context design +description: ONNX Runtime EP context cache feature design +parent: Execution Providers +nav_order: 16 +redirect_from: /docs/reference/execution-providers/EP-Context-Design +--- + +# OnnxRuntime EP context cache feature design +{: .no_toc } + +## Contents +{: .no_toc } + +* TOC placeholder +{:toc} + +## Background + +OnnxRuntime Execution Providers enable users to inference Onnx model on different kinds of hardware accelerators empowered by backend SDKs (like QNN, OpenVINO, Vitis AI, etc). The Execution Providers converts the Onnx model into graph format required by the backend SDK, and compiles it into the format required by the hardware. Specific to NPU world, the converting and compiling process takes a long time to complete, especially for LLM models. The session creation time costs tens of minutes for some cases which impacts the user experience badly. +To avoid the converting and compiling cost, most of the backend SDKs provide the feature to dump the pre-compiled model into binary file. The pre-compiled model can be loaded by backend SDK directly and executed on the target device. It improves the session creation time greatly by using this way. In order to achieve this, OnnxRuntime defined a contribute Op called EPContext in MS domain. + +## EPContext Op Schema + +Op domain: com.microsoft +Node inputs & outputs: variadic +Domain: com.microsoft +Atrribures: + +|Attributes |Data type|Description | +|---------------------|---------|----------------------------------------------------------------------------------------------------------| +|main_context |int64 |1 (default): This node points to an EP context content that contains the graph referred to by this node.
0: The node does not point to any EP context content. The graph is expected to come from another node whose main_context field is 1.
Some EPs support a single context that contains multiple graphs. The EPContext node with main_context=1 refers to the real context, and that context contains the graphs referred to by the other nodes with main_context=0.|
+|ep_cache_context |string |Payload of the EP context if embed_mode=1, or path to the context file if embed_mode=0.
The path is a relative path to the Onnx model file. It can be a file name, or subfolder/filename| +|embed_mode |int64 |1(default): ep_cache_context contains the payload of context content.
0: ep_cache_context is the context binary file path.| +|ep_sdk_version |string |Optional. SDK version that used to generate the node. | +|onnx_model_filename |string |Optional. Original Onnx model file name. | +|hardware_architecture|string |Optional. Hardware architecture.| +|partition_name |string |Optional. OnnxRuntime partitioned graph name.| +|source |string |Optional. The source used to generate the node. Should be a key identified by the EP so that OnnxRuntime can support multiple EPContext nodes run with different EPs. For example, QNN EP only accepts nodes with source=QNN or QnnExecutionProvider, OpenVINO EP only accepts nodes with source=OpenVINOExecutionProvider.| +|notes |string |Optional. Additional information required by specific EP. | + +

+*Figure: EP Context node example*
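+
+As a concrete illustration of the schema above, the following is a minimal sketch that builds a single-node EPContext wrapper model with the ONNX Python helpers. The tensor names, shapes, opset versions, and the `model_ctx.bin` / `model_ctx.onnx` file names are placeholders, not values mandated by the schema.
+
+```python
+import onnx
+from onnx import TensorProto, helper
+
+# Hypothetical graph inputs/outputs; real shapes and types come from the compiled model.
+inp = helper.make_tensor_value_info("input_0", TensorProto.FLOAT, [1, 3, 224, 224])
+out = helper.make_tensor_value_info("output_0", TensorProto.FLOAT, [1, 1000])
+
+# EPContext node that references an external context binary (embed_mode=0).
+ep_context_node = helper.make_node(
+    "EPContext",
+    inputs=["input_0"],
+    outputs=["output_0"],
+    name="EPContext_0",
+    domain="com.microsoft",
+    main_context=1,
+    embed_mode=0,
+    ep_cache_context="model_ctx.bin",  # path relative to the Onnx model file
+    source="QnnExecutionProvider",
+)
+
+graph = helper.make_graph([ep_context_node], "ep_context_graph", [inp], [out])
+model = helper.make_model(
+    graph,
+    opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
+)
+onnx.save(model, "model_ctx.onnx")
+```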

+ +## OnnxRuntime Session options related to EP context cache generation and inference + +|Session option |Description | +|---------------------------|----------------------------------------------------------------------------------------------------------| +|ep.context_enable |Used for context model generation only.
1: Enable OnnxRuntime to dump the context cache model.
0 (default): disable.| +|ep.context_file_path |Specify the file path for the dump model.
Defaults to original_file_name.onnx_ctx.onnx for context model generation.
For model inference, if the user loads the model from a memory buffer and the EP context binary is outside the Onnx model, the user needs to set this option. The OnnxRuntime EP uses this path to get the folder path, and combines it with the ep_cache_context value (which points to the context binary path) to get the absolute path of the context binary file.|
+|ep.context_embed_mode |Used for context model generation only.
1 (default): dump the EP context content into the Onnx model, inside ep_cache_context node attribute.
0: dump the EP context content into a separate file, keep the file name in the Onnx model. File path tracked in ep_cache_context node attribute.| +|ep.context_node_name_prefix|Used for context model generation only.
Specify the EPContext node name (also the partition_name attribute and the internal graph name) prefix to make it unique across nodes, in case the user glues multiple EPContext nodes into one model, to avoid conflicts.|
+
+## EP Context cache model generation workflow
+
+OnnxRuntime EPs should follow these rules when creating the EP context cache model so that the user interface stays unified.
+1. ep.context_enable
+   OnnxRuntime creates the EP context cache model if ep.context_enable = 1. Otherwise (ep.context_enable = 0, the default), it just runs the normal workflow.
+2. ep.context_file_path
+   OnnxRuntime appends "_ctx.onnx" to the input file name to form the output file name if no ep.context_file_path is provided. Otherwise it uses the file path provided by the user.
+   ep.context_file_path is required if the user loads the model from a memory buffer, since there is no way for OnnxRuntime to get the input file path in this scenario.
+3. ep.context_embed_mode
+   1 (default): dump the EP context content into the Onnx model.
+   0: dump the EP context content as a separate file. The EP decides the file name and tracks it in the EPContext node attribute ep_cache_context. The separate file should always be at the same location as the dumped Onnx model file, and the file path tracked in the EPContext node is a relative path to the Onnx model file. Note: a subfolder is allowed.
+4. ep.context_node_name_prefix
+   In case the user wants to add a special tag inside the EPContext node name (also the partition_name attribute and graph name), the EP should provide this capability when it creates the EPContext nodes.
+   This is useful if the user wants to glue EPContext nodes from multiple models into one model and there is a risk that node names (graph names) conflict across models. This depends on the EP implementation. QNN EP supports multiple EPContext nodes, so the user can merge and re-connect EPContext nodes from different models.
+
+## Inference from EP Context cache model workflow
+
+OnnxRuntime EPs which support loading Onnx models with EPContext nodes should follow this workflow for model inference.
+1. The EP should be able to identify a model that has EPContext nodes.
+   a. The EP follows its normal workflow if there are no EPContext nodes inside the model.
+   b. If the Onnx model has EPContext nodes:
+      i. The EP should check the source attribute of all EPContext nodes to make sure there is at least one EPContext node for this EP (the source attribute matches the key required by the EP).
+      ii. The EP only partitions in the EPContext nodes whose source attribute matches the key required by the EP.
+      iii. The EP loads from the cached context inside the EPContext node.
+2. If the context cache Onnx model is dumped with embed_mode = 0, there is a separate context binary file beside the Onnx model in the same folder.
+   a. The OnnxRuntime EP gets the context binary file's relative path from the EPContext ep_cache_context node attribute.
+   b. If the user loads the model from an Onnx model file path, the EP should get the input model folder path and combine it with the relative path from step a) to form the full path of the context binary file.
+   c. If the user loads the model from a memory buffer, the user needs to provide the session option ep.context_file_path. The EP gets the folder path from ep.context_file_path and combines it with the relative path from step a) to form the full path of the context binary file.
+
+
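+
+As an illustration of the generation workflow above, a minimal Python sketch follows. The input/output model paths and the choice of QNN EP are placeholders (any EP that supports EPContext can be used), and EP-specific provider options are omitted for brevity.
+
+```python
+import onnxruntime as ort
+
+so = ort.SessionOptions()
+# Dump the EP context cache model while creating the session.
+so.add_session_config_entry("ep.context_enable", "1")
+# Keep the compiled context in a separate binary file next to the dumped model.
+so.add_session_config_entry("ep.context_embed_mode", "0")
+# Optional output path; otherwise "_ctx.onnx" is appended to the input file name.
+so.add_session_config_entry("ep.context_file_path", "./model_ctx.onnx")
+
+# Session creation compiles the model and writes the context model and binary.
+ort.InferenceSession("model.onnx", sess_options=so, providers=["QNNExecutionProvider"])
+```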

+*Figure: EP Context nodes with different EPs*
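+
+For the memory-buffer case described in the inference workflow above, a hedged sketch follows; the file names are placeholders and EP-specific provider options are omitted.
+
+```python
+import onnxruntime as ort
+
+with open("./model_ctx.onnx", "rb") as f:
+    model_bytes = f.read()
+
+so = ort.SessionOptions()
+# When loading from a buffer, point ORT at the dumped model's location so the
+# relative ep_cache_context path can be resolved to the context binary file.
+so.add_session_config_entry("ep.context_file_path", "./model_ctx.onnx")
+
+session = ort.InferenceSession(model_bytes, sess_options=so,
+                               providers=["QNNExecutionProvider"])
+```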

diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 39ec668bc0bf9..fa71f70b0c277 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution ## Install Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. -* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.2 Release](https://github.com/intel/onnxruntime/releases) +* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases) * Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) * Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20) @@ -30,10 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with three lastest rel |ONNX Runtime|OpenVINO™|Notes| |---|---|---| +|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)| +|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)| |1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)| -|1.16.0|2023.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.1)| -|1.15.0|2023.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.0.0)| -|1.14.0|2022.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v4.3)| ## Build @@ -200,8 +199,30 @@ For more information on Multi-Device plugin of OpenVINO™, please refer to the [Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Running_on_multiple_devices.html). ### Export OpenVINO Compiled Blob -Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. The exported model is saved to the same directory as the source model with the suffix -ov_{device}_blob.onnx where device can be one of the supported like CPU or NPU. This feature is currently enabled for fully supported models only. -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. +Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It complies with the ORT session config keys +``` + Ort::SessionOptions session_options; + + // Enable EP context feature to dump the partitioned graph which includes the EP context into Onnx file. + // "0": disable. (default) + // "1": enable. + + session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1"); + + // Flag to specify whether to dump the EP context into single Onnx model or pass bin path. + // "0": dump the EP context into separate file, keep the file name in the Onnx model. + // "1": dump the EP context into the Onnx model. (default). + + session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1"); + + // Specify the file path for the Onnx model which has EP context. 
+ // Defaults to /original_file_name_ctx.onnx if not specified + + session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, ".\ov_compiled_epctx.onnx"); + + sess = onnxruntime.InferenceSession(, session_options) +``` +Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about session options. ### Enable QDQ Optimizations Passes Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy.Generally this feature will give better performance/accuracy with ORT Optimizations disabled. @@ -239,8 +260,7 @@ The session configuration options are passed to SessionOptionsAppendExecutionPro ``` OrtOpenVINOProviderOptions options; -options.device_type = "GPU"; -options.precision = "FP32"; +options.device_type = "GPU_FP32"; options.num_of_threads = 8; options.cache_dir = ""; options.context = 0x123456ff; @@ -277,7 +297,6 @@ The following table lists all the available configuration options for API 2.0 an | context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context i.e the cl_context address as a void pointer.| | enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). | | enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. | -| export_ep_ctx_blob | string | True/False | boolean | This options enables exporting the OpenVINO Compiled Blob as an ONNX Operator EPContext. | Valid Hetero or Multi or Auto Device combinations: diff --git a/docs/execution-providers/QNN-ExecutionProvider.md b/docs/execution-providers/QNN-ExecutionProvider.md index 7558ea51582e1..1cf50ecadc517 100644 --- a/docs/execution-providers/QNN-ExecutionProvider.md +++ b/docs/execution-providers/QNN-ExecutionProvider.md @@ -431,6 +431,51 @@ g_ort->AddSessionConfigEntry(session_options, kOrtSessionOptionEpContextEmbedMod options.add_session_config_entry("ep.context_embed_mode", "0") ``` +## QNN EP weight sharing + +### Weight sharing in Onnx domain +Weight sharing in Onnx means multiple Onnx models with external weights point to the same external weight file. The Onnx models share same tensor names so that they reference to the same tensor data. +

+*Figure: Weight sharing across Onnx models*

+
+### Weight sharing in QNN domain
+QNN weight sharing is enabled with a pre-generated QNN context binary. It requires users to generate the context binary offline on a Linux x86_64 or Windows x86_64 machine (Windows support since QNN 2.26). The QNN context binary contains multiple graphs that share the same tensors.
+

+*Figure: Weight sharing in QNN context binary*

+
+### Converting Onnx models with weight sharing to a QNN context binary
+OnnxRuntime converts Onnx models with weight sharing into a QNN context binary with weight sharing as follows:
+1. Create a QNN context with the weight sharing configuration enabled.
+2. Convert and compile model1.onnx into the QNN context (get Qnn graph1).
+3. Convert and compile model2.onnx into the QNN context (get Qnn graph2).
+4. Repeat step 2 for any additional models.
+5. Generate the QNN context binary file and the wrapper Onnx models with EPContext nodes.
+OnnxRuntime QNN EP provides the [OnnxRuntime_qnn_ctx_gen](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/qnn_ctx_gen) tool to complete these steps.
+Example command line:
+```
+./onnxruntime_qnn_ctx_gen -i "soc_model|60 htp_graph_finalization_optimization_mode|3" ./model1.onnx,./model2.onnx
+```
+It creates 2 Onnx models (model1.onnx_ctx.onnx, model2.onnx_ctx.onnx) and a QNN context binary file (model2.onnx_ctx.onnx_xxx.bin).
+

+*Figure: Weight sharing from Onnx to QNN*
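+
+A sketch of consuming the two generated context models with resource sharing enabled is shown below (the full workflow is described in the next section). The file names and the `backend_path` value are placeholders.
+
+```python
+import onnxruntime as ort
+
+so = ort.SessionOptions()
+# Share EP resources (the deserialized QNN graphs) across sessions.
+so.add_session_config_entry("ep.share_ep_contexts", "1")
+
+qnn = ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})
+
+# Session 1 loads the shared context binary and keeps the unused graphs in the shared place.
+session1 = ort.InferenceSession("model1.onnx_ctx.onnx", sess_options=so, providers=[qnn])
+# Session 2 picks up its graph from the shared place instead of reloading the binary.
+session2 = ort.InferenceSession("model2.onnx_ctx.onnx", sess_options=so, providers=[qnn])
+
+# Destroy the second session before the first one when shutting down.
+del session2
+del session1
+```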

+If user creates the QNN context binary .bin file weight sharing from QNN toolchain (qnn-context-binary-generator). The context binary .bin file looks the same. User needs to create model1.onnx and model2.onnx with EPContext node which points to this .bin file. Each EPContext node should refer (node name and partition_name) to different Qnn graph names from the QNN context. Here’s an example script for reference [gen_qnn_ctx_onnx_model.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/qnn/gen_qnn_ctx_onnx_model.py) which wraps one single QNN graph into EPContext node. + +### Inference with QNN resource sharing workflow +OnnxRuntime inference session need to have resource sharing enabled (set session option ep.share_ep_contexts to 1) to use the dumped Qnn context model with weight sharing enabled. +1. Create OnnxRuuntime inference session with ep.share_ep_contexts=1, loads the model1.onnx_ctx.onnx model. + 1.1 The session loads the model1.onnx_ctx.onnx model. + 1.2 The shared place is empty. + 1.3 EPContext node1 in model1.onnx_ctx.onnx specifies that it uses Qnn_graph1 + 1.4 QNN EP loads the qnn_ctx.bin and deserialize the binary to get Qnn graphs (Qnn_graph1, Qnn_graph2). + 1.5 Uses Qnn_graph1 for this OnnxRuntime session. + 1.6 Put the Qnn_graph2 into the shared place. +2. Create OnnxRuuntime inference session with ep.share_ep_contexts=1, loads the model2.onnx_ctx.onnx model. + 2.1 The session loads the model2.onnx_ctx.onnx model. + 2.2 The EPContext node2 in model2.onnx_ctx.onnx specifies that it uses Qnn_graph2. + 2.3 The shared place has Qnn_graph2. + 2.4 QNN EP skips loading qnn_ctx.bin since it gets what it wants from the shared place. + 2.5 Uses Qnn_graph2 from the shared place for this session. +3. To avoid issues while existing execution, user needs to destroy the 2nd session first, then the 1st session. + +[Code example](https://github.com/microsoft/onnxruntime/blob/291a5352b27ded5714e5748b381f2efb88f28fb9/onnxruntime/test/providers/qnn/qnn_ep_context_test.cc#L979-L992). + ## Usage ### C++ C API details are [here](../get-started/with-c.md). diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md index 3671f418c5078..ded86899eee6e 100644 --- a/docs/execution-providers/TensorRT-ExecutionProvider.md +++ b/docs/execution-providers/TensorRT-ExecutionProvider.md @@ -27,21 +27,24 @@ See [Build instructions](../build/eps.md#tensorrt). ## Requirements -| ONNX Runtime | TensorRT | CUDA | -| :----------- | :------- | :--------- | -| 1.18-main | 10.0 | 11.8, 12.2 | -| 1.17 | 8.6 | 11.8, 12.2 | -| 1.16 | 8.6 | 11.8 | -| 1.15 | 8.6 | 11.8 | -| 1.14 | 8.5 | 11.6 | -| 1.12-1.13 | 8.4 | 11.4 | -| 1.11 | 8.2 | 11.4 | -| 1.10 | 8.0 | 11.4 | -| 1.9 | 8.0 | 11.4 | -| 1.7-1.8 | 7.2 | 11.0.3 | -| 1.5-1.6 | 7.1 | 10.2 | -| 1.2-1.4 | 7.0 | 10.1 | -| 1.0-1.1 | 6.0 | 10.0 | +Note: starting ORT 1.19, **CUDA 12** becomes default version when distributing ONNX Runtime GPU packages. 
+ +| ONNX Runtime | TensorRT | CUDA | +| :----------- | :------- | :------------- | +| 1.19-main | 10.2 | **12.x**, 11.8 | +| 1.18 | 10.0 | 11.8, 12.x | +| 1.17 | 8.6 | 11.8, 12.x | +| 1.16 | 8.6 | 11.8 | +| 1.15 | 8.6 | 11.8 | +| 1.14 | 8.5 | 11.6 | +| 1.12-1.13 | 8.4 | 11.4 | +| 1.11 | 8.2 | 11.4 | +| 1.10 | 8.0 | 11.4 | +| 1.9 | 8.0 | 11.4 | +| 1.7-1.8 | 7.2 | 11.0.3 | +| 1.5-1.6 | 7.1 | 10.2 | +| 1.2-1.4 | 7.0 | 10.1 | +| 1.0-1.1 | 6.0 | 10.0 | For more details on CUDA/cuDNN versions, please see [CUDA EP requirements](./CUDA-ExecutionProvider.md#requirements). @@ -565,7 +568,7 @@ export ORT_TENSORRT_CONTEXT_MEMORY_SHARING_ENABLE=1 ## TensorRT EP Caches -There are three major TRT EP cahces: +There are three major TRT EP caches: * TRT timing cache * TRT engine cache * Embedded engine model / EPContext model diff --git a/docs/execution-providers/Vitis-AI-ExecutionProvider.md b/docs/execution-providers/Vitis-AI-ExecutionProvider.md index 655b563bcaff4..6e95434e2b7c5 100644 --- a/docs/execution-providers/Vitis-AI-ExecutionProvider.md +++ b/docs/execution-providers/Vitis-AI-ExecutionProvider.md @@ -27,9 +27,9 @@ The following table lists AMD targets that are supported by the Vitis AI ONNX Ru | **Architecture** | **Family** | **Supported Targets** | **Supported OS** | |---------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------| | AMD64 | Ryzen AI | AMD Ryzen 7040U, 7040HS | Windows | -| ARM64 Cortex-A53 | Zynq UltraScale+ MPSoC | ZCU102, ZCU104, KV260 | Linux | -| ARM64 Cortex-A72 | Versal AI Core / Premium | VCK190 | Linux | -| ARM64 Cortex-A72 | Versal AI Edge | VEK280 | Linux | +| Arm® Cortex®-A53 | Zynq UltraScale+ MPSoC | ZCU102, ZCU104, KV260 | Linux | +| Arm® Cortex®-A72 | Versal AI Core / Premium | VCK190 | Linux | +| Arm® Cortex®-A72 | Versal AI Edge | VEK280 | Linux | AMD Adaptable SoC developers can also leverage the Vitis AI ONNX Runtime Execution Provider to support custom (chip-down) designs. diff --git a/docs/execution-providers/Xnnpack-ExecutionProvider.md b/docs/execution-providers/Xnnpack-ExecutionProvider.md index c1900aa841860..f58929a0d6c1a 100644 --- a/docs/execution-providers/Xnnpack-ExecutionProvider.md +++ b/docs/execution-providers/Xnnpack-ExecutionProvider.md @@ -8,7 +8,7 @@ nav_order: 9 # XNNPACK Execution Provider -Accelerate ONNX models on Android/iOS devices and WebAssembly with ONNX Runtime and the XNNPACK execution provider. [XNNPACK](https://github.com/google/XNNPACK) is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms. +Accelerate ONNX models on Android/iOS devices and WebAssembly with ONNX Runtime and the XNNPACK execution provider. [XNNPACK](https://github.com/google/XNNPACK) is a highly optimized library of floating-point neural network inference operators for Arm®-based, WebAssembly, and x86 platforms. 
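+
+For example, a minimal Python sketch that opts a model into XNNPACK on a build that includes the EP (the model path is a placeholder):
+
+```python
+import onnxruntime as ort
+
+# XNNPACK handles the operators it supports; the default CPU EP covers the rest.
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=["XnnpackExecutionProvider", "CPUExecutionProvider"],
+)
+```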
## Contents {: .no_toc } diff --git a/docs/execution-providers/community-maintained/ACL-ExecutionProvider.md b/docs/execution-providers/community-maintained/ACL-ExecutionProvider.md index f894dcc86f1a1..02a0edf4e743d 100644 --- a/docs/execution-providers/community-maintained/ACL-ExecutionProvider.md +++ b/docs/execution-providers/community-maintained/ACL-ExecutionProvider.md @@ -10,14 +10,7 @@ redirect_from: /docs/reference/execution-providers/ACL-ExecutionProvider # ACL Execution Provider {: .no_toc } -The integration of ACL as an execution provider (EP) into ONNX Runtime accelerates performance of ONNX model workloads across Armv8 cores. [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary){:target="_blank"} is an open source inference engine maintained by Arm and Linaro companies. - - -## Contents -{: .no_toc } - -* TOC placeholder -{:toc} +The ACL Execution Provider enables accelerated performance on Arm®-based CPUs through [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary){:target="_blank"}. ## Build @@ -30,10 +23,44 @@ For build instructions, please see the [build page](../../build/eps.md#arm-compu ``` Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"}; Ort::SessionOptions sf; -bool enable_cpu_mem_arena = true; -Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sf, enable_cpu_mem_arena)); +bool enable_fast_math = true; +Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sf, enable_fast_math)); ``` The C API details are [here](../../get-started/with-c.html). +### Python +{: .no_toc } + +``` +import onnxruntime + +providers = [("ACLExecutionProvider", {"enable_fast_math": "true"})] +sess = onnxruntime.InferenceSession("model.onnx", providers=providers) +``` + ## Performance Tuning -When/if using [onnxruntime_perf_test](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/perftest){:target="_blank"}, use the flag -e acl +Arm Compute Library has a fast math mode that can increase performance with some potential decrease in accuracy for MatMul and Conv operators. It is disabled by default. + +When using [onnxruntime_perf_test](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/perftest){:target="_blank"}, use the flag `-e acl` to enable the ACL Execution Provider. You can additionally use `-i 'enable_fast_math|true'` to enable fast math. + +Arm Compute Library uses the ONNX Runtime intra-operator thread pool when running via the execution provider. You can control the size of this thread pool using the `-x` option. 
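+
+For example, a Python sketch that enables fast math and sizes the intra-operator thread pool (the model path and thread count are placeholders):
+
+```python
+import onnxruntime as ort
+
+so = ort.SessionOptions()
+so.intra_op_num_threads = 4  # Arm Compute Library work runs on this thread pool
+
+providers = [("ACLExecutionProvider", {"enable_fast_math": "true"})]
+session = ort.InferenceSession("model.onnx", sess_options=so, providers=providers)
+```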
+ +## Supported Operators + +|Operator|Supported types| +|---|---| +|AveragePool|float| +|BatchNormalization|float| +|Concat|float| +|Conv|float, float16| +|FusedConv|float| +|FusedMatMul|float, float16| +|Gemm|float| +|GlobalAveragePool|float| +|GlobalMaxPool|float| +|MatMul|float, float16| +|MatMulIntegerToFloat|uint8, int8, uint8+int8| +|MaxPool|float| +|NhwcConv|float| +|Relu|float| +|QLinearConv|uint8, int8, uint8+int8| diff --git a/docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.md b/docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.md index 57d07af02bc3a..e38a0a75ef92d 100644 --- a/docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.md +++ b/docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.md @@ -7,7 +7,7 @@ nav_order: 2 redirect_from: /docs/reference/execution-providers/ArmNN-ExecutionProvider --- -# ArmNN Execution Provider +# Arm NN Execution Provider {: .no_toc} ## Contents @@ -16,14 +16,14 @@ redirect_from: /docs/reference/execution-providers/ArmNN-ExecutionProvider * TOC placeholder {:toc} -Accelerate performance of ONNX model workloads across Armv8 cores with the ArmNN execution provider. [ArmNN](https://github.com/ARM-software/armnn) is an open source inference engine maintained by Arm and Linaro companies. +Accelerate performance of ONNX model workloads across Arm®-based devices with the Arm NN execution provider. [Arm NN](https://github.com/ARM-software/armnn) is an open source inference engine maintained by Arm and Linaro companies. ## Build -For build instructions, please see the [BUILD page](../../build/eps.md#armnn). +For build instructions, please see the [BUILD page](../../build/eps.md#arm-nn). ## Usage ### C/C++ -To use ArmNN as execution provider for inferencing, please register it as below. +To use Arm NN as execution provider for inferencing, please register it as below. ``` Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"}; Ort::SessionOptions so; diff --git a/docs/execution-providers/index.md b/docs/execution-providers/index.md index 1e2c13abcf67f..52687f6f48d2c 100644 --- a/docs/execution-providers/index.md +++ b/docs/execution-providers/index.md @@ -24,9 +24,9 @@ ONNX Runtime supports many different execution providers today. 
Some of the EPs |CPU|GPU|IoT/Edge/Mobile|Other| ---|---|---|--- |Default CPU|[NVIDIA CUDA](../execution-providers/CUDA-ExecutionProvider.md)|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[Rockchip NPU](../execution-providers/community-maintained/RKNPU-ExecutionProvider.md) (*preview*)| -|[Intel DNNL](../execution-providers/oneDNN-ExecutionProvider.md)|[NVIDIA TensorRT](../execution-providers/TensorRT-ExecutionProvider.md)|[ARM Compute Library](../execution-providers/community-maintained/ACL-ExecutionProvider.md) (*preview*)|[Xilinx Vitis-AI](../execution-providers/Vitis-AI-ExecutionProvider.md) (*preview*)| +|[Intel DNNL](../execution-providers/oneDNN-ExecutionProvider.md)|[NVIDIA TensorRT](../execution-providers/TensorRT-ExecutionProvider.md)|[Arm Compute Library](../execution-providers/community-maintained/ACL-ExecutionProvider.md) (*preview*)|[Xilinx Vitis-AI](../execution-providers/Vitis-AI-ExecutionProvider.md) (*preview*)| |[TVM](../execution-providers/community-maintained/TVM-ExecutionProvider.md) (*preview*)|[DirectML](../execution-providers/DirectML-ExecutionProvider.md)|[Android Neural Networks API](../execution-providers/NNAPI-ExecutionProvider.md)|[Huawei CANN](../execution-providers/community-maintained/CANN-ExecutionProvider.md) (*preview*)| -|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[AMD MIGraphX](../execution-providers/MIGraphX-ExecutionProvider.md)|[ARM-NN](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md) (*preview*)|[AZURE](../execution-providers/Azure-ExecutionProvider.md) (*preview*)| +|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[AMD MIGraphX](../execution-providers/MIGraphX-ExecutionProvider.md)|[Arm NN](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md) (*preview*)|[AZURE](../execution-providers/Azure-ExecutionProvider.md) (*preview*)| |[XNNPACK](../execution-providers/Xnnpack-ExecutionProvider.md)|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[CoreML](../execution-providers/CoreML-ExecutionProvider.md) (*preview*)| ||[AMD ROCm](../execution-providers/ROCm-ExecutionProvider.md)|[TVM](../execution-providers/community-maintained/TVM-ExecutionProvider.md) (*preview*)| ||[TVM](../execution-providers/community-maintained/TVM-ExecutionProvider.md) (*preview*)|[Qualcomm QNN](../execution-providers/QNN-ExecutionProvider.md)| diff --git a/docs/genai/howto/build-from-source.md b/docs/genai/howto/build-from-source.md index 012d8ea2fd048..1fbcab494e3fa 100644 --- a/docs/genai/howto/build-from-source.md +++ b/docs/genai/howto/build-from-source.md @@ -16,7 +16,7 @@ nav_order: 2 ## Pre-requisites - `cmake` -- `.Net v6` (if building C#) +- `.NET6` (if building C#) ## Clone the onnxruntime-genai repo @@ -25,11 +25,10 @@ git clone https://github.com/microsoft/onnxruntime-genai cd onnxruntime-genai ``` -## Install ONNX Runtime +## Download ONNX Runtime binaries -By default, the onnxruntime-genai build expects to find the ONNX Runtime include and binaries in a folder called `ort` in the root directory of onnxruntime-genai. You can put the ONNX Runtime files in a different location and specify this location to the onnxruntime-genai build via the --ort_home command line argument. +By default, the onnxruntime-genai build expects to find the ONNX Runtime include and binaries in a folder called `ort` in the root directory of onnxruntime-genai. 
You can put the ONNX Runtime files in a different location and specify this location to the onnxruntime-genai build via the `--ort_home` command line argument. -### Option 1: Install from release These instructions assume you are in the `onnxruntime-genai` folder. @@ -38,9 +37,9 @@ These instructions assume you are in the `onnxruntime-genai` folder. These instruction use `win-x64`. Replace this if you are using a different architecture. ```bash -curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-win-x64-1.18.0.zip -o onnxruntime-win-x64-1.18.0.zip -tar xvf onnxruntime-win-x64-1.18.0.zip -move onnxruntime-win-x64-1.18.0 ort +curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-win-x64-1.19.2.zip -o onnxruntime-win-x64-1.19.2.zip +tar xvf onnxruntime-win-x64-1.19.2.zip +move onnxruntime-win-x64-1.19.2 ort ``` #### Linux and Mac @@ -48,151 +47,86 @@ move onnxruntime-win-x64-1.18.0 ort These instruction use `linux-x64-gpu`. Replace this if you are using a different architecture. ```bash -curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-linux-x64-gpu-1.18.0.tgz -o onnxruntime-linux-x64-gpu-1.18.0.tgz -tar xvzf onnxruntime-linux-x64-gpu-1.18.0.tgz -mv onnxruntime-linux-x64-gpu-1.18.0 ort +curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-linux-x64-gpu-1.19.2.tgz -o onnxruntime-linux-x64-gpu-1.19.2.tgz +tar xvzf onnxruntime-linux-x64-gpu-1.19.2.tgz +mv onnxruntime-linux-x64-gpu-1.19.2 ort ``` -### Option 2: Install from nightly +#### Android -Download the nightly nuget package `Microsoft.ML.OnnxRuntime` from: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly. - -Extract the nuget package. - -```bash -tar xvf Microsoft.ML.OnnxRuntime.1.18.0-dev-20240322-0323-ca825cb6e6.nupkg -``` - -Copy the include and lib files into `ort`. - -On Windows - -Example is given for `win-x64`. Change this to your architecture if different. - -```cmd -copy build\native\include\onnxruntime_c_api.h ort\include -copy runtimes\win-x64\native\*.dll ort\lib -``` - -On Linux - -Example is given for `linux-x64`. Change this to your architecture if different. - -```cmd -cp build/native/include/onnxruntime_c_api.h ort/include -cp build/linux-x64/native/libonnxruntime*.so* ort/lib -``` - -### Option 3: Build from source - -#### Clone the onnxruntime repo +If you do not already have an `ort` folder, create one. ```bash -cd .. 
-git clone https://github.com/microsoft/onnxruntime.git -cd onnxruntime +mkdir ort ``` -#### Build ONNX Runtime for CPU on Windows - ```bash -build.bat --build_shared_lib --skip_tests --parallel --config Release -copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include -copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib -copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib -``` - -#### Build ONNX Runtime for DirectML on Windows - -```bash -build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release -copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include -copy include\onnxruntime\core\providers\dml\dml_provider_factory.h ..\onnxruntime-genai\ort\include -copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib -copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib +curl -L https://repo1.maven.org/maven2/com/microsoft/onnxruntime/onnxruntime-android/1.19.2/onnxruntime-android-1.19.2.aar -o ort/onnxruntime-android-1.19.2.aar +cd ort +tar xvf onnxruntime-android-1.19.2.aar +cd .. ``` +## Build the generate() API -#### Build ONNX Runtime for CUDA on Windows - -```bash -build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release -copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include -copy include\onnxruntime\core\providers\cuda\*.h ..\onnxruntime-genai\ort\include -copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib -copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib -``` +This step assumes that you are in the root of the onnxruntime-genai repo, and you have followed the previous steps to copy the onnxruntime headers and binaries into the folder specified by , which defaults to `onnxruntime-genai/ort`. -#### Build ONNX Runtime on Linux +All of the build commands below have a `--config` argument, which takes the following options: +- `Release` builds release binaries +- `Debug` build binaries with debug symbols +- `RelWithDebInfo` builds release binaries with debug info -```bash -./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release -cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include -cp build/Linux/Release/libonnxruntime*.so* ../onnxruntime-genai/ort/lib -``` +### Build Python API -You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows. +#### Windows CPU build ```bash -./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc +python build.py --config Release ``` -Replace the values given above for different versions and locations of CUDA. 
- -#### Build ONNX Runtime on Mac +#### Windows DirectML build ```bash -./build.sh --build_shared_lib --skip_tests --parallel --config Release -cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include -cp build/MacOS/Release/libonnxruntime*.dylib* ../onnxruntime-genai/ort/lib +python build.py --use_dml --config Release ``` -## Build the generate() API - -This step assumes that you are in the root of the onnxruntime-genai repo, and you have followed the previos steps to copy the onnxruntime headers and binaries into the folder specified by , which defaults to `onnxruntime-genai/ort`. +#### Linux build ```bash -cd ../onnxruntime-genai +python build.py --config Release ``` -### Build Python API - -#### Build for Windows CPU +#### Linux CUDA build ```bash -python build.py +python build.py --use_cuda --config Release ``` -#### Build for Windows DirectML +#### Mac build ```bash -python build.py --use_dml +python build.py --config Release ``` -#### Build on Linux +### Build Java API ```bash -python build.py +python build.py --build_java --config Release ``` -#### Build on Linux with CUDA - -```bash -python build.py --use_cuda -``` +### Build for Android -#### Build on Mac +If building on Windows, install `ninja`. ```bash -python build.py +pip install ninja ``` -### Build Java API +Run the build script. ```bash -python build.py --build_java --config Release +python build.py --build_java --android --android_home --android_ndk_path --android_abi [armeabi-v7a|arm64-v8a|x86|x86_64] --config Release ``` -Change config to Debug for debug builds. ## Install the library into your application @@ -203,12 +137,28 @@ cd build/wheel pip install *.whl ``` -### Install .jar +### Install NuGet + +_Coming soon_ + +### Install JAR Copy `build/Windows/Release/src/java/build/libs/*.jar` into your application. -### Install Nuget package +### Install AAR + +Copy `build/Android/Release/src/java/build/android/outputs/aar/onnxruntime-genai-release.aar` into your application. + ### Install C/C++ header file and library -_Coming soon_ +#### Windows + +Use the header in `src\ort_genai.h` and the libraries in `build\Windows\Release` + +#### Linux + +Use the header in `src/ort_genai.h` and the libraries in `build/Linux/Release` + + + diff --git a/docs/genai/howto/install.md b/docs/genai/howto/install.md index 86f969c8ccf32..3d5e8f6c90944 100644 --- a/docs/genai/howto/install.md +++ b/docs/genai/howto/install.md @@ -21,14 +21,12 @@ Note: only one of these sets of packages (CPU, DirectML, CUDA) should be install ### CPU ```bash -pip install numpy pip install onnxruntime-genai ``` ### DirectML ```bash -pip install numpy pip install onnxruntime-genai-directml ``` @@ -43,15 +41,13 @@ Ensure that the `CUDA_PATH` environment variable is set to the location of your #### CUDA 11 ```bash -pip install numpy -pip install onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/ +pip install onnxruntime-genai-cuda --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ ``` #### CUDA 12 ```bash -pip install numpy -pip install onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ +pip install onnxruntime-genai-cuda ``` @@ -65,16 +61,10 @@ Note: install only one of these packages (CPU, DirectML, CUDA) in your project. 
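A quick way to confirm that whichever Python package you chose above installed correctly is to import it and load a model. The following is only a minimal sketch under stated assumptions: the model folder path is a placeholder and should point at any ONNX model prepared for the generate() API (for example, one of the Phi-3 ONNX models referenced later in this document).

```python
# Minimal sanity check for an onnxruntime-genai Python install.
# "path/to/model_folder" is a placeholder for a generate()-compatible ONNX model folder.
import onnxruntime_genai as og

model = og.Model("path/to/model_folder")      # load the model folder
tokenizer = og.Tokenizer(model)               # build the matching tokenizer
tokens = tokenizer.encode("Hello from ONNX Runtime generate()!")
print("Loaded model and encoded", len(tokens), "tokens")
```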
ONNX Runtime generate() versions 0.3.0 and earlier came bundled with the core ONNX Runtime binaries. From version 0.4.0 onwards, the packages are separated to allow a more flexible developer experience.

-Version 0.4.0-rc1 depends on the ONNX Runtime version 1.19.0 RC. To install 0.4.0-rc1, add the following nuget source *before* installing the ONNX Runtime generate() nuget package.
-
-```
-dotnet nuget add source https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/nuget/v3/index.json --name ORT-Nightly
-```
-
### CPU

```bash
-dotnet add package Microsoft.ML.OnnxRuntimeGenAI --prerelease
+dotnet add package Microsoft.ML.OnnxRuntimeGenAI
```

### CUDA

@@ -82,13 +72,13 @@ dotnet add package Microsoft.ML.OnnxRuntimeGenAI --prerelease

Note: only CUDA 11 is supported for versions 0.3.0 and earlier, and only CUDA 12 is supported for versions 0.4.0 and later.

```bash
-dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda --prerelease
+dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda
```

### DirectML

```bash
-dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML --prerelease
+dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML
```

diff --git a/docs/genai/howto/troubleshoot.md b/docs/genai/howto/troubleshoot.md
index 9f0fe8c389338..fc055754bccff 100644
--- a/docs/genai/howto/troubleshoot.md
+++ b/docs/genai/howto/troubleshoot.md
@@ -31,4 +31,21 @@ The onnxruntime-genai Python package should run without error after this extra s

### Windows CUDA import error

-After CUDA toolkit installation completed on windows, ensure that the `CUDA_PATH` system environment variable has been set to the path where the toolkit was installed. This variable will be used when importing the onnxruntime_genai python module on Windows. Unset or incorrectly set `CUDA_PATH` variable may lead to a `DLL load failed while importing onnxruntime_genai`. \ No newline at end of file
+```
+DLL load failed while importing onnxruntime_genai
+```
+
+After the CUDA toolkit installation has completed on Windows, ensure that the `CUDA_PATH` system environment variable is set to the path where the toolkit was installed. This variable is used when importing the onnxruntime_genai Python module on Windows. An unset or incorrectly set `CUDA_PATH` variable may lead to a `DLL load failed while importing onnxruntime_genai` error.
+
+### Transformers / Tokenizers incompatibility with ONNX Runtime generate()
+
+```
+RuntimeError: [json.exception.type_error.302] type must be string, but is array
+```
+
+This error occurs when generating models with the Model Builder.
+
+A change in HuggingFace transformers version 4.45.0 caused an incompatibility with onnxruntime-genai versions 0.4.0 and earlier, which was resolved in 0.5.0. There are two alternative workarounds you can employ to fix this issue:
+
+- Option 1: downgrade your transformers version to one lower than v4.45.0 (the version in which the above change was introduced)
+- Option 2: build onnxruntime-genai from source, using the instructions at https://onnxruntime.ai/docs/genai/howto/build-from-source.html
diff --git a/docs/genai/tutorials/phi3-python.md b/docs/genai/tutorials/phi3-python.md
index 563cd5d3967f0..ed6af9d98f1ab 100644
--- a/docs/genai/tutorials/phi3-python.md
+++ b/docs/genai/tutorials/phi3-python.md
@@ -13,7 +13,7 @@ nav_order: 2

## Introduction
{: .no_toc }

-Phi-3 ONNX models are hosted on HuggingFace and you can run them with the ONNX Runtime generate() API. 
+Phi-3 and Phi-3.5 ONNX models are hosted on HuggingFace and you can run them with the ONNX Runtime generate() API.

The mini (3.3B) and medium (14B) versions are available now. Both mini and medium have a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory.

@@ -28,6 +28,9 @@ Available models are:
* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu)
* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda)
* [https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml/)
+* [https://huggingface.co/microsoft/Phi-3.5-mini-instruct-onnx](https://huggingface.co/microsoft/Phi-3.5-mini-instruct-onnx)
+
+This tutorial demonstrates how to download and run the short context (4k) mini (3B) variant of the Phi-3 model. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.

This tutorial downloads and runs the short context (4k) mini (3B) model variant. See the [model reference](#phi-3-onnx-model-reference) for download commands for the other variants.

@@ -264,3 +267,16 @@ python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-cuda/cuda-int4-rtn-block-32

git clone https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml
python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-directml/directml-int4-awq-block-128
```
+
+### Phi-3.5 mini 128k context CUDA
+```bash
+huggingface-cli download microsoft/Phi-3.5-mini-instruct-onnx --include cuda/cuda-int4-awq-block-128/* --local-dir .
+python phi3-qa.py -m cuda/cuda-int4-awq-block-128
+```
+
+### Phi-3.5 mini 128k context CPU
+
+```bash
+huggingface-cli download microsoft/Phi-3.5-mini-instruct-onnx --include cpu_and_mobile/cpu-int4-awq-block-128-acc-level-4/* --local-dir .
+python phi3-qa.py -m cpu_and_mobile/cpu-int4-awq-block-128-acc-level-4
+```
diff --git a/docs/genai/tutorials/phi3-v.md b/docs/genai/tutorials/phi3-v.md
index ee4c70038cd01..e4aa4f75dca6e 100644
--- a/docs/genai/tutorials/phi3-v.md
+++ b/docs/genai/tutorials/phi3-v.md
@@ -13,14 +13,14 @@ image: /images/coffee.png

The Phi-3 vision model is a small, but powerful multi modal model that allows you to use both image and text to output text. It is used in scenarios such as describing the content of images in detail.

-The Phi-3 vision model is supported by versions of onnxruntime-genai 0.3.0-rc2 and later.
+The Phi-3 vision model is supported by versions of onnxruntime-genai 0.3.0 and later.

You can download the models here:

* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu)
+* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-directml)
* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cuda)

-Support for DirectML is coming soon!

* TOC placeholder
{:toc}

@@ -46,13 +46,10 @@ Support for DirectML is coming soon!

## Choose your platform

If you have an NVIDIA GPU, that will give the best performance right now.
-
-The models will also run on CPU, but they will be slower. 
- -Support for Windows machines with GPUs other than NVIDIA is coming soon! **Note: Only one package and model is required based on your hardware. That is, only execute the steps for one of the following sections** + ## Run with NVIDIA CUDA 1. Download the model @@ -60,6 +57,7 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! ```bash huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cuda --include cuda-int4-rtn-block-32/* --local-dir . ``` + This command downloads the model into a folder called `cuda-int4-rtn-block-32`. 2. Setup your CUDA environment @@ -74,15 +72,13 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! * CUDA 11 ```bash - pip install numpy - pip install --pre onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/ + pip install onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ ``` * CUDA 12 ```bash - pip install numpy - pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ + pip install onnxruntime-genai-cuda ``` 4. Run the model @@ -91,6 +87,7 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! ```bash curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py + pip install pyreadline3 python phi3v.py -m cuda-int4-rtn-block-32 ``` @@ -117,9 +114,8 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! 2. Install the generate() API for CPU - ``` - pip install numpy - pip install --pre onnxruntime-genai + ```bash + pip install onnxruntime-genai ``` 3. Run the model @@ -128,6 +124,7 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! ```bash curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py + pip install pyreadline3 python phi3v.py -m cpu-int4-rtn-block-32-acc-level-4 ``` @@ -152,3 +149,42 @@ Support for Windows machines with GPUs other than NVIDIA is coming soon! The products include Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre, and Chocolate Biscuits. The Grand Total column sums up the sales for each product across the two quarters. ``` + +## Run with DirectML + +1. Download the model + + ```bash + huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-directml --include directml-int4-rtn-block-32/* --local-dir . + ``` + + This command downloads the model into a folder called `directml-int4-rtn-block-32`. + +2. Install the generate() API + + ```bash + pip install onnxruntime-genai-directml + ``` + +3. Run the model + + Run the model with [phi3v.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py). + + ```bash + curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py + pip install pyreadline3 + python phi3v.py -m directml-int4-rtn-block-32 + ``` + + Enter the path to an image file and a prompt. The model uses the image and prompt to give you an answer. + + For example: `What does the sign say?` + + ![coffee](../../../images/nashville.jpg) + + ``` + The sign says 'DO NOT ENTER'. 
+ ``` + + + diff --git a/docs/get-started/with-python.md b/docs/get-started/with-python.md index c89d92e4ad432..7ff3d1048c58d 100644 --- a/docs/get-started/with-python.md +++ b/docs/get-started/with-python.md @@ -22,26 +22,26 @@ There are two Python packages for ONNX Runtime. Only one of these packages shoul ### Install ONNX Runtime CPU -Use the CPU package if you are running on Arm CPUs and/or macOS. +Use the CPU package if you are running on Arm®-based CPUs and/or macOS. ```bash pip install onnxruntime ``` -### Install ONNX Runtime GPU (CUDA 11.x) +### Install ONNX Runtime GPU (CUDA 12.x) -The default CUDA version for ORT is 11.8. +The default CUDA version for ORT is 12.x. ```bash pip install onnxruntime-gpu ``` -### Install ONNX Runtime GPU (CUDA 12.x) +### Install ONNX Runtime GPU (CUDA 11.8) -For Cuda 12.x, please use the following instructions to install from [ORT Azure Devops Feed](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview) +For Cuda 11.8, please use the following instructions to install from [ORT Azure Devops Feed](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-11/PyPI/onnxruntime-gpu/overview) ```bash -pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ +pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ ``` ## Install ONNX for model export @@ -260,8 +260,8 @@ If using pip, run `pip install --upgrade pip` prior to downloading. |[onnxruntime](https://pypi.org/project/onnxruntime)|CPU (Release)| Windows (x64), Linux (x64, ARM64), Mac (X64), | |[ort-nightly](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly)|CPU (Dev) | Same as above | |[onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu)|GPU (Release)| Windows (x64), Linux (x64, ARM64) | -|[ort-nightly-gpu for CUDA 11.*](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-gpu) |GPU (Dev) | Windows (x64), Linux (x64, ARM64) | -|[ort-nightly-gpu for CUDA 12.*](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu) |GPU (Dev) | Windows (x64), Linux (x64, ARM64) | +|[ort-nightly-gpu for CUDA 11.*](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-11-nightly/PyPI/ort-nightly-gpu) |GPU (Dev) | Windows (x64), Linux (x64, ARM64) | +|[ort-nightly-gpu for CUDA 12.*](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-gpu) |GPU (Dev) | Windows (x64), Linux (x64, ARM64) | Before installing nightly package, you will need install dependencies first. 
``` @@ -270,12 +270,12 @@ python -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy Example to install ort-nightly-gpu for CUDA 11.*: ``` -python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ +python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-11-nightly/pypi/simple/ ``` Example to install ort-nightly-gpu for CUDA 12.*: ``` -python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/ +python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ ``` For Python compiler version notes, see [this page](https://github.com/microsoft/onnxruntime/tree/main/docs/Python_Dev_Notes.md) diff --git a/docs/install/index.md b/docs/install/index.md index d9e14b1609697..60057a88215bb 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -46,25 +46,29 @@ For ONNX Runtime GPU package, it is required to install [CUDA](https://developer pip install onnxruntime ``` -#### Install ONNX Runtime GPU (CUDA 11.x) -The default CUDA version for ORT is 11.8. +#### Install ONNX Runtime GPU (CUDA 12.x) +The default CUDA version for [onnxruntime-gpu in pypi](https://pypi.org/project/onnxruntime-gpu) is 12.x since 1.19.0. ```bash pip install onnxruntime-gpu ``` -#### Install ONNX Runtime GPU (CUDA 12.x) -For Cuda 12.x, please use the following instructions to install from [ORT Azure Devops Feed](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview) +For previous versions, you can download here: [1.18.1](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview/1.18.1), [1.18.0](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview/1.18.0) + + +#### Install ONNX Runtime GPU (CUDA 11.x) +For Cuda 11.x, please use the following instructions to install from [ORT Azure Devops Feed](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-11/PyPI/onnxruntime-gpu/overview) for 1.19.2 or later. ```bash -pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ +pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ ``` -#### Install ONNX Runtime GPU (ROCm) -For ROCm, please follow instructions to install it at the [AMD ROCm install docs](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/). The ROCm execution provider for ONNX Runtime is built and tested with ROCm 6.0.0 +For previous versions, you can download here: [1.18.1](https://pypi.org/project/onnxruntime-gpu/1.18.1/), [1.18.0](https://pypi.org/project/onnxruntime-gpu/1.18.0/) -To build from source on Linux, follow the instructions [here](https://onnxruntime.ai/docs/build/eps.html#amd-rocm). Alternatively, each major ORT release has a corresponding C/C++ ROCm package, found [here](https://github.com/microsoft/onnxruntime/releases/). +#### Install ONNX Runtime GPU (ROCm) +For ROCm, please follow instructions to install it at the [AMD ROCm install docs](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/). 
The ROCm execution provider for ONNX Runtime is built and tested with ROCm 6.0.0. +To build from source on Linux, follow the instructions [here](https://onnxruntime.ai/docs/build/eps.html#amd-rocm). ### Install ONNX to export the model @@ -94,16 +98,16 @@ pip install skl2onnx dotnet add package Microsoft.ML.OnnxRuntime ``` -#### Install ONNX Runtime GPU (CUDA 11.x) +#### Install ONNX Runtime GPU (CUDA 12.x) -The default CUDA version for ORT is 11.8 +The default CUDA version for ORT is 12.x ```bash # GPU dotnet add package Microsoft.ML.OnnxRuntime.Gpu ``` -#### Install ONNX Runtime GPU (CUDA 12.x) +#### Install ONNX Runtime GPU (CUDA 11.8) 1. Project Setup @@ -116,8 +120,8 @@ a nuget.config file to your project in the same directory as your .csproj file. - + ``` @@ -405,8 +409,8 @@ below: |--------------|---------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------| | Python | If using pip, run `pip install --upgrade pip` prior to downloading. | | | | | CPU: [**onnxruntime**](https://pypi.org/project/onnxruntime) | [ort-nightly (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly/overview) | | -| | GPU (CUDA/TensorRT) for CUDA 11.x: [**onnxruntime-gpu**](https://pypi.org/project/onnxruntime-gpu) | [ort-nightly-gpu (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-gpu/overview/) | [View](../execution-providers/CUDA-ExecutionProvider.md#requirements) | -| | GPU (CUDA/TensorRT) for CUDA 12.x: [**onnxruntime-gpu**](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12/PyPI/onnxruntime-gpu/overview/) | [ort-nightly-gpu (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-12-nightly/PyPI/ort-nightly-gpu/overview/) | [View](../execution-providers/CUDA-ExecutionProvider.md#requirements) | +| | GPU (CUDA/TensorRT) for CUDA 12.x: [**onnxruntime-gpu**](https://pypi.org/project/onnxruntime-gpu) | [ort-nightly-gpu (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-gpu/overview/) | [View](../execution-providers/CUDA-ExecutionProvider.md#requirements) | +| | GPU (CUDA/TensorRT) for CUDA 11.x: [**onnxruntime-gpu**](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-11/PyPI/onnxruntime-gpu/overview/) | [ort-nightly-gpu (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ort-cuda-11-nightly/PyPI/ort-nightly-gpu/overview/) | [View](../execution-providers/CUDA-ExecutionProvider.md#requirements) | | | GPU (DirectML): [**onnxruntime-directml**](https://pypi.org/project/onnxruntime-directml/) | [ort-nightly-directml (dev)](https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-directml/overview/) | [View](../execution-providers/DirectML-ExecutionProvider.md#requirements) | | | OpenVINO: [**intel/onnxruntime**](https://github.com/intel/onnxruntime/releases/latest) - *Intel managed* | | [View](../build/eps.md#openvino) | | | TensorRT (Jetson): [**Jetson Zoo**](https://elinux.org/Jetson_Zoo#ONNX_Runtime) - *NVIDIA managed* | | | diff --git a/docs/performance/model-optimizations/float16.md 
b/docs/performance/model-optimizations/float16.md index 972f5fe516f6b..a0335ccbac70f 100644 --- a/docs/performance/model-optimizations/float16.md +++ b/docs/performance/model-optimizations/float16.md @@ -62,7 +62,9 @@ from onnxconverter_common import auto_mixed_precision import onnx model = onnx.load("path/to/model.onnx") -model_fp16 = auto_convert_mixed_precision(model, test_data, rtol=0.01, atol=0.001, keep_io_types=True) +# Assuming x is the input to the model +feed_dict = {'input': x.numpy()} +model_fp16 = auto_convert_mixed_precision(model, feed_dict, rtol=0.01, atol=0.001, keep_io_types=True) onnx.save(model_fp16, "path/to/model_fp16.onnx") ``` @@ -73,6 +75,7 @@ auto_convert_mixed_precision(model, feed_dict, validate_fn=None, rtol=None, atol ``` - `model`: The ONNX model to convert. +- `feed_dict`: Test data used to measure the accuracy of the model during conversion. Format is similar to InferenceSession.run (map of input names to values) - `validate_fn`: A function accepting two lists of numpy arrays (the outputs of the float32 model and the mixed-precision model, respectively) that returns `True` if the results are sufficiently close and `False` otherwise. Can be used instead of or in addition to `rtol` and `atol`. - `rtol`, `atol`: Absolute and relative tolerances used for validation. See [numpy.allclose](https://numpy.org/doc/stable/reference/generated/numpy.allclose.html) for more information. - `keep_io_types`: Whether model inputs/outputs should be left as float32. diff --git a/docs/performance/model-optimizations/quantization.md b/docs/performance/model-optimizations/quantization.md index c769b0889fa23..ae49e591d94ca 100644 --- a/docs/performance/model-optimizations/quantization.md +++ b/docs/performance/model-optimizations/quantization.md @@ -202,7 +202,7 @@ ONNX Runtime quantization on GPU only supports S8S8. On x86-64 machines with AVX2 and AVX512 extensions, ONNX Runtime uses the VPMADDUBSW instruction for U8S8 for performance. This instruction might suffer from saturation issues: it can happen that the output does not fit into a 16-bit integer and has to be clamped (saturated) to fit. Generally, this is not a big issue for the final result. However, if you do encounter a large accuracy drop, it may be caused by saturation. In this case, you can either try [reduce_range](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/quantize.py) or the U8U8 format which doesn't have saturation issues. -There is no such issue on other CPU architectures (x64 with VNNI and ARM). +There is no such issue on other CPU architectures (x64 with VNNI and Arm®). ### List of Supported Quantized Ops {: .no_toc} @@ -231,13 +231,66 @@ ONNX Runtime leverages the TensorRT Execution Provider for quantization on GPU n We provide two end-to end examples: [Yolo V3](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/object_detection/trt/yolov3) and [resnet50](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/trt/resnet50). +## Quantize to Int4/UInt4 + +ONNX Runtime can quantize certain operators in a model to 4 bit integer types. Block-wise weight-only quantizaiton is applied to the operators. The supported op types are: +- [MatMul](https://github.com/onnx/onnx/blob/main/docs/Operators.md#matmul): + - The node is quantized only if the input `B` is constant + - support QOperator or QDQ format. 
+ - If QOperator is selected, the node is converted to a [MatMulNBits](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftmatmulnbits) node. Weight `B` is blockwise quantized and saved in the new node. [HQQ](https://arxiv.org/pdf/2309.15531.pdf), [GPTQ](https://huggingface.co/docs/transformers/main/en/quantization/gptq) and RTN (default) algorithms are supported. + - If QDQ is selected, the MatMul node is replaced by a DequantizeLinear -> MatMul pair. Weight `B` is blockwise quantized and saved in the DequantizeLinear node as an initializer. +- [Gather](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Gather): + - The node is quantized only if the input `data` is constant. + - support QOperator + - Gather is quantized to a [GatherBlockQuantized](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftgatherblockquantized) node. Input `data` is blockwise quantized and saved in the new node. Only support RTN algorithm. + +Since Int4/UInt4 types are introduced in [onnx opset 21](https://github.com/onnx/onnx/releases/tag/v1.16.0), if the model's onnx domain version is < 21, it is force upgraded to opset 21. Please make sure the operators in the model are compatible with onnx opset 21. + +To run a model that has GatherBlockQuantized nodes, ONNX Runtime 1.20 is needed. + +Code Examples: + +```python +from onnxruntime.quantization import ( + matmul_4bits_quantizer, + quant_utils, + quantize +) +from pathlib import Path + +model_fp32_path="path/to/orignal/model.onnx" +model_int4_path="path/to/save/quantized/model.onnx" + +quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig( + block_size=128, # 2's exponential and >= 16 + is_symmetric=True, # if true, quantize to Int4. otherwsie, quantize to uint4. + accuracy_level=4, # used by MatMulNbits, see https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#attributes-35 + quant_format=quant_utils.QuantFormat.QOperator, + op_types_to_quantize=("MatMul","Gather"), # specify which op types to quantize + quant_axes=(("MatMul", 0), ("Gather", 1),) # specify which axis to quantize for an op type. + +model = quant_utils.load_model_with_shape_infer(Path(model_fp32_path)) +quant = matmul_4bits_quantizer.MatMul4BitsQuantizer( + model, + nodes_to_exclude=None, # specify a list of nodes to exclude from quantizaiton + nodes_to_include=None, # specify a list of nodes to force include from quantization + algo_config=quant_config,) +quant.process() +quant.model.save_model_to_file( + model_int4_path, + True) # save data to external file + +``` + +For AWQ and GTPQ quantization usage, please refer to [Gen-AI model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models#quantized-pytorch-model). + ## FAQ ### Why am I not seeing performance improvements? {: .no_toc } The performance improvement depends on your model and hardware. The performance gain from quantization has two aspects: compute and memory. Old hardware has none or few of the instructions needed to perform efficient inference in int8. And quantization has overhead (from quantizing and dequantizing), so it is not rare to get worse performance on old devices. -x86-64 with VNNI, GPU with Tensor Core int8 support and ARM with dot-product instructions can get better performance in general. +x86-64 with VNNI, GPU with Tensor Core int8 support and Arm®-based processors with dot-product instructions can get better performance in general. 
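A quick way to see what quantization buys you on your own hardware is to time the float32 and quantized models side by side. The sketch below is illustrative only: the model file names, the input name `input`, and the input shape are placeholders for your own model.

```python
# Rough latency comparison between an FP32 model and its quantized counterpart.
# Paths, the input name, and the input shape are placeholders; adjust for your model.
import time
import numpy as np
import onnxruntime as ort

def avg_latency_ms(model_path, feed, runs=50):
    sess = ort.InferenceSession(model_path)
    for _ in range(5):                      # warm-up runs
        sess.run(None, feed)
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    return (time.perf_counter() - start) / runs * 1000

x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # assumed input shape
feed = {"input": x}                                      # assumed input name
print("fp32 :", avg_latency_ms("model_fp32.onnx", feed), "ms")
print("int8 :", avg_latency_ms("model_int8.onnx", feed), "ms")
```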
### Which quantization method should I choose, dynamic or static? {: .no_toc} diff --git a/docs/tutorials/csharp/stable-diffusion-csharp.md b/docs/tutorials/csharp/stable-diffusion-csharp.md index 588fb18e70436..5ba5ec6ea6bfe 100644 --- a/docs/tutorials/csharp/stable-diffusion-csharp.md +++ b/docs/tutorials/csharp/stable-diffusion-csharp.md @@ -52,8 +52,6 @@ To run in the cloud with Azure Machine Learning: The Hugging Face site has a great library of open source models. We will leverage and download the [ONNX Stable Diffusion models from Hugging Face](https://huggingface.co/models?sort=downloads&search=Stable+Diffusion). - [Stable Diffusion Models v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/onnx) - - [Stable Diffusion Models v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx) - Once you have selected a model version repo, click `Files and Versions`, then select the `ONNX` branch. If there isn't an ONNX model branch available, use the `main` branch and convert it to ONNX. See the [ONNX conversion tutorial for PyTorch](https://learn.microsoft.com/windows/ai/windows-ml/tutorials/pytorch-convert-model) for more information. diff --git a/docs/tutorials/mobile/pose-detection.md b/docs/tutorials/mobile/pose-detection.md index ad4296aa64603..248d06889550a 100644 --- a/docs/tutorials/mobile/pose-detection.md +++ b/docs/tutorials/mobile/pose-detection.md @@ -19,7 +19,7 @@ Learn how to build and run ONNX models on mobile with built-in pre and post proc ## Object detection with YOLOv8 -You can find the full source code for the [Android](https://github.com/microsoft/ app in the ONNX Runtime inference examples repository. +You can find the full source code for the [Android](https://github.com/microsoft/) app in the ONNX Runtime inference examples repository. ### Build the ONNX model with built-in pre and post processing diff --git a/docs/tutorials/on-device-training/android-app.md b/docs/tutorials/on-device-training/android-app.md index b9b0ae49c7bec..ab528a5a1c1ad 100644 --- a/docs/tutorials/on-device-training/android-app.md +++ b/docs/tutorials/on-device-training/android-app.md @@ -7,15 +7,15 @@ nav_order: 1 --- # On-Device Training: Building an Android Application - +{: .no_toc } In this tutorial, we will explore how to build an Android application that incorporates ONNX Runtime's On-Device Training solution. On-device training refers to the process of training a machine learning model directly on an edge device without relying on cloud services or external servers. Here is what the application will look like at the end of this tutorial: - +an image classification app with Tom Cruise in the middle. ## Introduction - +{: .no_toc } We will guide you through the steps to create an Android app that can train a simple image classification model using on-device training techniques. This tutorial showcases the `transfer learning` technique where knowledge gained from training a model on one task is leveraged to improve the performance of a model on a different but related task. Instead of starting the learning process from scratch, transfer learning allows us to transfer the knowledge or features learned by a pre-trained model to a new task. For this tutorial, we will leverage the `MobileNetV2` model which has been trained on large-scale image datasets such as ImageNet (which has 1,000 classes). We will use this model for classifying custom data into one of four classes. 
The initial layers of MobileNetV2 serve as a feature extractor, capturing generic visual features applicable to various tasks, and only the final classifier layer will be trained for the task at hand. @@ -24,26 +24,10 @@ In this tutorial, we will use data to learn to: - Classify animals into one of four categories using a pre-packed animals dataset. - Classify celebrities into one of four categories using a custom celebrities dataset. -## Contents - -- [Introduction](#introduction) -- [Prerequisites](#prerequisites) -- [Offline Phase - Building the training artifacts](#offline-phase---building-the-training-artifacts) - - [Export the model to ONNX](#op1) - - [Define the trainable and non trainable parameters](#op2) - - [Generate the training artifacts](#op3) -- [Training Phase - Android application development](#training-phase---android-application-development) - - [Setting up the project in Android Studio](#tp1) - - [Adding the ONNX Runtime dependency](#tp2) - - [Packaging the Prebuilt Training Artifacts and Dataset](#tp3) - - [Interfacing with ONNX Runtime - C++ Code](#tp4) - - [Image Preprocessing](#tp5) - - [Application Frontend](#tp6) -- [Training Phase - Running the application on a device](#training-phase---running-the-application-on-a-device) - - [Running the application on a device](#tp7) - - [Training with a pre-loaded dataset - Animals](#tp8) - - [Training with a custom dataset - Celebrities](#tp9) -- [Conclusion](#conclusion) + +## Table of Contents +* TOC placeholder +{:toc} ## Prerequisites @@ -791,7 +775,7 @@ To follow this tutorial, you should have a basic understanding of Android app de b. Launching the application on the device should look like this: - + Barebones ORT Personalize app 2. Training with a pre-loaded dataset - Animals @@ -805,7 +789,7 @@ To follow this tutorial, you should have a basic understanding of Android app de e. Use any animal image from your library for inferencing now. - + ORT Personalize app with an image of a cow As can be seen from the image above, the model correctly predicted `Cow`. @@ -825,7 +809,7 @@ To follow this tutorial, you should have a basic understanding of Android app de g. That's it!. Hopefully the application classified the image correctly. - + an image classification app with Tom Cruise in the middle. ## Conclusion diff --git a/docs/tutorials/on-device-training/ios-app.md b/docs/tutorials/on-device-training/ios-app.md index fff1347923ef0..e61bab68596ff 100644 --- a/docs/tutorials/on-device-training/ios-app.md +++ b/docs/tutorials/on-device-training/ios-app.md @@ -7,7 +7,7 @@ nav_order: 2 --- # Building an iOS Application - +{: .no_toc } In this tutorial, we will explore how to build an iOS application that incorporates ONNX Runtime's On-Device Training solution. On-device training refers to the process of training a machine learning model directly on an edge device without relying on cloud services or external servers. In this tutorial, we will build a simple speaker identification app that learns to identify a speaker's voice. We will take a look at how to train a model on-device, export the trained model, and use the trained model to perform inference. @@ -18,6 +18,7 @@ Here is what the application will look like: application demo, with buttons for voice, train, and infer. ## Introduction +{: .no_toc } We will guide you through the process of building an iOS application that can train a simple audio classification model using on-device training techniques. 
The tutorial showcases the `transfer learning` technique where knowledge gained from training a model on one task is leveraged to improve the performance of a model on a different but related task. Instead of starting the learning process from scratch, transfer learning allows us to transfer the knowledge or features learned by a pre-trained model to a new task. In this tutorial, we will leverage the [`wav2vec`](https://huggingface.co/superb/wav2vec2-base-superb-sid) model which has been trained on large-scale celebrity speech data such as `VoxCeleb1`. We will use the pre-trained model to extract features from the audio data and train a binary classifier to identify the speaker. The initial layers of the model serve as a feature extractor, capturing the important features of the audio data. Only the last layer of the model is trained to perform the classification task. @@ -29,23 +30,9 @@ In the tutorial, we will: - Use the exported model to perform inference -## Contents -- [Building an iOS Application](#building-an-ios-application) - - [Introduction](#introduction) - - [Contents](#contents) - - [Prerequisites](#prerequisites) - - [Generating the training artifacts](#generating-the-training-artifacts) - - [Building the iOS application](#building-the-ios-application) - - [Xcode Setup](#xcode-setup) - - [Application Overview](#application-overview) - - [Training the model](#training-the-model) - - [Inference with the trained model](#inference-with-the-trained-model) - - [Recording Audio](#recording-audio) - - [Train View](#train-view) - - [Infer View](#infer-view) - - [ContentView](#contentview) - - [Running the iOS application](#running-the-ios-application) - - [Conclusion](#conclusion) +## Table of Contents +* TOC placeholder +{:toc} ## Prerequisites @@ -947,27 +934,27 @@ Now, we are ready to run the application. You can run the application on the sim a. Now, when you run the application, you should see the following screen: - +My Voice application with Train and Infer buttons b. Next, click on the `Train` button to navigate to the `TrainView`. The `TrainView` will prompt you to record your voice. You will need to record your voice `kNumRecordings` times. - +My Voice application with words to record c. Once all the recordings are complete, the application will train the model on the given data. You will see the progress bar indicating the progress of the training. - +Loading bar while the app is training d. Once the training is complete, you will see the following screen: - +The app informs you training finished successfully! e. Now, click on the `Infer` button to navigate to the `InferView`. The `InferView` will prompt you to record your voice. Once the recording is complete, it will perform inference with the trained model and display the result of the inference. - +My Voice application allows you to record and infer whether it's you or not. That's it! Hopefully, it identified your voice correctly. diff --git a/docs/tutorials/web/ep-webnn.md b/docs/tutorials/web/ep-webnn.md index fe1c1d729daf0..f04dd7870d7cb 100644 --- a/docs/tutorials/web/ep-webnn.md +++ b/docs/tutorials/web/ep-webnn.md @@ -74,59 +74,59 @@ To use WebNN EP, you just need to make 3 small changes: WebNN API and WebNN EP are in actively development, you might consider installing the latest nightly build version of ONNX Runtime Web (onnxruntime-web@dev) to benefit from the latest features and improvements. 
-## Keep tensor data on WebNN MLBuffer (IO binding) +## Keep tensor data on WebNN MLTensor (IO binding) -By default, a model's inputs and outputs are tensors that hold data in CPU memory. When you run a session with WebNN EP with 'gpu' or 'npu' device type, the data is copied to GPU or NPU memory, and the results are copied back to CPU memory. Memory copy between different devices as well as different sessions will bring much overhead to the inference time, WebNN provides a new opaque device-specific storage type MLBuffer to address this issue. -If you get your input data from a MLBuffer, or you want to keep the output data on MLBuffer for further processing, you can use IO binding to keep the data on MLBuffer. This will be especially helpful when running transformer based models, which usually runs a single model multiple times with previous output as the next input. +By default, a model's inputs and outputs are tensors that hold data in CPU memory. When you run a session with WebNN EP with 'gpu' or 'npu' device type, the data is copied to GPU or NPU memory, and the results are copied back to CPU memory. Memory copy between different devices as well as different sessions will bring much overhead to the inference time, WebNN provides a new opaque device-specific storage type MLTensor to address this issue. +If you get your input data from a MLTensor, or you want to keep the output data on MLTensor for further processing, you can use IO binding to keep the data on MLTensor. This will be especially helpful when running transformer based models, which usually runs a single model multiple times with previous output as the next input. -For model input, if your input data is a WebNN storage MLBuffer, you can [create a MLBuffer tensor and use it as input tensor](#create-input-tensor-from-a-mlbuffer). +For model input, if your input data is a WebNN storage MLTensor, you can [create a MLTensor tensor and use it as input tensor](#create-input-tensor-from-a-mltensor). For model output, there are 2 ways to use the IO binding feature: -- [Use pre-allocated MLBuffer tensors](#use-pre-allocated-mlbuffer-tensors) +- [Use pre-allocated MLTensor tensors](#use-pre-allocated-mltensor-tensors) - [Specify the output data location](#specify-the-output-data-location) Please also check the following topic: -- [MLBuffer tensor life cycle management](#mlbuffer-tensor-life-cycle-management) +- [MLTensor tensor life cycle management](#mltensor-tensor-life-cycle-management) -**Note:** The MLBuffer necessitates a shared MLContext for IO binding. This implies that the MLContext should be pre-created as a WebNN EP option and utilized across all sessions. +**Note:** The MLTensor necessitates a shared MLContext for IO binding. This implies that the MLContext should be pre-created as a WebNN EP option and utilized across all sessions. 
-### Create input tensor from a MLBuffer +### Create input tensor from a MLTensor -If your input data is a WebNN storage MLBuffer, you can create a MLBuffer tensor and use it as input tensor: +If your input data is a WebNN storage MLTensor, you can create a MLTensor tensor and use it as input tensor: ```js const mlContext = await navigator.ml.createContext({deviceType, ...}); -const inputMLBuffer = await mlContext.createBuffer({ +const inputMLTensor = await mlContext.createTensor({ dataType: 'float32', dimensions: [1, 3, 224, 224], - usage: MLBufferUsage.WRITE_TO, + usage: MLTensorUsage.WRITE, }); -mlContext.writeBuffer(mlBuffer, inputArrayBuffer); -const inputTensor = ort.Tensor.fromMLBuffer(mlBuffer, { +mlContext.writeTensor(inputMLTensor, inputArrayBuffer); +const inputTensor = ort.Tensor.fromMLTensor(inputMLTensor, { dataType: 'float32', dims: [1, 3, 224, 224] }); ``` -Use this tensor as model inputs(feeds) so that the input data will be kept on MLBuffer. +Use this tensor as model inputs(feeds) so that the input data will be kept on MLTensor. -### Use pre-allocated MLBuffer tensors +### Use pre-allocated MLTensor tensors -If you know the output shape in advance, you can create a MLBuffer tensor and use it as output tensor: +If you know the output shape in advance, you can create a MLTensor tensor and use it as output tensor: ```js -// Create a pre-allocated buffer and the corresponding tensor. Assuming that the output shape is [10, 1000]. +// Create a pre-allocated MLTensor and the corresponding ORT tensor. Assuming that the output shape is [10, 1000]. const mlContext = await navigator.ml.createContext({deviceType, ...}); -const myPreAllocatedBuffer = await mlContext.createBuffer({ +const myPreAllocatedMLTensor = await mlContext.createTensor({ dataType: 'float32', dimensions: [10, 1000], - usage: MLBufferUsage.READ_FROM, + usage: MLTensorUsage.READ, }); -const myPreAllocatedOutputTensor = ort.Tensor.fromMLBuffer(myPreAllocatedBuffer, { +const myPreAllocatedOutputTensor = ort.Tensor.fromMLTensor(myPreAllocatedMLTensor, { dataType: 'float32', dims: [10, 1000] }); @@ -140,17 +140,17 @@ const results = await mySession.run(feeds, fetches); ``` -By specifying the output tensor in the fetches, ONNX Runtime Web will use the pre-allocated buffer as the output buffer. If there is a shape mismatch, the `run()` call will fail. +By specifying the output tensor in the fetches, ONNX Runtime Web will use the pre-allocated MLTensor as the output tensor. If there is a shape mismatch, the `run()` call will fail. ### Specify the output data location -If you don't want to use pre-allocated MLBuffer tensors for outputs, you can also specify the output data location in the session options: +If you don't want to use pre-allocated MLTensor tensors for outputs, you can also specify the output data location in the session options: ```js const mySessionOptions1 = { ..., - // keep all output data on MLBuffer - preferredOutputLocation: 'ml-buffer' + // keep all output data on MLTensor + preferredOutputLocation: 'ml-tensor' }; const mySessionOptions2 = { @@ -158,7 +158,7 @@ const mySessionOptions2 = { // alternatively, you can specify the output location for each output tensor preferredOutputLocation: { 'output_0': 'cpu', // keep output_0 on CPU. This is the default behavior. 
- 'output_1': 'ml-buffer' // keep output_1 on MLBuffer buffer + 'output_1': 'ml-tensor' // keep output_1 on MLTensor tensor } }; ``` @@ -169,18 +169,18 @@ See [API reference: preferredOutputLocation](https://onnxruntime.ai/docs/api/js/ ## Notes -### MLBuffer tensor life cycle management +### MLTensor tensor life cycle management -It is important to understand how the underlying MLBuffer is managed so that you can avoid memory leaks and improve buffer usage efficiency. +It is important to understand how the underlying MLTensor is managed so that you can avoid memory leaks and improve tensor usage efficiency. -A MLBuffer tensor is created either by user code or by ONNX Runtime Web as model's output. -- When it is created by user code, it is always created with an existing MLBuffer using `Tensor.fromMLBuffer()`. In this case, the tensor does not "own" the MLBuffer. +A MLTensor tensor is created either by user code or by ONNX Runtime Web as model's output. +- When it is created by user code, it is always created with an existing MLTensor using `Tensor.fromMLTensor()`. In this case, the tensor does not "own" the MLTensor. - - It is user's responsibility to make sure the underlying buffer is valid during the inference, and call `mlBuffer.destroy()` to dispose the buffer when it is no longer needed. - - Avoid calling `tensor.getData()` and `tensor.dispose()`. Use the MLBuffer directly. - - Using a MLBuffer tensor with a destroyed MLBuffer will cause the session run to fail. -- When it is created by ONNX Runtime Web as model's output (not a pre-allocated MLBuffer tensor), the tensor "owns" the buffer. + - It is user's responsibility to make sure the underlying MLTensor is valid during the inference, and call `mlTensor.destroy()` to dispose the MLTensor when it is no longer needed. + - Avoid calling `tensor.getData()` and `tensor.dispose()`. Use the MLTensor tensor directly. + - Using a MLTensor tensor with a destroyed MLTensor will cause the session run to fail. +- When it is created by ONNX Runtime Web as model's output (not a pre-allocated MLTensor tensor), the tensor "owns" the MLTensor. - - You don't need to worry about the case that the buffer is destroyed before the tensor is used. - - Call `tensor.getData()` to download the data from the MLBuffer to CPU and get the data as a typed array. - - Call `tensor.dispose()` explicitly to destroy the underlying MLBuffer when it is no longer needed. + - You don't need to worry about the case that the MLTensor is destroyed before the tensor is used. + - Call `tensor.getData()` to download the data from the MLTensor to CPU and get the data as a typed array. + - Call `tensor.dispose()` explicitly to destroy the underlying MLTensor when it is no longer needed. 
diff --git a/images/EP_context_node.png b/images/EP_context_node.png new file mode 100644 index 0000000000000..953bcf353558a Binary files /dev/null and b/images/EP_context_node.png differ diff --git a/images/EP_context_nodes_with_different_eps.png b/images/EP_context_nodes_with_different_eps.png new file mode 100644 index 0000000000000..c7b986d0f9c89 Binary files /dev/null and b/images/EP_context_nodes_with_different_eps.png differ diff --git a/images/Onnx_weight_sharing.png b/images/Onnx_weight_sharing.png new file mode 100644 index 0000000000000..b3c277903ddfb Binary files /dev/null and b/images/Onnx_weight_sharing.png differ diff --git a/images/Ort_Qnn_Ep_weight_sharing.png b/images/Ort_Qnn_Ep_weight_sharing.png new file mode 100644 index 0000000000000..e8fa37d1bb2a4 Binary files /dev/null and b/images/Ort_Qnn_Ep_weight_sharing.png differ diff --git a/images/Qnn_weight_sharing.png b/images/Qnn_weight_sharing.png new file mode 100644 index 0000000000000..d415c3bfc57ca Binary files /dev/null and b/images/Qnn_weight_sharing.png differ diff --git a/images/nashville.jpg b/images/nashville.jpg new file mode 100644 index 0000000000000..da40173230e0c Binary files /dev/null and b/images/nashville.jpg differ diff --git a/src/app.html b/src/app.html index 5f79324942486..cdfdad8b3f2dc 100644 --- a/src/app.html +++ b/src/app.html @@ -36,6 +36,11 @@ }, propertyConfiguration: { // Properties Plugin configuration + gpcDataSharingOptIn: false, + callback: { + userConsentDetails: _getWcpUserConsentDetails + }, + env: 'PROD' // Environment can be set to PPE or PROD as needed. }, webAnalyticsConfiguration: { @@ -77,6 +82,7 @@ } }; + var siteConsent = null; WcpConsent.init( 'en-US', 'cookie-banner', @@ -91,6 +97,24 @@ WcpConsent.themes.light ); + function _getWcpUserConsentDetails() { + if (siteConsent) { + return siteConsent.getConsent(); + } + + // The exact value that you return here is dependent on your site, team and how + // use any data that is stored (work with you privacy team to determine what the + // correct "defaults" (true or false) should be for each item when the code is + // unable to determine (via WCP) if or what the user has (or has not) consented + // to. + return { + Required: [true], // Most likely `true` + Analytics: [true], + SocialMedia: [true], + Advertising: [false] + }; + } + function onConsentChanged(categoryPreferences) { if (categoryPreferences.Analytics) { // Google Analytics diff --git a/src/images/logos/autodesk-logo.png b/src/images/logos/autodesk-logo.png new file mode 100644 index 0000000000000..cb7d223734dbb Binary files /dev/null and b/src/images/logos/autodesk-logo.png differ diff --git a/src/images/logos/goodnotes-logo.png b/src/images/logos/goodnotes-logo.png new file mode 100644 index 0000000000000..86ee9ccee519a Binary files /dev/null and b/src/images/logos/goodnotes-logo.png differ diff --git a/src/routes/blogs/+page.svelte b/src/routes/blogs/+page.svelte index 8dda46876bcdc..bbc36db183912 100644 --- a/src/routes/blogs/+page.svelte +++ b/src/routes/blogs/+page.svelte @@ -366,6 +366,12 @@ } ]; let blogsCommunity = [ + { + title:'Running Phi-3 Mistral 7B LLMs on Raspberry Pi 5: A Step-by-Step Guide', + date: 'September 5, 2024', + link: 'https://medium.com/@vadikus/running-phi-3-mistral-7b-llms-on-raspberry-pi-5-a-step-by-step-guide-185e8102e35b', + blurb: 'Learn how to run Phi-3 Mistral 7B on Raspberry Pi 5 using the ONNX Runtime Gen AI library.' 
+ }, { title: 'Deploying a Production-Ready RAG Server: A Comprehensive Guide with LlamaIndex', diff --git a/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx b/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx index 7dc2f326cb3f5..48efc63953143 100644 --- a/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx +++ b/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx @@ -32,7 +32,7 @@ url: 'https://onnxruntime.ai/blogs/nimbleedge-x-onnxruntime' [NimbleEdge](https://www.nimbleedge.com/) is an on-device Machine Learning (ML) platform that enables real-time personalization in mobile apps, executing data capture, processing and ML inference on end users' mobile devices vs. on cloud. Using mobile compute efficiently to deliver optimal performance with minimal device resource usage is a key priority for NimbleEdge. For this, NimbleEdge leverages various ML inference runtimes, including, prominently, **ONNX Runtime**. -In this blog post, we'll explore how on-device compute can be leveraged for cost-efficient, privacy-preserving real-time ML in mobile apps, and how NimbleEdge leverages ONNX Runtime to enable this. We also share results from NimbleEdge's on-device deployment with Dream11, India's largest fantasy gaming platform with 200Mn+ users. +In this blog post, we'll explore how on-device compute can be leveraged for cost-efficient, privacy-preserving real-time ML in mobile apps, and how NimbleEdge leverages ONNX Runtime to enable this. We also share results from NimbleEdge’s on-device deployment with one of India’s largest fantasy gaming platforms with hundreds of millions of users. ### **Introduction** @@ -102,17 +102,17 @@ For inference execution, NimbleEdge utilizes a number of runtimes, prominently i Through the capabilities listed here, NimbleEdge's comprehensive on-device ML platform enables high performance real-time ML deployments in days vs. months. -### **Case Study: Real time ranking of fantasy sports contests for Dream11** +### **Case Study: Real time ranking of fantasy sports contests for leading Indian fantasy gaming co** -Dream11 is an Indian fantasy sports platform (like Fanduel/ Draftkings in USA) with 200M+ users, and a peak concurrency of ~15 million users. Dream11 offers thousands of fantasy contests across dozens of matches from 10+ sports, with each contest varying in contest entry amount, win %, and participant count. +Fantasy Gaming co (name obscured for confidentiality) is an Indian fantasy sports platform (like Fanduel/ Draftkings in USA) with hundreds of millions of users, and a peak concurrency of several million users. Fantasy Gaming co offers thousands of fantasy contests across dozens of matches from 10+ sports, with each contest varying in contest entry amount, win %, and no. of participants. -To streamline the user journey, Dream11 was running a recommendation system that delivered personalized contest recommendations to users, based on historical interactions. Dream11 analyzed customer clickstream data, and identified that incorporating in-session user interactions in the recommender systems would significantly improve quality of recommendations vs. leveraging batch predictions generated hourly. +To streamline the user journey, Fantasy Gaming co was running a recommendation system that delivered personalized contest recommendations to users, based on historical interactions. 
They analyzed customer clickstream data, and identified that incorporating in-session user interactions in the recommender systems would significantly improve quality of recommendations vs. leveraging batch predictions generated hourly. -Due to this, Dream11 was keen to deploy real-time, session-aware recommendations, but implementation was challenging due to the aforementioned challenges in real-time ML on cloud. Hence, Dream11 turned to on-device ML with NimbleEdge for implementing real-time personalized contest recommendations. +Due to this, Fantasy Gaming co was keen to deploy real-time, session-aware recommendations, but implementation was challenging due to the aforementioned challenges in real-time ML on cloud. Hence, Fantasy Gaming co turned to on-device ML with NimbleEdge for implementing real-time personalized contest recommendations. **Results** -With NimbleEdge, Dream11 is now able to generate features and predictions based on real-time user interactions, resulting in improved relevance of recommendations for millions of users. Additionally, inference was delivered at millisecond latency, with minimal battery and CPU usage impact! +With NimbleEdge, Fantasy Gaming co is now able to generate features and predictions based on real-time user interactions, resulting in improved relevance of recommendations for millions of users. Additionally, inference was delivered at millisecond latency, with minimal battery and CPU usage impact! **No. of inferences:** `7B+` diff --git a/src/routes/blogs/pytorch-on-the-edge/+page.svelte b/src/routes/blogs/pytorch-on-the-edge/+page.svelte index 83ab6d2d49db6..d0a9d765cd5f1 100644 --- a/src/routes/blogs/pytorch-on-the-edge/+page.svelte +++ b/src/routes/blogs/pytorch-on-the-edge/+page.svelte @@ -179,9 +179,9 @@ fun run(audioTensor: OnnxTensor): Result {

Run PyTorch models on the edge

-By: Natalie Kershaw
+By: Natalie Kershaw and
 Prasanth Pulavarthi

@@ -217,12 +217,12 @@ fun run(audioTensor: OnnxTensor): Result {
anywhere that is outside of the cloud, ranging from large, well-resourced personal computers to small footprint devices such as mobile phones. This has been a challenging task to accomplish in the past, but new advances in model optimization and software like ONNX Runtime make it more feasible - even for new generative AI and large language models like Stable Diffusion, Whisper, and Llama2.

Considerations for PyTorch models on the edge

There are several factors to keep in mind when thinking about running a PyTorch model on the
@@ -292,7 +292,7 @@ fun run(audioTensor: OnnxTensor): Result {

Tools for PyTorch models on the edge

We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based
@@ -305,7 +305,7 @@ fun run(audioTensor: OnnxTensor): Result {
format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch has thought about this and includes an API that enables exactly this - torch.onnx. ONNX is an open standard that defines the operators that make up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional graph that captures the operators that are needed to run the model without Python. As with everything
@@ -318,7 +318,7 @@ fun run(audioTensor: OnnxTensor): Result {
The popular Hugging Face library also has APIs that build on top of this torch.onnx functionality to export models to the ONNX format. Over 130,000 models are supported making it very likely that the model you care about is one of them.
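
To make the torch.onnx step concrete, here is a minimal sketch of exporting a toy PyTorch module to ONNX. The module, output file name, and input/output names below are invented for illustration and are not taken from the blog post.

```python
import torch

# A stand-in model purely for illustration; any torch.nn.Module exports the same way.
class TinyClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(16, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 4),
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier().eval()
dummy_input = torch.randn(1, 16)

# Trace the module into an ONNX graph that no longer needs Python or PyTorch to run.
torch.onnx.export(
    model,
    dummy_input,
    "tiny_classifier.onnx",                    # hypothetical output path
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}},   # allow a variable batch size
)
```

Hugging Face's export tooling wraps this same flow for hub models, so in many cases the conversion can be done without writing the export call by hand.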

@@ -328,7 +328,7 @@ fun run(audioTensor: OnnxTensor): Result {
and web browsers) via various languages (from C# to JavaScript to Swift).
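
The Python binding shows the shape of the API that the C#, JavaScript, and Swift bindings mirror. This is a hedged sketch that reuses the hypothetical `tiny_classifier.onnx` exported above, not code from the post:

```python
import numpy as np
import onnxruntime as ort

# Load the exported graph; the CPU execution provider is always available.
session = ort.InferenceSession("tiny_classifier.onnx", providers=["CPUExecutionProvider"])

# Feed a batch of two feature vectors and read back the logits.
batch = np.random.rand(2, 16).astype(np.float32)
(logits,) = session.run(None, {"features": batch})
print(logits.shape)  # (2, 4) for the toy model above
```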

Examples of PyTorch models on the edge

Stable Diffusion on Windows

@@ -345,7 +345,7 @@ fun run(audioTensor: OnnxTensor): Result {

You don't have to export the fifth model, ClipTokenizer, as it is available in ONNX Runtime extensions, a library for pre and post processing PyTorch models.
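
As a rough illustration of how a model that embeds an extensions operator (such as a tokenizer) is loaded, the sketch below registers the onnxruntime-extensions custom-op library before creating the session. The model file name is a placeholder, not the tutorial's actual artifact.

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

session_options = ort.SessionOptions()
# Make the extensions custom ops (tokenizers, image/audio helpers) visible to the runtime.
session_options.register_custom_ops_library(get_library_path())

# "clip_tokenizer.onnx" is a hypothetical model that uses an extensions tokenizer op.
session = ort.InferenceSession(
    "clip_tokenizer.onnx", session_options, providers=["CPUExecutionProvider"]
)
```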

@@ -353,7 +353,7 @@ fun run(audioTensor: OnnxTensor): Result {
To run this pipeline of models as a .NET application, we build the pipeline code in C#. This code can be run on CPU, GPU, or NPU, if they are available on your machine, using ONNX Runtime's device-specific hardware accelerators. This is configured with the ExecutionProviderTarget below.
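
The tutorial's C# pipeline selects the accelerator through its ExecutionProviderTarget setting; the equivalent idea in the Python API is the providers list passed when the session is created. A minimal sketch, with "unet.onnx" standing in as a placeholder model path rather than the tutorial's actual file:

```python
import onnxruntime as ort

available = ort.get_available_providers()

# Prefer DirectML or CUDA when the installed package and hardware expose them,
# and always keep CPU as the fallback.
preferred = [p for p in ("DmlExecutionProvider", "CUDAExecutionProvider") if p in available]
session = ort.InferenceSession("unet.onnx", providers=preferred + ["CPUExecutionProvider"])

print("Running with:", session.get_providers())
```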

@@ -366,7 +366,7 @@ fun run(audioTensor: OnnxTensor): Result {

You can build the application and run it on Windows with the detailed steps shown in this tutorial.

@@ -374,7 +374,7 @@ fun run(audioTensor: OnnxTensor): Result {

Running a PyTorch model locally in the browser is not only possible but super simple with the transformers.js library. Transformers.js uses ONNX Runtime Web as its backend. Many models are already converted to ONNX and served by the transformers.js CDN, making inference in the browser a matter of writing
@@ -407,7 +407,7 @@ fun run(audioTensor: OnnxTensor): Result {
All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence generation) can be composed and exported to a single ONNX model using the Olive framework. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which supports Android, iOS, react-native, and MAUI/Xamarin.
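
Before wiring the composed Whisper model into a mobile app, it can help to inspect what the export actually produced, since input and output names depend on the export configuration. A small sketch under assumptions: the model file name is hypothetical, and the extensions library is registered in case the exported graph uses its audio or text operators.

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

session_options = ort.SessionOptions()
session_options.register_custom_ops_library(get_library_path())

# Placeholder name for the single ONNX model produced by the export step.
session = ort.InferenceSession(
    "whisper_tiny_all.onnx", session_options, providers=["CPUExecutionProvider"]
)

# Print the graph's actual inputs and outputs instead of assuming their names.
for tensor in session.get_inputs():
    print("input :", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)
```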

@@ -420,7 +420,7 @@ fun run(audioTensor: OnnxTensor): Result {

The relevant snippet of an example Android mobile app that performs speech transcription on short samples of audio is shown below:

@@ -476,11 +476,11 @@ fun run(audioTensor: OnnxTensor): Result {

You can read the full Speaker Verification tutorial, and build and run the application from source.

diff --git a/src/routes/components/customers.svelte b/src/routes/components/customers.svelte
index a5da8146bea27..6c6c7dce06171 100644
--- a/src/routes/components/customers.svelte
+++ b/src/routes/components/customers.svelte
@@ -8,32 +8,36 @@ import antgroupLogo from '../../images/logos/antgroup-logo.png';
 import algoriddimLogo from '../../images/logos/algoriddim-logo.png';
 import ATLASLogo from '../../images/logos/ATLAS-logo.png';
+ import autodeskLogo from '../../images/logos/autodesk-logo.png';
 import bazaarvoiceLogo from '../../images/logos/bazaarvoice-logo.png';
 import camoLogo from '../../images/logos/camo-logo.png';
 import cephableLogo from '../../images/logos/cephable-logo.png';
 import clearbladeLogo from '../../images/logos/clearblade-logo.png';
 import deezerLogo from '../../images/logos/deezer-logo.png';
+ import goodnotesLogo from '../../images/logos/goodnotes-logo.png';
+ import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
 import hypefactorsLogo from '../../images/logos/hypefactors-logo.png';
 import infarmLogo from '../../images/logos/infarm-logo.png';
 import intelLogo from '../../images/logos/intel-logo.png';
 import intelligenzaEticaLogo from '../../images/logos/intelligenza-etica-logo.png';
- import navitaireAmadeusLogo from '../../images/logos/navitaire-amadeus-logo.png';
- import PeakSpeedLogo from '../../images/logos/PeakSpeed_logo.png';
+ import navitaireLogo from '../../images/logos/navitaire-amadeus-logo.png';
+ import nvidiaLogo from '../../images/logos/nvidia.png';
+ import opennlpLogo from '../../images/logos/opennlp-logo.png';
+ import oracleLogo from '../../images/logos/oracle-logo.png';
+ import peakspeedLogo from '../../images/logos/PeakSpeed_logo.png';
 import piecesLogo from '../../images/logos/pieces-logo.png';
+ import ptwLogo from '../../images/logos/ptw-logo.png';
 import redisLogo from '../../images/logos/redis-logo.png';
- import RockchipLogo from '../../images/logos/Rockchip-logo.png';
+ import rockchipLogo from '../../images/logos/Rockchip-logo.png';
 import samtecLogo from '../../images/logos/samtec-logo.png';
 import sasLogo from '../../images/logos/sas-logo.png';
 import teradataLogo from '../../images/logos/teradata-logo.png';
 import topazlabsLogo from '../../images/logos/topazlabs-logo.png';
- import ueLogo from '../../images/logos/ue-logo.png';
+ import unrealengineLogo from '../../images/logos/ue-logo.png';
 import usdaLogo from '../../images/logos/usda-logo.png';
 import vespaLogo from '../../images/logos/vespa-logo.png';
 import writerLogo from '../../images/logos/writer-logo.png';
 import xilinxLogo from '../../images/logos/xilinx-logo.png';
- import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
- import nvidiaLogo from '../../images/logos/nvidia.png';
- import oracleLogo from '../../images/logos/oracle-logo.png';
 const testimonials = [
 {
@@ -61,6 +65,11 @@ src: ATLASLogo, alt: 'ATLAS' }, + { + href: './testimonials#Autodesk', + src: autodeskLogo, + alt: 'Autodesk' + }, { href: './testimonials#Bazaarvoice', src: bazaarvoiceLogo,
@@ -86,6 +95,11 @@ src: deezerLogo, alt: 'Deezer' }, + { + href: './testimonials#Goodnotes', + src: goodnotesLogo, + alt: 'GoodNotes' + }, { href: './testimonials#Hugging%20Face', src: huggingfaceLogo,
@@ -113,7 +127,7 @@ }, { href: './testimonials#Navitaire', - src: navitaireAmadeusLogo, + src: navitaireLogo, alt: 'Navitaire' }, {
@@ -121,6 +135,11 @@ src: nvidiaLogo, alt: 'NVIDIA' }, + { + href: './testimonials#Apache%20OpenNLP', + src: opennlpLogo, + alt: 'Apache OpenNLP' + }, { href: './testimonials#Oracle', src: oracleLogo,
@@ -128,7 +147,7 @@ }, { href: './testimonials#Peakspeed', - src: PeakSpeedLogo, + src: peakspeedLogo, alt: 'Peakspeed' }, {
@@ -136,6 +155,11 @@ src: piecesLogo, alt: 'Pieces' }, + { + href: './testimonials#PTW%20Dosimetry', + src: ptwLogo, + alt: 'PTW Dosimetry' + }, { href: './testimonials#Redis', src: redisLogo,
@@ -143,7 +167,7 @@ }, { href: './testimonials#Rockchip', - src: RockchipLogo, + src: rockchipLogo, alt: 'Rockchip' }, {
@@ -168,7 +192,7 @@ }, { href: './testimonials#Unreal%20Engine', - src: ueLogo, + src: unrealengineLogo, alt: 'Unreal Engine' }, {
diff --git a/src/routes/components/footer.svelte b/src/routes/components/footer.svelte
index b030524976742..e6b855d0ca129 100644
--- a/src/routes/components/footer.svelte
+++ b/src/routes/components/footer.svelte
@@ -9,7 +9,7 @@