From d491241b8a0cae27f0b1b0d355c3117fd0ba2b1b Mon Sep 17 00:00:00 2001 From: Maanav Dalal Date: Tue, 20 Aug 2024 10:24:33 -0700 Subject: [PATCH 01/31] Added autodesk testimonial. (#21794) Also updated testimonial cards to be more legible. --- src/images/logos/autodesk-logo.png | Bin 0 -> 8494 bytes src/routes/components/customers.svelte | 6 ++++++ src/routes/testimonials/+page.svelte | 9 +++++++++ src/routes/testimonials/testimonial-card.svelte | 6 +++--- 4 files changed, 18 insertions(+), 3 deletions(-) create mode 100644 src/images/logos/autodesk-logo.png diff --git a/src/images/logos/autodesk-logo.png b/src/images/logos/autodesk-logo.png new file mode 100644 index 0000000000000000000000000000000000000000..cb7d223734dbbe638ea5a9bed589ed200c8eabf5 GIT binary patch literal 8494 zcmeHMi8mD78?Jn^g(zg5C`nlovYW3YLL_1A4WSGsChHhVAtaUTTcwYf!Pu9_c-SUT!h&6DLma-Z#Gc=){Rr zpnr{%?bs7r`)uR5I|+Mabm!z~r$WT>fa`^^9qhyj5y5}$q-XBHze{Kj%^n*bPyW~b z%fSCz2ELq`&OeU%@Z}?Os}uaH51+x`efpg6#^1?6hK=urnOng7*f4V??lTJ4q$D^l zi`ynP&+f7g)`kkT|B{y#rj~Ow_f|(+S!yS`9^OLJz?&pyzoI|_FWUbR^m~2nqSCuK870fJU*hh}Mw>>FBke6Az5*hTl?e7C4=s^0 z@xpM!sLZD+4V&q(-wmqt_&!+f<`VyBRjrDTjNd(WB$(jNiPs~_hbOjJ=fJlk(HC!n zK3-HXJ}Z2glSl9^I~OkUum9Zt<>ddUARrFE1V2xCAN&J&n6_AzxPkGQ3Uzi~BMX;} zcv9d8b+~I2g4_5u7U^TJPZgs-sD%pP_?=4ZNgLV7oc63*_kek|4_C1^7zVfa+jlZh z+_h+f?}cB+3L~oGiaF~(9q6(mt}hU)Y!Q098yLhYT@&KFQoP$uiE||f#kLX(l&+49 zU8S1P)RAl@Q)C3JXqZwdI*_x}XCT$tciv_bG|$`WYB~W$LKPHI0k5%p;U20wZNeO+ zu~)1#D;+&M=H}N28;SXZQYo5?ptwXlqE7z_0pTScqPcu;A9|?2MyhoKc+^7j$G(9rquJ7T?@JnOFuS~ewOc!TEWG8=YTw_Kx1^qRCXb2B|>21(b@eM|U zH;(U|ggQikm_X6ZpCO44_|;4xWFqEi_&Y8(-|8z&U=S&3MwLHql990aeVul*;6r^2 zZSJ5#WK=}LNtF6+U-hjuL4~?xwu3ryp=C0ud6nHCFt?ulcDaf=XoMT;I#f|d2N)TZ zJWjYGOJSyW-lnAT5?U!jV>+q-CGPzlrtC*)R5rD;Af~m16SLyOPc9lkIH>;qVhBpm z`N!qf&A{i)Sy(d=@u(O<$5^lS>(-5XO&NXw59`Wu*f;=;>!Oo^;a1b0<%N}7><}&) zZ4{X}!rB=GW{8G%kBiyNu$KX~dQuEQ21lYVX42i*lnxAW^(%vNjwO%k517O?(=0rt zp~6ihEvYWbS^zV-AjYI>^`*s~LHf0xfz+>m?4PN@_<)o5Vy!iMm!$}6sc(ddO1e7J zI2Rd|j+=(I^UA>}b?KonI7CnCs758(3M4BV5rFBcg(T40f1JHAqZf2qzisc-wfuF2E1=w?y17c4P ziuW9ZWN;a@+C#cSPPqS2v9AFrxr~_WpH0+5@kG{$Io^^vEI-YUT&tDea2jq934?|| zekzng475AU$2#J~YJVdJXM_f=9AG$nhFyP0{eV9C`;-8T4>{k=i*a{;_rRYIa)g=| zvA4|K8B~I7p{0VS0d`JM%8fky8jeD7zK_T>1<5tdgxk8eVlBeO#RzNb`3^++sYCj* zc&|EYq{@PIJ$}>)e=Yrn{JdQD(e?H3(R}1jLi`>Y$> zqX;|13GVKnX8;1hd$OK5$#L+cXw{b5^|B$I9+E@FLM4ZV=|7)xSq#l7;q`Hob5noCSeKVK z>anuo-uFfDfvo|0nlCo3&bOnl>0H1A>Ub`=mF;Mb~4}WJ-ptrzbcy0`6TC-(B{0hxD4e=>aIM z7sSVZUCX1V|2oqG?#*^nrb{&%?Vl=jA%K&^=*xGBLZhJ{z>48hBY$ZGmUycp~ICRUI3xiUM$T_ksPoEc#h;GcW@DM&reC=9~lfXmW0JphD;!o|l6ub-=a zQFPQENK;6&6l7hwfm*n=pM29w zcU0%*M$+z6t4QH4x2DMS@@OE-SAuo$!lOj4;@PXD#`U>9D`V|g* zKBW+>GMCj7Xa^6^2KDqx*x0#1H12f^$HF`GQR~`b=ZI1wYTez$bm&=f)B6ZsPRKTd z`Y=lg%Qd;}-E|F>UJ>5q!S=015mgf-cJ30C)#DHx>(`RU&p0yT6-$BYk*8HdPm?|` zHCp*JHrE_|1g&&Pq|o9T72$VW?_Xr1ebtQaCx{EFnHq}kylHT7Wx_&1>$&9`3iHsw z@jJ(_R30Cu$a1kRQY$Li<#|aT(N`Mf_*7ps2;ki}^$|RZ6ms?^t~QGoSx)>W#UGI@ zwoCcI4@kpf-Ap{kcIjx$-bR1h-4&kK<60TH&&(7|QBgml)b}fbP!L`*=g*7{h*pfX z{{zyR-tqR^Pb*m;-X>2?PD~->nBklh^0=L{(xWCC;9bVXJ9TJ0X zcn<3=Y(EEhc6>z3McC?FUdfxKt*s2a7imxoxmMNCGeaWRMyrRK6H#r&OwFjOT{Sn? 
z{rx1bqr(?p{VZ3^;K@j7ynSGty4K=E^LU|EApQi5@)ZreNJ%3}h}#06PiF#&gDnsplM3sr6&WgLh$0f76y z@%l{Q0AjX&@e|ZJ5M-M(_Iv+1L)s~?#M_S>`Jhh&=J?FQpNm}Qzhh2)$FbdWVXnLa zine(vep7+z@yMZwP3XKGbc5EGYT}3+gTP^4ei*uKfs=Ojlqw6--gHjamh9w~#OF=L zmK{ji@bcOttMD1_(6aZ4tk&CC`M~O|8Hs}1NtP10Yay6ym0>R;orQS-*3-hr;%OytV_1>M4GTW7B#pLQ$79cG+`{ici>VR%`Ri_K zrGxrDQ$}xlz(QYu^Zo{x@Zm%YqS6Jex5JQ%kl-#u&sJ=T!c7tRjs5J9gwtn`Th#Vqen;y`Y24i#uP-5`A1`|i%yild9nU<{LvyOZ~ASyQ}6BmL_1&Y^WgByN8g)- z|L9Jk7A-d7Z?&gP(AJGp(K+>hvP^`wit5y?_TJak)SkrKQ&3gox8^-QH)#&lZwD(X zfvdVS3dSx6N-dsEsk!>>^1W9l#W86-P6FVB4wzFFDoGlpoM^Y?f89b6Aa;aT!W8>) zc9Ys~khkW8RH3R%Cce@ogRj7nFUkX+%$s7^NoE6wNxg-?RFX$yhF2MMq9bl~xHsVC z=7xB}_2`&@KYU6Yy+iS5Agp%Zm!g-of42X8mn1w6=dd&@ijZ@GC~-(Y zHk^E#(b``?;-cg1iMi(Grp65yj<<7_RA^KLhwv2Ta-|QJGQaL4zI+2e&{OIju6c*P z76Dp1zv3ryuOU1hpjQxx^jt&-D&M$bQZocVdz3anZ?wHR@CQpug}L07rP5WvEV#j1!F|VYM{du~@5!;_$!}7@l`8pB8 zSm(32UY(BZw$HYlbzzhOj{5f*zD^_=*LdHUr5M@o9Tx};i%hHbv`iD0H($+$pa0>i z9b2?|-Jh|)!$8w@mmi^Si;@Di8*7!o2%pkDXHiy$8%TYVcgLA4O|@CR)5Y%@$eGa# zsGYRA0|Eh@lV{WbRowaJ@fQq}=;u*MDyY=oYFDansTz4m(Tlb6wo%6+3~xH%jg|uV z8`6KkQR&bR&{Ht3?(h)VcFeqlP4AuY0wcwMT7PU*?2h}}n1zJH0+t!yr)pW$>>S|z zNO^z32>nI(d4dnJ*9UFuP)a|t%^ZI^HdJb^HlIQZh)J``z3Us96IQCuJwHh2!OJQ@ zIYX4!d+ev9}MH8bQRNbl7N z?4r0LZ@A5Fqp0&nrY@y}54oxJy261C>d)V;g!OtfM2(&yNDEJ5-}R9NZ3@~JANwah zrRIste94CQ{`pj<6Gmn_1jTL2*Wg2Yg0`OI0kf9sx>Qh$sks>KgMu9mX8WC|oDe^r zRK2r)c^Oyu9cyqVM_vbOZk#YaNXr6Y9wC5XAeAf3b2axy=(aIv6Z6WP%|vW`#GxHN z4!IVZ-?*UTXZ%r;@TH9a_R9&vjcx4+Gv?3R+%DJlMcRWj2@Ct zIGy?C&m)cux}(SX%G%k6@=sbQSwq5()&>D>l@LmzV2X>oMC3kzVVuZPgn5U1Y_WYL z+AD|d1ub@oxTEh*_^M4q+TEGJnzjNp6AC&{??7^45!g8rqZ8oUfs>dCTpT^z(7&iDEv25 z!2N@q3O=OMIoRqT^vzPab84(y2OuRiC?DD}guav&{rwmL>G*`995BggwjFb4Zc!Kt!3d$uI`3{jwrcf@-*Bznm{XefDw4cc%x; z_$(1@HJiBx8xb*Ehh z#;HcTCyG-Vu(I$jebgfu)N#1%HFL2^?0L>zdP07H%K)At)<@OTG1z#)(BZO+veu@? zzKf#fBw#rY?Vf^KNSgmly3@T;EyUoP}m4?h~SoZe_bRL4(p~0qeKa z4i!cN9C6&5g-eH~a%svW-&X84KZiUZ*?uCH8%d^@xE(G*<9tkVkU%CnjDxHZu^Py- zD_EL*UX1tx6peUrIx6`H*9RM6Uy&k=Xyx83K6_qDAb>mLCmMy%&VJvhc{E%7K?FG% zX0n-Lr98T9jqQ)<4uT!N>8yX7t=TK90443ML=fEf`Aip`wo_voEhX!*jAz8%7Kk#R zS!3-w+Fq;_SCtoey7qES1OvY+7=Xistgkjr47b)ep?CaTyn!kVnV&%tD>14A~+ zmWbdzbN$dlJL_gk_b)BfoQ`2>Jwh;K+T<~)#=amy{}9uhZeurk<=%G*w82&ZVzg~~ zJNBd}`+FF7H}i`3e1|?buO?ot3Cw*Ii`ajP{KV;OT}xjcHDJ z`7LN<6_GcufVT&I*}oNozGh>{pF>?~R8ldMy{u4bYc#c-Sq${Jkh362s4cX;CrKAi zg@A;}y{#if8$k9x8~?v%(yZhDFk zP(phS@%~y6N_+e%BbB#~NCW>Re-Mb&^G_47vY)cQtnxe^zp4^83z1e$_va;N9rjH9 zok|;E-F42=YxueYiKT7iMc4YWIUHcN0{DsO26iYWS2N_dB_Y2Rg&U*1*_IQ8?X#Uq z=d)9PuXlB?MN}+>SpTO+VJO6#U-?cOnwopFWks{q5noONy7;+{g!u%OS9@IMprz{F z8UX)aQzHYMW#P(%jYqqhG(ofzvZLl)zfqVhFh!!74;$4u-{uCOG$5XhY z={C&xdn4lN-lDi`+CG(Y=Is*rI49pG_BEI@{a%JVp-`;gNN-!ks(j``>bVmQjsjH< zBt3=Fb@7)Sc9fIdT+cUYZTw#@(#$Uz6RKeEbA{BUv&!PWeQPiSGIRp?}&-3_$ELXR_;M4mZ0Oj|$+|Z4Uym@5<^fVd1Vi%mj69)aRhE2=K7O%ZDQ? 
zbR2GVFw|;ms)tWim(rZu7;>(0_DEOCKakF~;G#u-A~V_#uVR zGavR>=A@5lBS^BL)Ics2NLWN7fSdJRt*hNT`8fi8NUyNDQ4LZ#ji_Xx+u%xuRuoM7 zqet7ZFPm5E#Lr=1t_%*ERQS;>?>D02s8}fdWyQ(tyzue$35$VIg(ud?U_4h$pMD^I zEj+fbEb_Y(Hh8Bgzn96;Fnz5!PnK#nZuKdbU|PKiYQ81?pK9`-Wb~h2_OFJx9bw5e Vgo9w(yZ+U)?;D!mExGd~@_)0?gzo?V literal 0 HcmV?d00001 diff --git a/src/routes/components/customers.svelte b/src/routes/components/customers.svelte index a5da8146bea27..6e8431d9e7b81 100644 --- a/src/routes/components/customers.svelte +++ b/src/routes/components/customers.svelte @@ -8,6 +8,7 @@ import antgroupLogo from '../../images/logos/antgroup-logo.png'; import algoriddimLogo from '../../images/logos/algoriddim-logo.png'; import ATLASLogo from '../../images/logos/ATLAS-logo.png'; + import autodeskLogo from '../../images/logos/autodesk-logo.png'; import bazaarvoiceLogo from '../../images/logos/bazaarvoice-logo.png'; import camoLogo from '../../images/logos/camo-logo.png'; import cephableLogo from '../../images/logos/cephable-logo.png'; @@ -61,6 +62,11 @@ src: ATLASLogo, alt: 'ATLAS' }, + { + href: './testimonials#Autodesk', + src: autodeskLogo, + alt: 'Autodesk' + }, { href: './testimonials#Bazaarvoice', src: bazaarvoiceLogo, diff --git a/src/routes/testimonials/+page.svelte b/src/routes/testimonials/+page.svelte index ca71962dda7d6..a3a38f9d11087 100644 --- a/src/routes/testimonials/+page.svelte +++ b/src/routes/testimonials/+page.svelte @@ -6,6 +6,7 @@ import antgrouplogo from '../../images/logos/antgroup-logo.png'; import algoriddimLogo from '../../images/logos/algoriddim-logo.png'; import atlaslogo from '../../images/logos/ATLAS-logo.png'; + import autodesklogo from '../../images/logos/autodesk-logo.png'; import bazaarvoicelogo from '../../images/logos/bazaarvoice-logo.png'; import camologo from '../../images/logos/camo-logo.png'; import cephablelogo from '../../images/logos/cephable-logo.png'; @@ -77,6 +78,14 @@ imgsrc: atlaslogo, imgalt: 'Atlas Experiment logo' }, + { + title: 'Autodesk', + quote: + "Autodesk Flame's use of ONNX Runtime offers major advantages with cross-platform compatibility and performance, providing artists the flexibility and interactivity they expect. This allows them to make use of machine learning models directly in Flame's creative toolset, augmenting the quality of their work and increasing the software's expandability. Microsoft's ONNX Runtime team has provided expert guidance and support throughout the development process, enabling us to put AI-powered creative tools in the hands of artists seeking high-quality VFX and finishing solutions.", + author: 'Louis Martin, Sr. Manager of Software Development for Autodesk Flame', + imgsrc: autodesklogo, + imgalt: 'Autodesk logo' + }, { title: 'Bazaarvoice', quote: diff --git a/src/routes/testimonials/testimonial-card.svelte b/src/routes/testimonials/testimonial-card.svelte index d432149556c88..4da075bd4dcf4 100644 --- a/src/routes/testimonials/testimonial-card.svelte +++ b/src/routes/testimonials/testimonial-card.svelte @@ -24,7 +24,7 @@
@@ -32,8 +32,8 @@
-

{title}

-

{description}

+

{title}

+

{description}


-{author}

From 6265c3acf23059324e4e22355fac2446509e89e5 Mon Sep 17 00:00:00 2001 From: Jing Fang <126209182+fajin-corp@users.noreply.github.com> Date: Mon, 26 Aug 2024 18:12:13 -0700 Subject: [PATCH 02/31] Add 4bit quantizer to onnx runtime doc (#21835) ### Description Introduce how to use matmul_4bits_quantizer to do weight only quantization. ### Motivation and Context Add 4bit quantizer to onnx runtime doc --- .../model-optimizations/quantization.md | 53 +++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/docs/performance/model-optimizations/quantization.md b/docs/performance/model-optimizations/quantization.md index c769b0889fa23..961cef10c6972 100644 --- a/docs/performance/model-optimizations/quantization.md +++ b/docs/performance/model-optimizations/quantization.md @@ -231,6 +231,59 @@ ONNX Runtime leverages the TensorRT Execution Provider for quantization on GPU n We provide two end-to end examples: [Yolo V3](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/object_detection/trt/yolov3) and [resnet50](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/trt/resnet50). +## Quantize to Int4/UInt4 + +ONNX Runtime can quantize certain operators in a model to 4 bit integer types. Block-wise weight-only quantizaiton is applied to the operators. The supported op types are: +- [MatMul](https://github.com/onnx/onnx/blob/main/docs/Operators.md#matmul): + - The node is quantized only if the input `B` is constant + - support QOperator or QDQ format. + - If QOperator is selected, the node is converted to a [MatMulNBits](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftmatmulnbits) node. Weight `B` is blockwise quantized and saved in the new node. [HQQ](https://arxiv.org/pdf/2309.15531.pdf), [GPTQ](https://huggingface.co/docs/transformers/main/en/quantization/gptq) and RTN (default) algorithms are supported. + - If QDQ is selected, the MatMul node is replaced by a DequantizeLinear -> MatMul pair. Weight `B` is blockwise quantized and saved in the DequantizeLinear node as an initializer. +- [Gather](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Gather): + - The node is quantized only if the input `data` is constant. + - support QOperator + - Gather is quantized to a [GatherBlockQuantized](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftgatherblockquantized) node. Input `data` is blockwise quantized and saved in the new node. Only support RTN algorithm. + +Since Int4/UInt4 types are introduced in [onnx opset 21](https://github.com/onnx/onnx/releases/tag/v1.16.0), if the model's onnx domain version is < 21, it is force upgraded to opset 21. Please make sure the operators in the model are compatible with onnx opset 21. + +To run a model that has GatherBlockQuantized nodes, ONNX Runtime 1.20 is needed. + +Code Examples: + +```python +from onnxruntime.quantization import ( + matmul_4bits_quantizer, + quant_utils, + quantize +) +from pathlib import Path + +model_fp32_path="path/to/orignal/model.onnx" +model_int4_path="path/to/save/quantized/model.onnx" + +quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig( + block_size=128, # 2's exponential and >= 16 + is_symmetric=True, # if true, quantize to Int4. otherwsie, quantize to uint4. 
+ accuracy_level=4, # used by MatMulNBits, see https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#attributes-35
+ quant_format=quant_utils.QuantFormat.QOperator,
+ op_types_to_quantize=("MatMul","Gather"), # specify which op types to quantize
+ quant_axes=(("MatMul", 0), ("Gather", 1),)) # specify which axis to quantize for an op type.
+
+model = quant_utils.load_model_with_shape_infer(Path(model_fp32_path))
+quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(
+ model,
+ nodes_to_exclude=None, # specify a list of nodes to exclude from quantization
+ nodes_to_include=None, # specify a list of nodes to force-include in quantization
+ algo_config=quant_config,)
+quant.process()
+quant.model.save_model_to_file(
+ model_int4_path,
+ True) # save data to external file
+
+```
+
+For AWQ and GPTQ quantization usage, please refer to [Gen-AI model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models#quantized-pytorch-model).
+
 ## FAQ
 ### Why am I not seeing performance improvements?
 {: .no_toc }

From c4025042ae6968a23ed139de251cdc04d1f552f8 Mon Sep 17 00:00:00 2001
From: Preetha Veeramalai
Date: Wed, 4 Sep 2024 11:17:36 -0700
Subject: [PATCH 03/31] Update documentation for ovep rel-5.4 (#21909)

### Description
Contains doc updates on latest OVEP 5.4 release.

---
 docs/build/eps.md | 8 ++--
 .../OpenVINO-ExecutionProvider.md | 37 ++++++++++++++-----
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/docs/build/eps.md b/docs/build/eps.md
index 12fc4d3235bb3..6a76ce0fcbfd7 100644
--- a/docs/build/eps.md
+++ b/docs/build/eps.md
@@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
 ### Prerequisites
 {: .no_toc }

-1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.1** for the appropriate OS and target hardware:
-   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
-   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
+   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
+   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)

   Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.

-  *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.*
+  *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is the minimal OpenVINO™ version requirement.*

2.
Configure the target hardware with specific follow on instructions: * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux) diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 39ec668bc0bf9..fa71f70b0c277 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution ## Install Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. -* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.2 Release](https://github.com/intel/onnxruntime/releases) +* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases) * Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) * Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20) @@ -30,10 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with three lastest rel |ONNX Runtime|OpenVINO™|Notes| |---|---|---| +|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)| +|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)| |1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)| -|1.16.0|2023.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.1)| -|1.15.0|2023.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.0.0)| -|1.14.0|2022.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v4.3)| ## Build @@ -200,8 +199,30 @@ For more information on Multi-Device plugin of OpenVINO™, please refer to the [Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Running_on_multiple_devices.html). ### Export OpenVINO Compiled Blob -Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. The exported model is saved to the same directory as the source model with the suffix -ov_{device}_blob.onnx where device can be one of the supported like CPU or NPU. This feature is currently enabled for fully supported models only. -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. +Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It complies with the ORT session config keys +``` + Ort::SessionOptions session_options; + + // Enable EP context feature to dump the partitioned graph which includes the EP context into Onnx file. + // "0": disable. (default) + // "1": enable. + + session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1"); + + // Flag to specify whether to dump the EP context into single Onnx model or pass bin path. 
+ // "0": dump the EP context into separate file, keep the file name in the Onnx model.
+ // "1": dump the EP context into the Onnx model. (default).
+
+ session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1");
+
+ // Specify the file path for the Onnx model which has EP context.
+ // Defaults to /original_file_name_ctx.onnx if not specified
+
+ session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, "./ov_compiled_epctx.onnx");
+
+ sess = onnxruntime.InferenceSession(<path_to_onnx_model>, session_options)
+```
+Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about session options.

### Enable QDQ Optimizations Passes
Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy. Generally this feature will give better performance/accuracy with ORT Optimizations disabled.

@@ -239,8 +260,7 @@ The session configuration options are passed to SessionOptionsAppendExecutionPro

```
OrtOpenVINOProviderOptions options;
-options.device_type = "GPU";
-options.precision = "FP32";
+options.device_type = "GPU_FP32";
options.num_of_threads = 8;
options.cache_dir = "";
options.context = 0x123456ff;

@@ -277,7 +297,6 @@ The following table lists all the available configuration options for API 2.0 an
| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context i.e. the cl_context address as a void pointer.|
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
-| export_ep_ctx_blob | string | True/False | boolean | This options enables exporting the OpenVINO Compiled Blob as an ONNX Operator EPContext. |

Valid Hetero or Multi or Auto Device combinations:

From fb43a3387dc93c26cc4d561d901deb4c100a2b76 Mon Sep 17 00:00:00 2001
From: Scott McKay
Date: Fri, 6 Sep 2024 09:29:18 +1000
Subject: [PATCH 04/31] Update CoreML supported ops lists (#21627)

### Description
Update CoreML ops lists with recent additions.

### Motivation and Context
---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
---
 .../CoreML-ExecutionProvider.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/execution-providers/CoreML-ExecutionProvider.md b/docs/execution-providers/CoreML-ExecutionProvider.md
index af752b1a85e7e..6ffa77edc60b5 100644
--- a/docs/execution-providers/CoreML-ExecutionProvider.md
+++ b/docs/execution-providers/CoreML-ExecutionProvider.md
@@ -128,10 +128,12 @@ Operators that are supported by the CoreML Execution Provider when a NeuralNetwo
 |ai.onnx.ReduceSum||
 |ai.onnx:Relu||
 |ai.onnx:Reshape||
-|ai.onnx:Resize||
+|ai.onnx:Resize|4D input.<br/>`coordinate_transformation_mode` == `asymmetric`.<br/>`mode` == `linear` or `nearest`.<br/>`nearest_mode` == `floor`.<br/>`exclude_outside` == false<br/>`scales` or `sizes` must be constant.|
 |ai.onnx:Shape|Attribute `start` with non-default value is not supported.
Attribute `end` is not supported.| |ai.onnx:Sigmoid|| |ai.onnx:Slice|Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.| +|ai.onnx:Softmax|| +|ai.onnx:Split|If provided, `splits` must be constant.| |ai.onnx:Squeeze|| |ai.onnx:Sqrt|| |ai.onnx:Sub|| @@ -147,15 +149,26 @@ Operators that are supported by the CoreML Execution Provider when a MLProgram m |ai.onnx:Add|| |ai.onnx:AveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| |ai.onnx:Clip|| +|ai.onnx:Concat|| |ai.onnx:Conv|Only 1D/2D Conv is supported.
Bias if provided must be constant.|
+|ai.onnx:ConvTranspose|Weight and bias must be constant.<br/>padding_type of SAME_UPPER/SAME_LOWER is not supported.<br/>kernel_shape must have default values.<br/>output_shape is not supported.<br/>output_padding must have default values.|
+|ai.onnx.DepthToSpace|If 'mode' is 'CRD' the input must have a fixed shape.|
|ai.onnx:Div||
|ai.onnx:Gemm|Input B must be constant.|
|ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
+|ai.onnx:GridSample|4D input.<br/>'mode' of 'linear' or 'zeros'.
(mode==linear && padding_mode==reflection && align_corners==0) is not supported.| +|ai.onnx.LeakyRelu|| |ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.| |ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.| |ai.onnx:Mul|| |ai.onnx:Pow|Only supports cases when both inputs are fp32.| |ai.onnx:Relu|| |ai.onnx:Reshape|| +|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.| +|ai.onnx.Slice|starts/ends/axes/steps must be constant initializers.| +|ai.onnx.Split|If provided, `splits` must be constant.| |ai.onnx:Sub|| +|ai.onnx:Sigmoid|| +|ai.onnx:Tanh|| +|ai.onnx:Transpose|| From 99ddaedc5d72d4726162192819240d9a41d6de76 Mon Sep 17 00:00:00 2001 From: Emma Ning <43255631+EmmaNingMS@users.noreply.github.com> Date: Fri, 6 Sep 2024 13:17:39 -0700 Subject: [PATCH 05/31] NimbleEdge blog update (#22017) Update according to the change request from NimbleEdge ### Description ### Motivation and Context --- src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx b/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx index 7dc2f326cb3f5..48efc63953143 100644 --- a/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx +++ b/src/routes/blogs/nimbleedge-x-onnxruntime/+page.svx @@ -32,7 +32,7 @@ url: 'https://onnxruntime.ai/blogs/nimbleedge-x-onnxruntime' [NimbleEdge](https://www.nimbleedge.com/) is an on-device Machine Learning (ML) platform that enables real-time personalization in mobile apps, executing data capture, processing and ML inference on end users' mobile devices vs. on cloud. Using mobile compute efficiently to deliver optimal performance with minimal device resource usage is a key priority for NimbleEdge. For this, NimbleEdge leverages various ML inference runtimes, including, prominently, **ONNX Runtime**. -In this blog post, we'll explore how on-device compute can be leveraged for cost-efficient, privacy-preserving real-time ML in mobile apps, and how NimbleEdge leverages ONNX Runtime to enable this. We also share results from NimbleEdge's on-device deployment with Dream11, India's largest fantasy gaming platform with 200Mn+ users. +In this blog post, we'll explore how on-device compute can be leveraged for cost-efficient, privacy-preserving real-time ML in mobile apps, and how NimbleEdge leverages ONNX Runtime to enable this. We also share results from NimbleEdge’s on-device deployment with one of India’s largest fantasy gaming platforms with hundreds of millions of users. ### **Introduction** @@ -102,17 +102,17 @@ For inference execution, NimbleEdge utilizes a number of runtimes, prominently i Through the capabilities listed here, NimbleEdge's comprehensive on-device ML platform enables high performance real-time ML deployments in days vs. months. -### **Case Study: Real time ranking of fantasy sports contests for Dream11** +### **Case Study: Real time ranking of fantasy sports contests for leading Indian fantasy gaming co** -Dream11 is an Indian fantasy sports platform (like Fanduel/ Draftkings in USA) with 200M+ users, and a peak concurrency of ~15 million users. 
Dream11 offers thousands of fantasy contests across dozens of matches from 10+ sports, with each contest varying in contest entry amount, win %, and participant count. +Fantasy Gaming co (name obscured for confidentiality) is an Indian fantasy sports platform (like Fanduel/ Draftkings in USA) with hundreds of millions of users, and a peak concurrency of several million users. Fantasy Gaming co offers thousands of fantasy contests across dozens of matches from 10+ sports, with each contest varying in contest entry amount, win %, and no. of participants. -To streamline the user journey, Dream11 was running a recommendation system that delivered personalized contest recommendations to users, based on historical interactions. Dream11 analyzed customer clickstream data, and identified that incorporating in-session user interactions in the recommender systems would significantly improve quality of recommendations vs. leveraging batch predictions generated hourly. +To streamline the user journey, Fantasy Gaming co was running a recommendation system that delivered personalized contest recommendations to users, based on historical interactions. They analyzed customer clickstream data, and identified that incorporating in-session user interactions in the recommender systems would significantly improve quality of recommendations vs. leveraging batch predictions generated hourly. -Due to this, Dream11 was keen to deploy real-time, session-aware recommendations, but implementation was challenging due to the aforementioned challenges in real-time ML on cloud. Hence, Dream11 turned to on-device ML with NimbleEdge for implementing real-time personalized contest recommendations. +Due to this, Fantasy Gaming co was keen to deploy real-time, session-aware recommendations, but implementation was challenging due to the aforementioned challenges in real-time ML on cloud. Hence, Fantasy Gaming co turned to on-device ML with NimbleEdge for implementing real-time personalized contest recommendations. **Results** -With NimbleEdge, Dream11 is now able to generate features and predictions based on real-time user interactions, resulting in improved relevance of recommendations for millions of users. Additionally, inference was delivered at millisecond latency, with minimal battery and CPU usage impact! +With NimbleEdge, Fantasy Gaming co is now able to generate features and predictions based on real-time user interactions, resulting in improved relevance of recommendations for millions of users. Additionally, inference was delivered at millisecond latency, with minimal battery and CPU usage impact! **No. of inferences:** `7B+` From e892a56c472c3a015f256e9e77c2f4c70061be7a Mon Sep 17 00:00:00 2001 From: Maanav Dalal Date: Mon, 9 Sep 2024 20:37:54 -0700 Subject: [PATCH 06/31] Fixed many (not all) accessibility issues. (#22002) Likely will need to change HLJS theme for the rest. Should be good to instantly approve/merge, but feel free to review as necessary. 
Not currently deploying a preview as another more significant PR is being reviewed, but I can do it if requested :) --- .../on-device-training/android-app.md | 36 ++++++------------- docs/tutorials/on-device-training/ios-app.md | 33 ++++++----------- .../blogs/pytorch-on-the-edge/+page.svelte | 32 ++++++++--------- src/routes/components/footer.svelte | 6 ++-- src/routes/events/+page.svelte | 4 +-- src/routes/events/event-post.svelte | 4 +-- src/routes/getting-started/+page.svelte | 6 ++-- src/routes/huggingface/+page.svelte | 32 ++++++++--------- .../testimonials/testimonial-card.svelte | 4 +-- src/routes/training/+page.svelte | 18 +++++----- src/routes/windows/+page.svelte | 6 ++-- tailwind.config.js | 1 + 12 files changed, 77 insertions(+), 105 deletions(-) diff --git a/docs/tutorials/on-device-training/android-app.md b/docs/tutorials/on-device-training/android-app.md index b9b0ae49c7bec..ab528a5a1c1ad 100644 --- a/docs/tutorials/on-device-training/android-app.md +++ b/docs/tutorials/on-device-training/android-app.md @@ -7,15 +7,15 @@ nav_order: 1 --- # On-Device Training: Building an Android Application - +{: .no_toc } In this tutorial, we will explore how to build an Android application that incorporates ONNX Runtime's On-Device Training solution. On-device training refers to the process of training a machine learning model directly on an edge device without relying on cloud services or external servers. Here is what the application will look like at the end of this tutorial: - +an image classification app with Tom Cruise in the middle. ## Introduction - +{: .no_toc } We will guide you through the steps to create an Android app that can train a simple image classification model using on-device training techniques. This tutorial showcases the `transfer learning` technique where knowledge gained from training a model on one task is leveraged to improve the performance of a model on a different but related task. Instead of starting the learning process from scratch, transfer learning allows us to transfer the knowledge or features learned by a pre-trained model to a new task. For this tutorial, we will leverage the `MobileNetV2` model which has been trained on large-scale image datasets such as ImageNet (which has 1,000 classes). We will use this model for classifying custom data into one of four classes. The initial layers of MobileNetV2 serve as a feature extractor, capturing generic visual features applicable to various tasks, and only the final classifier layer will be trained for the task at hand. @@ -24,26 +24,10 @@ In this tutorial, we will use data to learn to: - Classify animals into one of four categories using a pre-packed animals dataset. - Classify celebrities into one of four categories using a custom celebrities dataset. 
-## Contents - -- [Introduction](#introduction) -- [Prerequisites](#prerequisites) -- [Offline Phase - Building the training artifacts](#offline-phase---building-the-training-artifacts) - - [Export the model to ONNX](#op1) - - [Define the trainable and non trainable parameters](#op2) - - [Generate the training artifacts](#op3) -- [Training Phase - Android application development](#training-phase---android-application-development) - - [Setting up the project in Android Studio](#tp1) - - [Adding the ONNX Runtime dependency](#tp2) - - [Packaging the Prebuilt Training Artifacts and Dataset](#tp3) - - [Interfacing with ONNX Runtime - C++ Code](#tp4) - - [Image Preprocessing](#tp5) - - [Application Frontend](#tp6) -- [Training Phase - Running the application on a device](#training-phase---running-the-application-on-a-device) - - [Running the application on a device](#tp7) - - [Training with a pre-loaded dataset - Animals](#tp8) - - [Training with a custom dataset - Celebrities](#tp9) -- [Conclusion](#conclusion) + +## Table of Contents +* TOC placeholder +{:toc} ## Prerequisites @@ -791,7 +775,7 @@ To follow this tutorial, you should have a basic understanding of Android app de b. Launching the application on the device should look like this: - + Barebones ORT Personalize app 2.
Training with a pre-loaded dataset - Animals @@ -805,7 +789,7 @@ To follow this tutorial, you should have a basic understanding of Android app de e. Use any animal image from your library for inferencing now. - + ORT Personalize app with an image of a cow As can be seen from the image above, the model correctly predicted `Cow`. @@ -825,7 +809,7 @@ To follow this tutorial, you should have a basic understanding of Android app de g. That's it!. Hopefully the application classified the image correctly. - + an image classification app with Tom Cruise in the middle. ## Conclusion diff --git a/docs/tutorials/on-device-training/ios-app.md b/docs/tutorials/on-device-training/ios-app.md index fff1347923ef0..e61bab68596ff 100644 --- a/docs/tutorials/on-device-training/ios-app.md +++ b/docs/tutorials/on-device-training/ios-app.md @@ -7,7 +7,7 @@ nav_order: 2 --- # Building an iOS Application - +{: .no_toc } In this tutorial, we will explore how to build an iOS application that incorporates ONNX Runtime's On-Device Training solution. On-device training refers to the process of training a machine learning model directly on an edge device without relying on cloud services or external servers. In this tutorial, we will build a simple speaker identification app that learns to identify a speaker's voice. We will take a look at how to train a model on-device, export the trained model, and use the trained model to perform inference. @@ -18,6 +18,7 @@ Here is what the application will look like: application demo, with buttons for voice, train, and infer. ## Introduction +{: .no_toc } We will guide you through the process of building an iOS application that can train a simple audio classification model using on-device training techniques. The tutorial showcases the `transfer learning` technique where knowledge gained from training a model on one task is leveraged to improve the performance of a model on a different but related task. Instead of starting the learning process from scratch, transfer learning allows us to transfer the knowledge or features learned by a pre-trained model to a new task. In this tutorial, we will leverage the [`wav2vec`](https://huggingface.co/superb/wav2vec2-base-superb-sid) model which has been trained on large-scale celebrity speech data such as `VoxCeleb1`. We will use the pre-trained model to extract features from the audio data and train a binary classifier to identify the speaker. The initial layers of the model serve as a feature extractor, capturing the important features of the audio data. Only the last layer of the model is trained to perform the classification task. 
@@ -29,23 +30,9 @@ In the tutorial, we will: - Use the exported model to perform inference -## Contents -- [Building an iOS Application](#building-an-ios-application) - - [Introduction](#introduction) - - [Contents](#contents) - - [Prerequisites](#prerequisites) - - [Generating the training artifacts](#generating-the-training-artifacts) - - [Building the iOS application](#building-the-ios-application) - - [Xcode Setup](#xcode-setup) - - [Application Overview](#application-overview) - - [Training the model](#training-the-model) - - [Inference with the trained model](#inference-with-the-trained-model) - - [Recording Audio](#recording-audio) - - [Train View](#train-view) - - [Infer View](#infer-view) - - [ContentView](#contentview) - - [Running the iOS application](#running-the-ios-application) - - [Conclusion](#conclusion) +## Table of Contents +* TOC placeholder +{:toc} ## Prerequisites @@ -947,27 +934,27 @@ Now, we are ready to run the application. You can run the application on the sim a. Now, when you run the application, you should see the following screen: - +My Voice application with Train and Infer buttons b. Next, click on the `Train` button to navigate to the `TrainView`. The `TrainView` will prompt you to record your voice. You will need to record your voice `kNumRecordings` times. - +My Voice application with words to record c. Once all the recordings are complete, the application will train the model on the given data. You will see the progress bar indicating the progress of the training. - +Loading bar while the app is training d. Once the training is complete, you will see the following screen: - +The app informs you training finished successfully! e. Now, click on the `Infer` button to navigate to the `InferView`. The `InferView` will prompt you to record your voice. Once the recording is complete, it will perform inference with the trained model and display the result of the inference. - +My Voice application allows you to record and infer whether it's you or not. That's it! Hopefully, it identified your voice correctly. diff --git a/src/routes/blogs/pytorch-on-the-edge/+page.svelte b/src/routes/blogs/pytorch-on-the-edge/+page.svelte index 83ab6d2d49db6..d0a9d765cd5f1 100644 --- a/src/routes/blogs/pytorch-on-the-edge/+page.svelte +++ b/src/routes/blogs/pytorch-on-the-edge/+page.svelte @@ -179,9 +179,9 @@ fun run(audioTensor: OnnxTensor): Result {

Run PyTorch models on the edge

- By: Natalie Kershaw + By: Natalie Kershaw and - Prasanth Pulavarthi

@@ -217,12 +217,12 @@ fun run(audioTensor: OnnxTensor): Result { anywhere that is outside of the cloud, ranging from large, well-resourced personal computers to small footprint devices such as mobile phones. This has been a challenging task to accomplish in the past, but new advances in model optimization and software like - ONNX Runtime + ONNX Runtime make it more feasible - even for new generative AI and large language models like Stable Diffusion, Whisper, and Llama2.

-

Considerations for PyTorch models on the edge

+

Considerations for PyTorch models on the edge

There are several factors to keep in mind when thinking about running a PyTorch model on the @@ -292,7 +292,7 @@ fun run(audioTensor: OnnxTensor): Result { -

Tools for PyTorch models on the edge

+

Tools for PyTorch models on the edge

We mentioned ONNX Runtime several times above. ONNX Runtime is a compact, standards-based @@ -305,7 +305,7 @@ fun run(audioTensor: OnnxTensor): Result { format that doesn't require the PyTorch framework and its gigabytes of dependencies. PyTorch has thought about this and includes an API that enables exactly this - torch.onnxtorch.onnx. ONNX is an open standard that defines the operators that make up models. The PyTorch ONNX APIs take the Pythonic PyTorch code and turn it into a functional graph that captures the operators that are needed to run the model without Python. As with everything @@ -318,7 +318,7 @@ fun run(audioTensor: OnnxTensor): Result { The popular Hugging Face library also has APIs that build on top of this torch.onnx functionality to export models to the ONNX format. Over 130,000 models130,000 models are supported making it very likely that the model you care about is one of them.

@@ -328,7 +328,7 @@ fun run(audioTensor: OnnxTensor): Result { and web browsers) via various languages (from C# to JavaScript to Swift).

-

Examples of PyTorch models on the edge

+

Examples of PyTorch models on the edge

Stable Diffusion on Windows

@@ -345,7 +345,7 @@ fun run(audioTensor: OnnxTensor): Result {

You don't have to export the fifth model, ClipTokenizer, as it is available in ONNX Runtime extensionsONNX Runtime extensions, a library for pre and post processing PyTorch models.

@@ -353,7 +353,7 @@ fun run(audioTensor: OnnxTensor): Result { To run this pipeline of models as a .NET application, we build the pipeline code in C#. This code can be run on CPU, GPU, or NPU, if they are available on your machine, using ONNX Runtime's device-specific hardware accelerators. This is configured with the ExecutionProviderTargetExecutionProviderTarget below.

@@ -366,7 +366,7 @@ fun run(audioTensor: OnnxTensor): Result {

You can build the application and run it on Windows with the detailed steps shown in this tutorialtutorial.

@@ -374,7 +374,7 @@ fun run(audioTensor: OnnxTensor): Result {

Running a PyTorch model locally in the browser is not only possible but super simple with - the transformers.js library. Transformers.js uses ONNX Runtime Web as its backend. Many models are already converted to ONNX and served by the tranformers.js CDN, making inference in the browser a matter of writing @@ -407,7 +407,7 @@ fun run(audioTensor: OnnxTensor): Result { All components of the Whisper Tiny model (audio decoder, encoder, decoder, and text sequence generation) can be composed and exported to a single ONNX model using the Olive frameworkOlive framework. To run this model as part of a mobile application, you can use ONNX Runtime Mobile, which supports Android, iOS, react-native, and MAUI/Xamarin.

@@ -420,7 +420,7 @@ fun run(audioTensor: OnnxTensor): Result {

The relevant snippet of a example Android mobile appAndroid mobile app that performs speech transcription on short samples of audio is shown below:

@@ -476,11 +476,11 @@ fun run(audioTensor: OnnxTensor): Result {

You can read the full Speaker Verification tutorialSpeaker Verification tutorial, and build and run the application from sourcebuild and run the application from source.

diff --git a/src/routes/components/footer.svelte b/src/routes/components/footer.svelte index b030524976742..e6b855d0ca129 100644 --- a/src/routes/components/footer.svelte +++ b/src/routes/components/footer.svelte @@ -9,7 +9,7 @@