Intel® Extension for Transformers v1.0.0 Release
- Highlights
- Features
- Productivity
- Examples
- Bug Fixing
- Documentation
- Validated Configurations
Highlights
- Provide optimal model packages for large language models (LLMs) such as GPT-J, GPT-NEOX, T5-large/base, Flan-T5, and Stable Diffusion
- Provide end-to-end optimized workflows such as SetFit-based sentiment analysis, Document Level Sentiment Analysis (DLSA), and Length Adaptive Transformer for inference
- Support NeuralChat, a custom chatbot based on domain-knowledge fine-tuning, and demonstrate fine-tuning in under one hour with PEFT on 4 SPR nodes
- Demonstrate the industry-leading sparse model inference solution in the MLPerf v3.0 open submission, with up to 1.6x speedup over other submissions
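PEFT-based fine-tuning, as highlighted for NeuralChat, typically injects small trainable low-rank adapters (LoRA) into frozen pretrained weights. The following is a minimal NumPy sketch of the LoRA idea only; the dimensions, names, and initialization are illustrative assumptions, not the actual NeuralChat or PEFT configuration:

```python
import numpy as np

# Hypothetical tiny dimensions for illustration; real LLM layers are far larger.
d_in, d_out, r, alpha = 8, 8, 2, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (not updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # zero-initialized, so the update starts at 0

def lora_forward(x):
    # Output = frozen path + scaled low-rank update (B @ A); only A and B train.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
# Before any training, B is zero, so the adapted layer matches the frozen layer.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only the small factors A and B are trained, the optimizer state and gradient memory shrink dramatically, which is what makes sub-hour fine-tuning on a few nodes plausible.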
Features
- Model Optimization
- LLM quantization including GPT-J (6B), GPT-NEOX (2.7B), T5-large, T5-base, Flan-T5, BLOOM-176B
- Enable basic Neural Architecture Search (commit 6cae)
- Transformers-accelerated Neural Engine
- Transformers-accelerated Libraries
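The LLM quantization feature above rests on mapping float weights to 8-bit integers. As a rough illustration of per-tensor symmetric int8 quantization (a generic sketch of the technique, not this extension's actual API):

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric quantization: one scale maps max |w| to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
# Round-trip error is bounded by half a quantization step.
err = np.abs(dequantize(q, s) - w).max()
```

Production flows add per-channel scales, calibration, and accuracy-aware tuning on top of this basic mapping.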
Productivity
- Support native PyTorch model as input of Neural Engine (commit bc38)
- Refine the Benchmark API to provide apples-to-apples benchmarking capability (commit e135)
- Simplify end-to-end example usage (commit 6b9c)
- N-in-M / N-x-M PyTorch pruning API enhancements (commit da4d)
- Deliver an engine-only wheel with a 60% size reduction (commit 02ac)
Examples
- End-to-end solution for Length Adaptive Transformer with Neural Engine, achieving an over 11x speedup compared with BERT Base on SPR (commit 95c6)
- End-to-end Document Level Sentiment Analysis (DLSA) workflow (commit 154a)
- N-in-M / N-x-M BERT Large and BERT Base pruning in PyTorch (commit da4d)
- Sparse pruning example for Longformer with 80% sparsity (commit 5c5a)
- Distillation for quantization for BERT and Stable Diffusion (commits 8856, 4457)
- Smooth quantization with BLOOM (commit edc9)
- Longformer quantization with question-answering task (commit 8805)
- Provide a SetFit workflow notebook (commits 6b9c, 2851)
- Support Text Generation task (commit c593)
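The smooth quantization example above appears to refer to the SmoothQuant technique, which migrates activation outliers into the weights via a per-channel scale so that both tensors become easier to quantize. A minimal NumPy sketch of the core identity (shapes, the outlier channel, and alpha are illustrative assumptions, not the BLOOM configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))   # activations: (tokens, channels)
X[:, 0] *= 50.0               # simulate an outlier activation channel
W = rng.normal(size=(6, 3))   # weights: (channels, out_features)

# Per-channel smoothing scale; alpha balances difficulty between X and W.
alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_s = X / s                   # smoothed activations (smaller dynamic range)
W_s = W * s[:, None]          # scale folded into the weights offline

# The matmul is mathematically unchanged after smoothing.
assert np.allclose(X @ W, X_s @ W_s)
```

Since the weight rescaling can be folded in offline, only the easier-to-quantize smoothed tensors are seen at int8 inference time.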
Bug Fixing
- Reduce BERT QAT tuning duration (commit 6b9c)
- Fix Length Adaptive Transformer regression (commit 5473)
- Fix accelerated library compile error when enabling VTune (commit b5cd)
Documentation
- Refine contents of all readme files
- API helper hosted on GitHub Pages (commit e107)
- DevCatalog for Mt. Whitney (commit acb6)
Validated Configurations
- CentOS 8.4 & Ubuntu 20.04 & Windows 10
- Python 3.7, 3.8, 3.9, 3.10
- TensorFlow 2.10.1, 2.11.0
- PyTorch 1.12.0+cpu, 1.13.0+cpu
- Intel® Extension for PyTorch 1.12.0+cpu, 1.13.0+cpu