Add new INT4 quantization features to model builder #940

kunal-vaishnavi · 2024-09-27T19:27:14Z

Description

This PR adds new INT4 quantization features to the model builder.

The model builder can now quantize the embedding layer and the language modeling head to INT4 precision by default.
For already-quantized PyTorch models passed to the model builder, any ops that are still created as MatMul can now be quantized to MatMulNBits via RTN (round-to-nearest).
A new optional extra option, int4_op_type_to_quantize, has been added to control which op types are quantized to INT4.
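To make the RTN step concrete, here is a minimal NumPy sketch of blockwise round-to-nearest INT4 quantization of a (K, N) weight matrix. It is not the model builder's or ONNX Runtime's quantizer. The helper names are hypothetical, and the scheme (asymmetric, one scale and one 4-bit zero point per block of `block_size` rows along K, low nibble packed first) is an assumption about how MatMulNBits-style weights are laid out.

```python
import numpy as np

def rtn_quantize_int4(weight: np.ndarray, block_size: int = 32):
    """Hypothetical helper: asymmetric RTN INT4 quantization in blocks along K."""
    k, n = weight.shape
    if k % block_size != 0:
        raise ValueError("K must be a multiple of block_size in this sketch")
    # Shape (num_blocks, block_size, N): each block of each column is handled independently.
    blocks = weight.reshape(k // block_size, block_size, n).astype(np.float32)
    # Keep 0.0 inside each block's range so the zero point always lands in [0, 15].
    w_min = np.minimum(blocks.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(blocks.max(axis=1, keepdims=True), 0.0)
    scales = (w_max - w_min) / 15.0                 # 4 bits -> 16 levels (0..15)
    scales = np.where(scales == 0, 1.0, scales)     # guard all-zero blocks
    zero_points = np.clip(np.round(-w_min / scales), 0, 15)
    # Round to nearest, shift by the zero point, and clamp to the 4-bit range.
    q = np.clip(np.round(blocks / scales) + zero_points, 0, 15).astype(np.uint8)
    return q.reshape(k, n), scales.squeeze(1), zero_points.squeeze(1).astype(np.uint8)

def dequantize_int4(q, scales, zero_points, block_size: int = 32):
    """Inverse mapping: (q - zero_point) * scale, block by block."""
    k, n = q.shape
    blocks = q.reshape(k // block_size, block_size, n).astype(np.float32)
    return ((blocks - zero_points[:, None, :]) * scales[:, None, :]).reshape(k, n)

def pack_int4(q):
    """Pack pairs of 4-bit codes along K into one uint8 (low nibble first, assumed)."""
    return (q[0::2, :] | (q[1::2, :] << 4)).astype(np.uint8)

w = np.random.default_rng(0).standard_normal((64, 8)).astype(np.float32)
q, scales, zps = rtn_quantize_int4(w)
packed = pack_int4(q)  # shape (32, 8): half the bytes of the unpacked codes
```

Because each value is rounded independently with no calibration data, RTN needs no extra inputs, which is why it suits the leftover MatMul ops in an already-quantized model.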

Motivation and Context

With these changes, ONNX model size can be reduced further by quantizing the embedding layer and/or the language modeling head.
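As a rough illustration of the size reduction, the arithmetic below compares an FP16 weight matrix with an INT4 one. The vocabulary and hidden sizes are illustrative, and the storage model (packed 4-bit weights, one FP16 scale and one packed 4-bit zero point per block of 32) is an assumption rather than the exact on-disk format.

```python
def matrix_bytes_fp16(rows: int, cols: int) -> int:
    # Two bytes per FP16 weight.
    return rows * cols * 2

def matrix_bytes_int4(rows: int, cols: int, block_size: int = 32, scale_bytes: int = 2) -> int:
    # Assumed layout: packed 4-bit weights plus per-block scale and 4-bit zero point.
    n = rows * cols
    if n % block_size != 0:
        raise ValueError("element count must be a multiple of block_size")
    num_blocks = n // block_size
    weights = n // 2                      # two 4-bit weights per byte
    scales = num_blocks * scale_bytes     # one FP16 scale per block
    zero_points = (num_blocks + 1) // 2   # two 4-bit zero points per byte
    return weights + scales + zero_points

vocab_size, hidden_size = 32000, 4096  # illustrative sizes, not tied to a specific model
fp16 = matrix_bytes_fp16(vocab_size, hidden_size)
int4 = matrix_bytes_int4(vocab_size, hidden_size)
print(fp16, int4, round(fp16 / int4, 2))  # 262144000 75776000 3.46
```

Under these assumptions, each vocabulary-sized matrix (embedding or LM head) shrinks by roughly 3.5x; a scheme without zero points would shrink it slightly more.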

For ONNX models built from already-quantized PyTorch models, one example is a model quantized with AutoAWQ. AutoAWQ does not quantize the language modeling head, so the resulting ONNX model typically contains a MatMul op for it. That MatMul op is now quantized to MatMulNBits via RTN to reduce memory.
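The selection step for already-quantized models can be sketched as a filter on op type. The `Node` class and the node names below are hypothetical stand-ins for ONNX graph nodes, not the model builder's data structures. The point is that layers already emitted as MatMulNBits are left alone, while a remaining MatMul such as an AutoAWQ model's language modeling head is picked up for RTN.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical minimal node: just a name and an ONNX op type string.
    name: str
    op_type: str

def nodes_to_quantize(nodes, op_types_to_quantize=("MatMul",)):
    """Return the nodes whose op type is selected for RTN INT4 quantization."""
    wanted = set(op_types_to_quantize)
    # MatMulNBits nodes (already quantized, e.g. by AWQ) never match and are skipped.
    return [node for node in nodes if node.op_type in wanted]

graph = [
    Node("embed_tokens", "Gather"),
    Node("layers.0.attn.qkv_proj", "MatMulNBits"),
    Node("layers.0.mlp.down_proj", "MatMulNBits"),
    Node("lm_head", "MatMul"),
]
print([n.name for n in nodes_to_quantize(graph)])                        # ['lm_head']
print([n.name for n in nodes_to_quantize(graph, ("MatMul", "Gather"))])  # ['embed_tokens', 'lm_head']
```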

yufenglee · 2024-09-30T21:53:44Z

src/python/py/models/builder.py

                "accuracy_level": int(extra_options["int4_accuracy_level"]) if "int4_accuracy_level" in extra_options else 0,   # Default is 0 for non-QDQ formats, default is 4 for QDQ formats
+                "block_size": int(extra_options["int4_block_size"]) if "int4_block_size" in extra_options else 32,
+                "op_types_to_quantize": (extra_options["int4_op_type_to_quantize"], ) if "int4_op_type_to_quantize" in extra_options else ("MatMul", "Gather", ),


Could we make the default "MatMul" only? Quantizing Gather is optional, and not all execution providers (EPs) support it.
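A minimal sketch of the option parsing in this diff, using `dict.get` and one consistent key name. The function name is hypothetical, values are assumed to arrive as strings (as from key=value parsing), and the default op types are a parameter: `("MatMul",)` follows the suggestion above, while the diff as written defaults to `("MatMul", "Gather")`.

```python
def int4_quant_options(extra_options: dict, default_op_types=("MatMul",)) -> dict:
    """Hypothetical helper: build INT4 quantizer settings from extra options."""
    return {
        # 0 is the default for non-QDQ formats, per the comment in the diff.
        "accuracy_level": int(extra_options.get("int4_accuracy_level", 0)),
        "block_size": int(extra_options.get("int4_block_size", 32)),
        # A single user-supplied op type, otherwise the configurable default.
        "op_types_to_quantize": (
            (extra_options["int4_op_type_to_quantize"],)
            if "int4_op_type_to_quantize" in extra_options
            else tuple(default_op_types)
        ),
    }

print(int4_quant_options({"int4_block_size": "128"}))
```

Keeping the default as a parameter makes the MatMul-only vs. MatMul-plus-Gather decision a one-line change.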

kunal-vaishnavi added 2 commits September 27, 2024 11:03

Quantize LM head to int4 for already-quantized models

c1d44e1

Quantize embeddings and LM head to int4 by default

0b3d492

kunal-vaishnavi requested a review from yufenglee September 27, 2024 19:27

Remove extra parenthesis

01164db

yufenglee reviewed Sep 30, 2024
