
Commit

Update index.html - HQ explanation
mobicham authored Nov 14, 2023
1 parent bbd65f8 commit 721e868
Showing 1 changed file with 2 additions and 2 deletions.
index.html: 4 changes (2 additions & 2 deletions)
@@ -119,7 +119,7 @@ <h2 id="hqq" class="">Half-Quadratic Quantization</h2>
<p>Basic quantization often results in a loss of model accuracy, especially in Large Language Models (LLMs). This is because the weights in these models can have a wide range of values that can be significantly altered after the quantization process. Weights that deviate notably (known as outliers) pose a particular challenge.
<a href="https://arxiv.org/abs/2210.17323">Group-wise Precision Tuning Quantization (GPTQ)</a> and <a href="https://arxiv.org/abs/2306.00978">Activation-Aware Layer Quantization (AWQ)</a> are algorithms that try to compensate for the outliers by relying on calibration data to minimize the error on layer outputs.
</p>
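<p>As a rough numerical illustration (a toy sketch: the weight values, the 4-bit setting, and the helper names are arbitrary), plain round-to-nearest quantization of a small weight group shows how a single outlier stretches the scale and wipes out the precision of the remaining weights:</p>

<pre><code class="language-python">
import numpy as np

def quantize(w, n_bits=4):
    # Plain asymmetric round-to-nearest quantization of a 1D weight group.
    q_max = 2 ** n_bits - 1
    s = (w.max() - w.min()) / q_max      # scale
    z = np.round(-w.min() / s)           # zero-point
    w_q = np.clip(np.round(w / s + z), 0, q_max)
    return w_q, s, z

def dequantize(w_q, s, z):
    return s * (w_q - z)

group = np.array([0.01, -0.02, 0.03, 0.00, -0.01, 0.02])
outlier_group = np.append(group, 2.0)    # same group plus one large outlier

for name, w in [("no outlier", group), ("with outlier", outlier_group)]:
    w_q, s, z = quantize(w)
    err = np.abs(w - dequantize(w_q, s, z)).max()
    print(f"{name}: scale={s:.4f}, max reconstruction error={err:.4f}")
</code></pre>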
- <p>Unlike these approaches, our method focuses specifically on minimizing errors in the <i>weights</i> rather than the layer activation error. Additionally, by incorporating a sparsity-promoting loss, such as the \( {l_{p<1}} \)-norm, we effectively model outliers through a hyper-Laplacian distribution. This distribution more accurately captures the heavy-tailed nature of outlier errors compared to a dense loss, like the squared error, resulting in a more nuanced representation of error distribution.
+ <p>Unlike these approaches, our method focuses specifically on minimizing errors in the <i>weights</i> rather than the layer activation error. Additionally, by incorporating a sparsity-promoting loss, such as the \( {l_{p<1}} \)-norm, we effectively model outliers through a hyper-Laplacian distribution. This distribution more accurately captures the heavy-tailed nature of outlier errors compared to the squared error, resulting in a more nuanced representation of error distribution.
</p>
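<p>The effect can be seen on a toy residual vector (an illustrative sketch; \( p=0.7 \) and the residual values are arbitrary choices): the squared error is dominated almost entirely by the single outlier, while the \( {l_{p<1}} \) penalty down-weights it considerably.</p>

<pre><code class="language-python">
import numpy as np

# Residuals between original and dequantized weights: mostly small, one outlier.
residuals = np.array([0.01, -0.02, 0.015, -0.01, 1.5])
p = 0.7  # an illustrative exponent strictly below 1

lp_loss = np.sum(np.abs(residuals) ** p)   # sparsity-promoting, heavy-tailed model
l2_loss = np.sum(residuals ** 2)           # squared error

# Share of each loss contributed by the single outlier residual.
print(f"outlier share of lp loss: {np.abs(residuals[-1]) ** p / lp_loss:.1%}")
print(f"outlier share of l2 loss: {residuals[-1] ** 2 / l2_loss:.1%}")
</code></pre>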
<p>We propose a robust optimization formulation to find the quantization parameters (zero-point \( z \) and scaling \( s \)). More specifically, we use a sparsity-promoting loss function \( \phi() \) such as the \( {l_{p}} \)-norm between the original weights \( W \) and their dequantized version:</p>

@@ -131,7 +131,7 @@ <h2 id="hqq" class="">Half-Quadratic Quantization</h2>
$$\begin{array}{c}
\underset{z,s}{\text{argmin}}\,\phi(W-Q_{z,s}^{-1}(Q_{z,s}(W)))\\
\text{where}\quad Q_{z,s}(W)=W_{q}=\text{round}(W/s+z),\quad
Q_{z,s}^{-1}(W_{q})=s(W_{q}-z)
\end{array}$$
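<p>In code, the quantization operator and its inverse above can be sketched as follows (a minimal NumPy version; the 4-bit scale and zero-point initialization and the tensor shape are arbitrary example choices, and practical kernels would additionally clip and group the weights):</p>

<pre><code class="language-python">
import numpy as np

def quant(W, s, z):
    # Q_{z,s}(W): W_q = round(W/s + z). Practical kernels also clip W_q to the n-bit range.
    return np.round(W / s + z)

def dequant(W_q, s, z):
    # Q_{z,s}^{-1}(W_q) = s * (W_q - z)
    return s * (W_q - z)

# The robust loss phi(.) is applied to this weight-reconstruction error:
W = np.random.randn(64, 64).astype(np.float32)
s = (W.max() - W.min()) / (2 ** 4 - 1)   # example 4-bit scale
z = np.round(-W.min() / s)               # example zero-point
W_err = W - dequant(quant(W, s, z), s, z)
</code></pre>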

- <p>The use of the \( {l_{p<1}} \)-norm makes the problem non-convex. To find a solution, we adopt a <a href="https://ieeexplore.ieee.org/document/120331">Half-Quadratic solver</a> by introducing an extra variable \( W_{e} \). Moreover, to make the problem simpler, we fix the scaling \( s \) parameter and only optimize for the zero-point \( z \). </p>
+ <p>The use of the \( {l_{p<1}} \)-norm makes the problem non-convex. To find a solution, we adopt a <a href="https://ieeexplore.ieee.org/document/120331">Half-Quadratic solver</a> by introducing an extra variable \( W_{e} \). This additional parameter allows us to split the main problem into sub-problems that are easier to solve. Moreover, to make the problem simpler, we fix the scaling \( s \) parameter and only optimize for the zero-point \( z \). </p>

$$\underset{z,W_{e}}{\text{argmin}}\,\phi(W_{e})+\frac{\beta}{2}||W_{e}-(W-Q_{z}^{-1}(Q_{z}(W)))||_{2}^{2}$$

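<p>A minimal sketch of the resulting alternating updates: the generalized soft-thresholding step, the values of \( p \), \( \beta \), and \( \kappa \), and the function names are illustrative choices rather than the reference implementation; only the overall structure (update \( W_{e} \), then the zero-point \( z \) in closed form, with \( s \) kept fixed, while increasing \( \beta \)) follows the formulation above.</p>

<pre><code class="language-python">
import numpy as np

def shrink_lp(x, beta, p=0.7):
    # Generalized soft-thresholding, a common proximal-style update for an l_p penalty (p below 1).
    x_abs = np.abs(x) + 1e-8  # small offset to avoid raising zero to a negative power
    return np.sign(x) * np.maximum(x_abs - (x_abs ** (p - 1.0)) / beta, 0.0)

def optimize_zero_point(W, s, z, n_bits=4, iters=20, beta=1.0, kappa=1.01, p=0.7):
    # Alternating half-quadratic updates with the scale s kept fixed (illustrative schedule).
    q_max = 2 ** n_bits - 1
    for _ in range(iters):
        W_q = np.clip(np.round(W / s + z), 0, q_max)   # current quantized weights
        W_r = s * (W_q - z)                            # dequantized weights
        W_e = shrink_lp(W - W_r, beta, p)              # sub-problem 1: sparse error estimate
        z = np.mean(W_q - (W - W_e) / s)               # sub-problem 2: closed-form zero-point
        beta = beta * kappa                            # gradually tighten the quadratic coupling
    return z
</code></pre>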

