
Commit

Update index.html - HQ explanation
mobicham authored Nov 14, 2023
1 parent bbd65f8 commit 721e868
Showing 1 changed file with 2 additions and 2 deletions.
index.html: 4 changes (2 additions & 2 deletions)
@@ -119,7 +119,7 @@ <h2 id="hqq" class="">Half-Quadratic Quantization</h2>
<p>Basic quantization often results in a loss of model accuracy, especially in Large Language Models (LLMs). This is because the weights in these models can have a wide range of values that can be significantly altered after the quantization process. Weights that deviate notably (known as outliers) pose a particular challenge.
<a href="https://arxiv.org/abs/2210.17323">Group-wise Precision Tuning Quantization (GPTQ)</a> and <a href="https://arxiv.org/abs/2306.00978">Activation-Aware Layer Quantization (AWQ)</a> are algorithms that try to compensate for the outliers by relying on calibration data to minimize the error on layer outputs.
</p>
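<p>As a rough numerical illustration (a toy sketch: the weight values, the 4-bit setting, and the helper names are arbitrary), plain round-to-nearest quantization of a small weight group shows how a single outlier stretches the scale and wipes out the precision of the remaining weights:</p>

<pre><code class="language-python">
import numpy as np

def quantize(w, n_bits=4):
    # Plain asymmetric round-to-nearest quantization of a 1D weight group.
    q_max = 2 ** n_bits - 1
    s = (w.max() - w.min()) / q_max      # scale
    z = np.round(-w.min() / s)           # zero-point
    w_q = np.clip(np.round(w / s + z), 0, q_max)
    return w_q, s, z

def dequantize(w_q, s, z):
    return s * (w_q - z)

group = np.array([0.01, -0.02, 0.03, 0.00, -0.01, 0.02])
outlier_group = np.append(group, 2.0)    # same group plus one large outlier

for name, w in [("no outlier", group), ("with outlier", outlier_group)]:
    w_q, s, z = quantize(w)
    err = np.abs(w - dequantize(w_q, s, z)).max()
    print(f"{name}: scale={s:.4f}, max reconstruction error={err:.4f}")
</code></pre>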
- <p>Unlike these approaches, our method focuses specifically on minimizing errors in the <i>weights</i> rather than the layer activation error. Additionally, by incorporating a sparsity-promoting loss, such as the \( {l_{p<1}} \)-norm, we effectively model outliers through a hyper-Laplacian distribution. This distribution more accurately captures the heavy-tailed nature of outlier errors compared to a dense loss, like the squared error, resulting in a more nuanced representation of error distribution.
+ <p>Unlike these approaches, our method focuses specifically on minimizing errors in the <i>weights</i> rather than the layer activation error. Additionally, by incorporating a sparsity-promoting loss, such as the \( {l_{p<1}} \)-norm, we effectively model outliers through a hyper-Laplacian distribution. This distribution more accurately captures the heavy-tailed nature of outlier errors compared to the squared error, resulting in a more nuanced representation of error distribution.
</p>
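<p>The effect can be seen on a toy residual vector (an illustrative sketch; \( p=0.7 \) and the residual values are arbitrary choices): the squared error is dominated almost entirely by the single outlier, while the \( {l_{p<1}} \) penalty down-weights it considerably.</p>

<pre><code class="language-python">
import numpy as np

# Residuals between original and dequantized weights: mostly small, one outlier.
residuals = np.array([0.01, -0.02, 0.015, -0.01, 1.5])
p = 0.7  # an illustrative exponent strictly below 1

lp_loss = np.sum(np.abs(residuals) ** p)   # sparsity-promoting, heavy-tailed model
l2_loss = np.sum(residuals ** 2)           # squared error

# Share of each loss contributed by the single outlier residual.
print(f"outlier share of lp loss: {np.abs(residuals[-1]) ** p / lp_loss:.1%}")
print(f"outlier share of l2 loss: {residuals[-1] ** 2 / l2_loss:.1%}")
</code></pre>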
<p>We propose a robust optimization formulation to find the quantization parameters (zero-point \( z \) and scaling \( s \)). More specifically, we use a sparsity-promoting loss function \( \phi() \) such as the \( {l_{p}} \)-norm between the original weights \( W \) and their dequantized version:</p>

@@ -131,7 +131,7 @@ <h2 id="hqq" class="">Half-Quadratic Quantization</h2>
$$\begin{array}{c}
\underset{z,s}{\text{argmin}}\,\phi(W-Q_{z,s}^{-1}(Q_{z,s}(W)))\\
\text{where}\quad Q_{z,s}(W)=W_{q}=\text{round}(W/s+z),\quad
Q_{z,s}^{-1}(W_{q})=s(W_{q}-z)
\end{array}$$
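<p>In code, the quantization operator and its inverse above can be sketched as follows (a minimal NumPy version; the 4-bit scale and zero-point initialization and the tensor shape are arbitrary example choices, and practical kernels would additionally clip and group the weights):</p>

<pre><code class="language-python">
import numpy as np

def quant(W, s, z):
    # Q_{z,s}(W): W_q = round(W/s + z). Practical kernels also clip W_q to the n-bit range.
    return np.round(W / s + z)

def dequant(W_q, s, z):
    # Q_{z,s}^{-1}(W_q) = s * (W_q - z)
    return s * (W_q - z)

# The robust loss phi(.) is applied to this weight-reconstruction error:
W = np.random.randn(64, 64).astype(np.float32)
s = (W.max() - W.min()) / (2 ** 4 - 1)   # example 4-bit scale
z = np.round(-W.min() / s)               # example zero-point
W_err = W - dequant(quant(W, s, z), s, z)
</code></pre>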

- <p>The use of the \( {l_{p<1}} \)-norm makes the problem non-convex. To find a solution, we adopt a <a href="https://ieeexplore.ieee.org/document/120331">Half-Quadratic solver</a> by introducing an extra variable \( W_{e} \). Moreover, to make the problem simpler, we fix the scaling \( s \) parameter and only optimize for the zero-point \( z \). </p>
+ <p>The use of the \( {l_{p<1}} \)-norm makes the problem non-convex. To find a solution, we adopt a <a href="https://ieeexplore.ieee.org/document/120331">Half-Quadratic solver</a> by introducing an extra variable \( W_{e} \). This additional parameter allows us to split the main problem into sub-problems that are easier to solve. Moreover, to make the problem simpler, we fix the scaling \( s \) parameter and only optimize for the zero-point \( z \). </p>

$$\underset{z,W_{e}}{\text{argmin}}\,\phi(W_{e})+\frac{\beta}{2}||W_{e}-(W-Q_{z}^{-1}(Q_{z}(W)))||_{2}^{2}$$

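<p>A minimal sketch of the resulting alternating updates: the generalized soft-thresholding step, the values of \( p \), \( \beta \), and \( \kappa \), and the function names are illustrative choices rather than the reference implementation; only the overall structure (update \( W_{e} \), then the zero-point \( z \) in closed form, with \( s \) kept fixed, while increasing \( \beta \)) follows the formulation above.</p>

<pre><code class="language-python">
import numpy as np

def shrink_lp(x, beta, p=0.7):
    # Generalized soft-thresholding, a common proximal-style update for an l_p penalty (p below 1).
    x_abs = np.abs(x) + 1e-8  # small offset to avoid raising zero to a negative power
    return np.sign(x) * np.maximum(x_abs - (x_abs ** (p - 1.0)) / beta, 0.0)

def optimize_zero_point(W, s, z, n_bits=4, iters=20, beta=1.0, kappa=1.01, p=0.7):
    # Alternating half-quadratic updates with the scale s kept fixed (illustrative schedule).
    q_max = 2 ** n_bits - 1
    for _ in range(iters):
        W_q = np.clip(np.round(W / s + z), 0, q_max)   # current quantized weights
        W_r = s * (W_q - z)                            # dequantized weights
        W_e = shrink_lp(W - W_r, beta, p)              # sub-problem 1: sparse error estimate
        z = np.mean(W_q - (W - W_e) / s)               # sub-problem 2: closed-form zero-point
        beta = beta * kappa                            # gradually tighten the quadratic coupling
    return z
</code></pre>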

