clarify theory sections

unitaryfund · Oct 11, 2024 · b60e112
1 parent 6d1af9c
commit b60e112

Showing 2 changed files with 37 additions and 18 deletions.
diff --git a/docs/source/guide/lre-5-theory.md b/docs/source/guide/lre-5-theory.md
@@ -18,43 +18,50 @@ The user guide for LRE in Mitiq is currently under construction.
 
 # What is the theory behind LRE?
 
-Layerwise Richardson Extrapolation (LRE), an error mitigation technique, introduced in
-{cite}`Russo_2024_LRE` extends the ideas found in ZNE by allowing users to create multiple noise-scaled variations of the input
-circuit such that the noiseless expectation value is extrapolated from the execution of each
-noisy circuit.
-
-Similar to [ZNE](zne.md), this process works in two steps:
+Similar to [ZNE](zne.md), LRE works in two steps:
 
 - **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.
 
 - **Step 2:** Extrapolate to the noiseless limit using multivariate Richardson extrapolation.
 
-LRE leverages the flexible configuration space of layerwise unitary folding,
+In ZNE, the user chooses which layers of the input circuit to fold, whereas in LRE each noise-scaled circuit
+scales the layers of the input circuit in a specific pattern. LRE leverages the flexible configuration space of layerwise unitary folding,
 allowing for a more nuanced mitigation of errors by treating the noise level of each layer of
 the quantum circuit as an independent variable.
 
 ## Step 1: Create noise-scaled circuits
 
 The goal is to create noise-scaled circuits of different depths where the layers in each circuit are scaled in
-a specific pattern as a result of unitary folding. This pattern is often described by the vector of scale factor vectors
-generated by the fold multiplier and the chosen degree for multivariate Richardson extrapolation polynomial. For more information
-on unitary folding, go to [What is the theory behind ZNE?](zne-5-theory.md).
+a specific pattern as a result of [unitary folding](zne-5-theory.md). This pattern is often described by the vector of scale factor vectors generated by the fold multiplier and the chosen degree of the multivariate Richardson extrapolation polynomial.
 
 Suppose we're interested in the value of some observable in an $n$-qubit circuit with $l$ layers.
 
-Each layer can have a different scale factor and we can create $M$ such variations of the scaled circuit. Let $\{λ_1, λ_2, λ_3, \ldots, λ_M\}$ be the scale factors vectors used to create multiple variations of the noise-scaled circuits $\{C_{λ_1}, C_{λ_2}, C_{λ_3}, \ldots, C_{λ_M}\}$ such that each vector $λ_i$ defines the scale factors for the different layers in the input circuit $\{{λ^1}_i, {λ^2}_i, {λ^3}_i, \ldots, {λ^l}_i\}^T$.
+Each layer can have a different scale factor and we can create $M$ such variations of the scaled circuit. Let
+$\Lambda = (λ_1, λ_2, λ_3, \ldots, λ_M)^T$ be the scale factor vectors used to create multiple variations of the
+noise-scaled circuits $(C_{λ_1}, C_{λ_2}, C_{λ_3}, \ldots, C_{λ_M})^T$ such that each vector $λ_i$ defines the scale
+factors for the different layers in the input circuit $({λ^1}_i, {λ^2}_i, {λ^3}_i, \ldots, {λ^l}_i)^T$.
+
+If $d$ is the chosen degree of our multivariate polynomial, we define $M_j(λ_i, d)$ to be the terms in the polynomial
+arranged in increasing order. In general, the number of monomial terms with $l$ variables up to degree $d$ can be determined
+through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
 
-If $d$ is the chosen degree of our multivariate polynomial, we define $M_j(λ_i, d)$ to be the terms in the polynomial arranged in increasing order. In general, the number of monomial terms with $l$ variables up to degree $d$ can be determined through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).
+For example, if the input circuit has 2 layers and the chosen degree of the extrapolating polynomial is 2, the monomial basis contains $\binom{2 + 2}{2} = 6$ terms in total, using the formula below with $d=2, l=2$. We skip the detailed discussion of the stars and bars method, but the monomial basis for our example is $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
 
 $$
 \text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
 $$
 
+Since the chosen degree of the extrapolating polynomial is 2, we first count the terms with total degree exactly 2 using the following formula:
+
 $$
 \text{number of terms in the monomial basis with total degree } d = \binom{d + l - 1}{d}
 $$
 
-These monomial terms define the rows of the square sample matrix as shown below:
+There are $\binom{2 + 2 - 1}{2} = 3$ terms with total degree 2: $\{{λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
+
+Similarly, the numbers of terms with total degree 1 and 0 are $\binom{1 + 2 - 1}{1} = 2$, namely $\{λ_1, λ_2\}$, and $\binom{0 + 2 - 1}{0} = 1$, namely $\{1\}$, respectively.
+
+These terms in the monomial basis define the rows of the square sample matrix as shown below:
 
 $$
 \mathbf{A}(\Lambda, d) = 
@@ -66,9 +73,9 @@ $$
 \end{bmatrix}
 $$
 
-Each monomial term in the sample matrix $\mathbf{A}$ is evaluated using the values in the scale factor vectors. In Step 2, we aim to define $O_{\mathrm{LRE}}$ as a linear combination of the noisy expectation values.
+For our example circuit with $l=2$ and $d=2$, the generic monomial terms $M_j(λ_i, d)$ in each row of the sample matrix $\mathbf{A}$ are replaced by $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.
 
-Finding the coefficients in the linear combination becomes a problem solvable through a system of linear equations $\mathbf{A} c = z$ where $c$ is the coefficients vector $(\eta_1, \eta_2, \ldots, \eta_N)^T$, $z$ is the vector of the noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors.
+Each monomial term in the sample matrix $\mathbf{A}$ is then evaluated using the values in the scale factor vectors. In Step 2, this sample matrix is used to obtain the mitigated expectation value.
 
 ## Step 2: Extrapolate to the noiseless limit
 
@@ -79,10 +86,17 @@ $$
 O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
 $$
 
-The system of linear equations is used to find the numerous $\eta_i$ in vector $c$. As we only need to find the noiseless expectation value, we can skip calculating the full vector of linear combination coefficients if we use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$.
+Finding the coefficients of the interpolating polynomial is a problem solvable through a system of linear equations $\mathbf{A} c = z$, where $c$ is the vector of polynomial coefficients, $z$ is the vector of the noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors. The zero-noise estimate is the constant term of this polynomial, which is a linear combination of the noisy expectation values with weights $(\eta_1, \eta_2, \ldots, \eta_M)^T$.
+
+The [general multivariate Lagrange interpolation formula](https://www.siam.org/media/wkvnvame/a_simple_expression_for_multivariate.pdf) is defined by replacing the $i$-th row of the sample matrix $\mathbf{A}$ with monomial terms evaluated using the generic variable $λ$. The resulting matrix $\mathbf{B}_i$ defines an interpolating polynomial of degree $d$ in the variable $λ$. As we only need the noiseless expectation value, we can skip calculating the full vector of coefficients by using the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$, i.e., the zero-noise limit.
+
+To get the matrix $\mathbf{B}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)$: at $λ=0$, $M_1(0, d) = 1$ and all the other monomial terms are zero.
 
 $$
-O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle  \frac{\det \left(\mathbf{M}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}.
+O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle  \frac{\det \left(\mathbf{B}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}
 $$
 
-To get the matrix $\mathbf{M}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)^T$ where except $M_1(0, d) = 1$ all the other monomial terms are zero.
+To summarize, given a user's choice of degree for the extrapolating polynomial, the error-mitigated expectation value is obtained by combining the expectation values from noise-scaled circuits, created in a specific pattern, through multivariate Lagrange interpolation with the sample matrix evaluated at the scale factor vectors.
+
+Additional details on the LRE functionality are available in the [API-doc](https://mitiq.readthedocs.io/en/stable/apidoc.html#module-mitiq.lre.multivariate_scaling.layerwise_folding).
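
The computation described in the revised theory page can be sketched numerically for the $l=2$, $d=2$ example. This is a minimal illustration, not Mitiq's implementation: the helper names `monomial_exponents`, `sample_matrix` and `lre_coefficients` are hypothetical, the six scale factor vectors are an assumed pattern chosen only so that the sample matrix is invertible, and the noisy expectation values come from a made-up polynomial.

```python
from itertools import product

import numpy as np


def monomial_exponents(num_layers, degree):
    """Exponent tuples of all monomials with total degree <= degree,
    ordered by increasing total degree (1, λ_1, λ_2, λ_1^2, ...)."""
    exponents = [
        e for e in product(range(degree + 1), repeat=num_layers)
        if sum(e) <= degree
    ]
    return sorted(exponents, key=lambda e: (sum(e), tuple(-x for x in e)))


def sample_matrix(scale_factor_vectors, degree):
    """Row i holds every monomial evaluated at scale factor vector λ_i."""
    vectors = np.asarray(scale_factor_vectors, dtype=float)
    exponents = monomial_exponents(vectors.shape[1], degree)
    return np.array(
        [[np.prod(v ** np.array(e)) for e in exponents] for v in vectors]
    )


def lre_coefficients(A):
    """η_i = det(B_i(0)) / det(A), where B_i(0) is A with row i set to e_1."""
    e1 = np.zeros(A.shape[1])
    e1[0] = 1.0
    det_A = np.linalg.det(A)
    coefficients = []
    for i in range(A.shape[0]):
        B = A.copy()
        B[i] = e1  # all monomials except the constant term vanish at λ = 0
        coefficients.append(np.linalg.det(B) / det_A)
    return np.array(coefficients)


# Assumed scale factor vectors for l = 2 layers and degree d = 2 (illustrative only).
vectors = [(1, 1), (3, 1), (1, 3), (5, 1), (3, 3), (1, 5)]
A = sample_matrix(vectors, degree=2)
eta = lre_coefficients(A)


# Toy "noisy expectation values": a made-up degree-2 polynomial with f(0, 0) = 0.9.
def toy_expectation(x, y):
    return 0.9 - 0.05 * x - 0.03 * y + 0.002 * x * y + 0.001 * x**2


z = np.array([toy_expectation(x, y) for x, y in vectors])
print("number of monomials:", A.shape[1])
print("LRE estimate:", float(eta @ z))
```

Because the toy values are generated by an exact degree-2 polynomial, the weighted sum should recover its constant term up to floating-point error; data from a real noisy backend would not extrapolate exactly. The coefficients computed from the determinant ratio also coincide with the first row of $\mathbf{A}^{-1}$, which is why the full linear system does not need to be solved.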
+
diff --git a/docs/source/guide/lre.md b/docs/source/guide/lre.md
@@ -5,6 +5,11 @@ The user guide for LRE in Mitiq is currently under construction.
 
 # Layerwise Richardson Extrapolation
 
+Layerwise Richardson Extrapolation (LRE), an error mitigation technique, introduced in
+{cite}`Russo_2024_LRE` extends the ideas found in ZNE by allowing users to create multiple noise-scaled variations of the input
+circuit such that the noiseless expectation value is extrapolated from the execution of each
+noisy circuit.
+
 Layerwise Richardson Extrapolation (LRE), an error mitigation technique, introduced in
 {cite}`Russo_2024_LRE` works by creating multiple noise-scaled variations of the input
 circuit such that the noiseless expectation value is extrapolated from the execution of each