
Simplify the TimeIndependentMDCObjectiveFunction class #515

Merged: 18 commits into develop from simplify-TimeIndependentMDCObjectiveFunction on Jan 20, 2025

Conversation

@rileyjmurray (Contributor) commented Dec 16, 2024

TimeIndependentMDCObjectiveFunction is one of the most complicated classes I've encountered in pyGSTi. Right now ...

  • It's 1000 lines long.
  • It has 12 private instance methods.
  • It has hard-to-track dependencies between functions for computing "lsvec-related" quantities and "terms-related" quantities.

This PR simplifies TimeIndependentMDCObjectiveFunction so that

  • It's 577 lines long. (Granted, it would be more like high-600s if I kept repetitive docstrings in.)
  • It has 4 private instance methods.
  • All "lsvec-related" quantities are computed by applying simple transformations to "terms-related" quantities.

These changes should materially improve maintainability of TimeIndependentMDCObjectiveFunction, as well as its 8 subclasses that collectively span ~500 lines. The outward behavior of this class should be the same as before up to floating point rounding errors.

For the curious, I went down this rabbit hole when working on PR #506.

Incidental changes related to MPI testing: I moved some lightweight MPI tests from test/test_packages to test/unit, since this PR touches code that involves a lot of MPI. While I was at it, I renamed an MPI performance-profiling file so that it doesn't get misinterpreted as a test file.

@rileyjmurray rileyjmurray changed the title Simplify the TimeIndependentMDCObjectiveFunction class WIP: simplify the TimeIndependentMDCObjectiveFunction class Dec 16, 2024
@rileyjmurray rileyjmurray marked this pull request as ready for review December 17, 2024 11:48
@rileyjmurray rileyjmurray requested a review from a team as a code owner December 17, 2024 11:48
@rileyjmurray rileyjmurray requested a review from sserita December 17, 2024 11:48
@rileyjmurray rileyjmurray changed the title WIP: simplify the TimeIndependentMDCObjectiveFunction class Simplify the TimeIndependentMDCObjectiveFunction class Dec 17, 2024
@rileyjmurray (Contributor, Author) commented Dec 18, 2024

I encourage anyone who's reading this comment to skip to the Conclusion heading.

Investigating possible changes in numerical behavior

During the 12/17 dev meeting, Erik raised the question of how the numerics might change when we focus on "terms" and compute "lsvec" as a function of terms.

changes to TimeIndependentMDCObjectiveFunction.dterms

The screenshot-of-screenshots below shows the new code and old code.
Code that's hidden in folded regions (or otherwise off-screen) is changed only in incidental ways.
From what is shown, the operations are equivalent even in finite-precision arithmetic.
Nothing to worry about here!
[screenshot: new and old dterms code, side by side]

changes to TimeIndependentMDCObjectiveFunction.dlsvec

This one is trickier to explain. Let's start with new code and old code.
[screenshot: new and old dlsvec code]
In order to assess differences we need to dig into the call that the old code makes to self.raw_objfn.dlsvec_and_lsvec, and specifically into the nature of the first of this function's two returned arrays.
There are three definitions of this function in objectivefns.py.
[screenshot: the three definitions of dlsvec_and_lsvec]
Clearly, only the first of these definitions is nontrivial.
To dig further we need to investigate raw_objfn.dlsvec functions. There is a default implementation of this function and an implementation in RawChi2Function.

Let's dig into the default implementation first.
[screenshot: the default implementation of raw_objfn.dlsvec]
I'll call attention to line 635. The definition of pt5_over_lsvec looks odd, but it's actually equivalent to the definition of p5over_lsvec at lines 4605-4606 of the new objectivefns.py.
Putting it all together, the old code computes dlsvec by evaluating

dprobs * (raw_dterms * pt5_over_raw_lsvec)[:, None]

while the new code computes dlsvec as

(dprobs * raw_dterms[:, None]) * pt5_over_raw_lsvec[:, None]

The only differences that could arise here are from non-associativity of floating point multiplication. Such differences can certainly happen, but there is no fundamental reason to prefer the old approach over the new one. Overall conclusion: no cause for worry when we end up calling RawObjectiveFunction.dlsvec.
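Here's a minimal NumPy sketch of the two orderings (array names and sizes are illustrative, not pyGSTi's actual internals):

    import numpy as np

    rng = np.random.default_rng(0)
    nelements, nparams = 100, 24  # hypothetical sizes
    dprobs = rng.standard_normal((nelements, nparams))
    raw_dterms = rng.standard_normal(nelements)
    pt5_over_raw_lsvec = rng.standard_normal(nelements)

    # old ordering: combine the two 1-D factors, then broadcast over dprobs
    old = dprobs * (raw_dterms * pt5_over_raw_lsvec)[:, None]
    # new ordering: broadcast raw_dterms first, then apply the 0.5/lsvec factor
    new = (dprobs * raw_dterms[:, None]) * pt5_over_raw_lsvec[:, None]

    print(np.max(np.abs(old - new)))  # tiny, but not necessarily zero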

So ... what about RawChi2Function.dlsvec? The implementation there is nontrivial, unfortunately. It relies on two private helper functions: _weights and _dweights:
[screenshot: RawChi2Function._weights and _dweights]
The new code I introduce in this PR does end up calling those functions, but their results get passed through the default implementations of RawObjectiveFunction.terms and RawObjectiveFunction.dterms, which look like this:
[screenshot: default implementations of RawObjectiveFunction.terms and RawObjectiveFunction.dterms]
So inverting these transformations (particularly, dividing by lsvec) could be numerically messy when components of lsvec are very small. But as long as we're not passing transformed quantities through n-ary reductions (e.g., a sum over many terms) the numerical differences should be modest.
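Schematically, here's what that inversion looks like and why small lsvec components are the danger (a sketch using the default transformations just shown; the helper name is illustrative):

    import numpy as np

    # default RawObjectiveFunction transformations, schematically:
    #   terms  = lsvec ** 2
    #   dterms = 2 * lsvec[:, None] * dlsvec
    def dlsvec_from_dterms(dterms, lsvec):
        # recovering dlsvec requires dividing by lsvec, which is
        # ill-conditioned wherever components of lsvec are very small
        return dterms / (2.0 * lsvec[:, None])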

changes to TimeIndependentMDCObjectiveFunction.terms

There are no numerical changes here.

changes to TimeIndependentMDCObjectiveFunction.lsvec

Here's the new and old code.
[screenshot: new and old lsvec code]
The old implementation basically just calls raw_objfn.lsvec. The new implementation relies on TimeIndependentMDCObjectiveFunction.terms, which ends up basically just calling raw_objfn.terms. So we need to see how raw_objfn.lsvec and raw_objfn.terms compare. The new implementation also adds an awkward extra argument called "raw_objfn_lsvec_signs," which defaults to True. This argument ensures that the signs of the returned lsvec in the new implementation can match those of the old implementation. I'll speak to why this matters in a bit. For now, let's look at implementations of lsvec and terms.

The default implementations of lsvec and terms in RawObjectiveFunction have a circular dependency (see below), so subclasses must override at least one of these functions.
[screenshot: the circular default implementations of lsvec and terms]
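Schematically (a sketch with simplified signatures, not pyGSTi's exact code), the circularity looks like this:

    import numpy as np

    class RawObjectiveFunction:
        def lsvec(self, probs, counts, total_counts, freqs):
            # default lsvec is derived from terms ...
            return np.sqrt(self.terms(probs, counts, total_counts, freqs))

        def terms(self, probs, counts, total_counts, freqs):
            # ... while default terms is derived from lsvec, so a subclass
            # that overrides neither would recurse forever
            return self.lsvec(probs, counts, total_counts, freqs) ** 2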
Two subclasses of RawObjectiveFunction end up overriding RawObjectiveFunction.lsvec. The implementation in RawChi2Function is
[screenshot: RawChi2Function.lsvec]
The main takeaway there is that its lsvec can contain negative components. Meanwhile, the RawPoissonPicDeltaLogLFunction class uses
[screenshot: RawPoissonPicDeltaLogLFunction.lsvec]

Based on this, the relationship between the new and old outputs of TimeIndependentMDCObjectiveFunction.lsvec when using these raw objective functions is lsvec_new = (lsvec_old ** 2) ** 0.5 when raw_objfn_lsvec_signs == False, and lsvec_new = np.sign(lsvec_old) * (lsvec_old ** 2) ** 0.5 when raw_objfn_lsvec_signs == True. From a numerical perspective, squaring and then taking a square root should have a pretty minor effect. As with my analysis of the changes to dlsvec, what matters is that we're not changing any n-ary operations, only binary and unary ones.
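In code form, that relationship is just (restating the formulas above, nothing more):

    import numpy as np

    def lsvec_new(lsvec_old, raw_objfn_lsvec_signs=True):
        mag = (lsvec_old ** 2) ** 0.5  # square, then square root
        return np.sign(lsvec_old) * mag if raw_objfn_lsvec_signs else mag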

Now let's return to why there's the stupid raw_objfn_lsvec_signs argument in the new implementation. From a mathematical perspective we should be able to make lsvec nonnegative without loss of generality. Indeed, if you think about the (nonlinear) least squares problem in which the objective appears, it's clear that any component of lsvec which is negative can be made positive if you multiply the corresponding row of dlsvec by -1.
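Here's a minimal sketch of that sign-fixing transformation (an illustrative helper, not part of this PR):

    import numpy as np

    def make_lsvec_nonnegative(lsvec, dlsvec):
        s = np.sign(lsvec)
        s[s == 0] = 1.0  # leave exactly-zero components unchanged
        # flipping a component of lsvec together with the matching row of
        # dlsvec leaves sum(lsvec**2) and its derivative unchanged
        return s * lsvec, s[:, None] * dlsvec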

Here is the problem: there are unit tests that are written in a way which does not respect the freedom of sign choice for lsvec. I tried to adapt the tests to acknowledge the sign freedom a few weeks ago, but I got nowhere. It was much simpler to bite the bullet in the new lsvec implementation and have it get signs from raw_objfn.lsvec by default. This has two downsides, both of which are minor.

  • Downside 1. It makes TimeIndependentMDCObjectiveFunction.lsvec twice as expensive by default. This should be fine because evaluating lsvec is nowhere near as expensive as evaluating dlsvec.
  • Downside 2. It means that subclasses of TimeIndependentMDCObjectiveFunction call the lsvec method of their raw_objfn by default. This is undesirable because a major goal of this PR was to let a new RawObjectiveFunction implement "terms" and "dterms" and then be done. Still, this is a minor issue, since any such subclass only needs to implement its lsvec method as
    def lsvec(self, paramvec=None, oob_check=False):
        return TimeIndependentMDCObjectiveFunction.lsvec(self, paramvec, oob_check, False)

Conclusion

These changes should not be problematic from a numerical perspective. I've taken extra steps to ensure minimal outward behavior change (i.e., keeping the possibility that TimeIndependentMDCObjectiveFunction.lsvec returns a vector with negative components). On the off chance that we do see an uptick in numerical issues with GST model fitting, we have this detailed investigation to refer back to.

@sserita, @coreyostrove, please put me out of my misery.

@rileyjmurray rileyjmurray force-pushed the simplify-TimeIndependentMDCObjectiveFunction branch from 18a29b7 to 1c23251 on January 16, 2025 20:19
@rileyjmurray rileyjmurray force-pushed the simplify-TimeIndependentMDCObjectiveFunction branch from 1c23251 to c33542e on January 16, 2025 20:21
@sserita (Contributor) left a comment

Thanks for the huge effort @rileyjmurray, and the very detailed post explaining the differences. That helped a lot.
I tried to have maximal mercy but I couldn't stop myself from having two minor comment-related requests - add those in and it'll be a quick approve from me.

pygsti/objectivefns/objectivefns.py (review thread resolved)
test/unit/mpi/test_mpi.py (review thread resolved)
@@ -237,7 +237,7 @@ def test_stdgst_prunedpath(self):
     target_model.sim = pygsti.forwardsims.TermForwardSimulator(mode='pruned', max_order=3, desired_perr=0.01,
                                                                allowed_perr=0.1, max_paths_per_outcome=1000,
                                                                perr_heuristic='meanscaled', max_term_stages=5)
-    target_model.from_vector(1e-10 * np.ones(target_model.num_params))  # to seed term calc (starting with perfect zeros causes trouble)
+    target_model.from_vector(1e-9 * np.ones(target_model.num_params))  # to seed term calc (starting with perfect zeros causes trouble)
@sserita (Contributor)

Why this change?

@rileyjmurray (Contributor, Author) commented Jan 17, 2025

Uhhh .... if I didn't make it, then the test failed nastily. The optimizer just didn't converge.

I can compare side by side and see what the threshold is for the old code vs new code.

@rileyjmurray (Contributor, Author)

@sserita, I'm very glad you asked this question. It turns out that the non-convergence I observed stemmed from a bug in the LM optimizer. Specifically, it stemmed from how the LM optimizer decided to update the "best_x."

The implementations on develop (i.e., both the default SimplerLMOptimizer and the backup CustomLMOptimizer) would only update best_x if norm_f < min_norm_f and a boolean flag called new_x_is_known_in_bounds was True. But new_x_is_known_in_bounds is really more of a sufficient condition for being in bounds than a necessary condition. It makes more sense to do this:

If norm_f < min_norm_f, then check if new_x_is_known_in_bounds == True. If it is, then we take this flag's word for it and update best_x[:] = x. Otherwise, we explicitly check if x is in bounds and we update best_x[:] = x if and only if that's the case.
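In pseudocode, the corrected acceptance logic looks roughly like this (a sketch; the bound-array names are hypothetical, not the optimizer's actual variables):

    import numpy as np

    def maybe_update_best(x, norm_f, min_norm_f, best_x,
                          new_x_is_known_in_bounds, lower_bnds, upper_bnds):
        if norm_f < min_norm_f:
            # new_x_is_known_in_bounds is sufficient but not necessary, so
            # fall back to an explicit bounds check when the flag is False
            in_bounds = new_x_is_known_in_bounds or \
                bool(np.all((lower_bnds <= x) & (x <= upper_bnds)))
            if in_bounds:
                best_x[:] = x
                min_norm_f = norm_f
        return min_norm_f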

If I make that change then the test passes without making the change that prompted your comment.

See below for my investigation, written to a future pyGSTi developer who might come across this.

Investigation

Stefan's comment concerned a change I made in test/test_packages/drivers/test_calcmethods1Q.py::test_stdgst_prunedpath. The change was to replace a scalar that was hard-coded to 1e-10 with a scalar hard-coded to 1e-9. Let's call that scalar perturb. So the old value is perturb=1e-10.

I ran this test on develop with different values of perturb; those runs passed when perturb >= 5e-13 and failed when perturb <= 2.5e-13. Then I ran this test on the current branch, using the commit at the time that Stefan left his comment. Those runs passed when perturb >= 2e-10 and failed when perturb <= 1.125e-10.

I started to write a report of these observations when I noticed something funny. In the test log it was clear that the LM optimizer was terminating outer iterations with a much larger value for norm_f than the minimum value seen across the outer iterations (specific logs are shown below). This made me suspect that best_x wasn't being set properly. Sure enough, I found this issue where a sufficient condition for an iterate being in-bounds (new_x_is_known_in_bounds) was effectively used as a necessary condition. After updating the termination criteria I was able to get this problematic unit test to pass when perturb==1e-10. In fact, it passed with perturb==1e-12. I didn't bother checking smaller values.

develop, passing with 5e-13

  Last iteration:
  --- dlogl GST ---
  Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
    --- Outer Iter 0: norm_f = 166109, mu=1, |x|=2.44949e-12, |J|=2835.48
    --- Outer Iter 1: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=2835.51
    --- Outer Iter 2: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3040.87
    --- Outer Iter 3: norm_f = 95.8116, mu=168.158, |x|=0.276211, |J|=3017.42
    --- Outer Iter 4: norm_f = 31.8847, mu=56.0528, |x|=0.268233, |J|=3017.41
    --- Outer Iter 5: norm_f = 31.864, mu=18.6843, |x|=0.268133, |J|=3017.33
    --- Outer Iter 6: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=3017.33
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    Least squares message = Objective function out-of-bounds! STOP
  2*Delta(log(L)) = 332207 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
  (avg per circuit paths=200, magnitude=2.055, target=2.036)
  After adapting paths, num failures = 0, 10.0% of allowed paths used.
    --- Outer Iter 0: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=2835.51
    --- Outer Iter 1: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3040.87
    --- Outer Iter 2: norm_f = 95.8116, mu=168.158, |x|=0.276211, |J|=3017.42
    --- Outer Iter 3: norm_f = 31.8847, mu=56.0528, |x|=0.268233, |J|=3017.41
    --- Outer Iter 4: norm_f = 31.864, mu=18.6843, |x|=0.268133, |J|=3017.33
    --- Outer Iter 5: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3017.33
    --- Outer Iter 6: norm_f = 607.403, mu=1.52598e+07, |x|=0.255282, |J|=3040.98
    --- Outer Iter 7: norm_f = 591.59, mu=5.08659e+06, |x|=0.255303, |J|=3040.85
    --- Outer Iter 8: norm_f = 548.357, mu=1.69553e+06, |x|=0.255395, |J|=3040.46
    --- Outer Iter 9: norm_f = 448.118, mu=565177, |x|=0.255827, |J|=3039.43
    --- Outer Iter 10: norm_f = 289.143, mu=188392, |x|=0.257372, |J|=3037.08
    --- Outer Iter 11: norm_f = 150.619, mu=62797.4, |x|=0.260537, |J|=3032.8
    --- Outer Iter 12: norm_f = 68.2405, mu=20932.5, |x|=0.26454, |J|=3026.4
    --- Outer Iter 13: norm_f = 36.1194, mu=6977.49, |x|=0.267382, |J|=3020.41
    --- Outer Iter 14: norm_f = 31.9729, mu=2325.83, |x|=0.268035, |J|=3017.84
    --- Outer Iter 15: norm_f = 31.8644, mu=775.277, |x|=0.268122, |J|=3017.37
    --- Outer Iter 16: norm_f = 31.864, mu=258.426, |x|=0.268135, |J|=3017.33
    Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
  2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
  Term-states Converged!
  Completed in 0.7s
  Final optimization took 0.7s

develop, passing with 2e-10

  Last iteration:
  --- dlogl GST ---
  Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
    --- Outer Iter 0: norm_f = 166109, mu=1, |x|=9.79796e-10, |J|=2835.48
    --- Outer Iter 1: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
    --- Outer Iter 2: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
    --- Outer Iter 3: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
    --- Outer Iter 4: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
    --- Outer Iter 5: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
    --- Outer Iter 6: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
    --- Outer Iter 7: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
    --- Outer Iter 8: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
    --- Outer Iter 9: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
    --- Outer Iter 10: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=3017.33
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    Least squares message = Objective function out-of-bounds! STOP
  2*Delta(log(L)) = 332181 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
  (avg per circuit paths=200, magnitude=2.055, target=2.036)
  After adapting paths, num failures = 0, 10.0% of allowed paths used.
    --- Outer Iter 0: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    --- Outer Iter 1: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
    --- Outer Iter 2: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
    --- Outer Iter 3: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
    --- Outer Iter 4: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
    --- Outer Iter 5: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
    --- Outer Iter 6: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
    --- Outer Iter 7: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
    --- Outer Iter 8: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
    --- Outer Iter 9: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3017.33
    --- Outer Iter 10: norm_f = 24207.2, mu=111502, |x|=0.0993976, |J|=3245.77
    --- Outer Iter 11: norm_f = 1743.06, mu=37167.3, |x|=0.216206, |J|=3057.74
    --- Outer Iter 12: norm_f = 177.863, mu=12389.1, |x|=0.26331, |J|=3032.07
    --- Outer Iter 13: norm_f = 47.4228, mu=4129.7, |x|=0.269905, |J|=3020.44
    --- Outer Iter 14: norm_f = 31.995, mu=1376.57, |x|=0.268271, |J|=3017.63
    --- Outer Iter 15: norm_f = 31.8641, mu=458.855, |x|=0.268137, |J|=3017.34
    --- Outer Iter 16: norm_f = 31.864, mu=137.657, |x|=0.268136, |J|=3017.33
    Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
  2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
  Term-states Converged!
  Completed in 0.6s
  Final optimization took 0.6s

This PR, passing with 2e-10

  Last iteration:
  --- dlogl GST ---
  Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
    --- Outer Iter 0: norm_f = 166109, mu=1, |x|=9.79796e-10, |J|=2835.48
    --- Outer Iter 1: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
    --- Outer Iter 2: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
    --- Outer Iter 3: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
    --- Outer Iter 4: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
    --- Outer Iter 5: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
    --- Outer Iter 6: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
    --- Outer Iter 7: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
    --- Outer Iter 8: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
    --- Outer Iter 9: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
    --- Outer Iter 10: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=3017.33
    --- Outer Iter 11: norm_f = 166090, mu=293.999, |x|=0.0032486, |J|=2851.52
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    Least squares message = Objective function out-of-bounds! STOP
  2*Delta(log(L)) = 332180 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
  (avg per circuit paths=200, magnitude=2.055, target=2.036)
  After adapting paths, num failures = 0, 10.0% of allowed paths used.
    --- Outer Iter 0: norm_f = 166090, mu=293.999, |x|=0.0032486, |J|=2851.52
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    --- Outer Iter 1: norm_f = 696.596, mu=100352, |x|=0.273829, |J|=3033.45
    --- Outer Iter 2: norm_f = 145.292, mu=33450.5, |x|=0.26568, |J|=3028.08
    --- Outer Iter 3: norm_f = 50.8609, mu=11150.2, |x|=0.267615, |J|=3021.55
    --- Outer Iter 4: norm_f = 33.0403, mu=3716.73, |x|=0.268405, |J|=3017.96
    --- Outer Iter 5: norm_f = 31.8829, mu=1238.91, |x|=0.268202, |J|=3017.31
    --- Outer Iter 6: norm_f = 31.8642, mu=412.97, |x|=0.268145, |J|=3017.31
    --- Outer Iter 7: norm_f = 31.864, mu=123.891, |x|=0.268137, |J|=3017.33
    --- Outer Iter 8: norm_f = 696.596, mu=100352, |x|=0.273829, |J|=3017.33
    --- Outer Iter 9: norm_f = 696.594, mu=30105.5, |x|=0.273829, |J|=3033.45
    --- Outer Iter 10: norm_f = 80.7071, mu=10035.2, |x|=0.269167, |J|=3023.24
    --- Outer Iter 11: norm_f = 34.0435, mu=3345.05, |x|=0.268725, |J|=3018.22
    --- Outer Iter 12: norm_f = 31.886, mu=1115.02, |x|=0.268213, |J|=3017.33
    --- Outer Iter 13: norm_f = 31.8642, mu=371.673, |x|=0.268145, |J|=3017.32
    --- Outer Iter 14: norm_f = 31.864, mu=111.502, |x|=0.268137, |J|=3017.33
    Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
  2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
  Term-states Converged!
  Completed in 0.6s
  Final optimization took 0.6s

This PR, failing with 1e-10

  Last iteration:
  --- dlogl GST ---
  Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
    --- Outer Iter 0: norm_f = 166109, mu=1, |x|=4.89898e-10, |J|=2835.48
    --- Outer Iter 1: norm_f = 166100, mu=979.996, |x|=0.00324622, |J|=2839.6
    --- Outer Iter 2: norm_f = 52952, mu=334505, |x|=0.0493353, |J|=3716.22
    --- Outer Iter 3: norm_f = 8281.27, mu=203584, |x|=0.159582, |J|=3101.88
    --- Outer Iter 4: norm_f = 631.08, mu=67861.3, |x|=0.234876, |J|=3044.93
    --- Outer Iter 5: norm_f = 97.1942, mu=22620.4, |x|=0.260353, |J|=3031.15
    --- Outer Iter 6: norm_f = 40.7621, mu=7540.14, |x|=0.266602, |J|=3022.47
    --- Outer Iter 7: norm_f = 32.2466, mu=2513.38, |x|=0.26786, |J|=3018.4
    --- Outer Iter 8: norm_f = 31.8699, mu=837.794, |x|=0.268084, |J|=3017.45
    --- Outer Iter 9: norm_f = 31.864, mu=279.265, |x|=0.26813, |J|=3017.34
    --- Outer Iter 10: norm_f = 31.864, mu=83.7794, |x|=0.268136, |J|=3017.33
    --- Outer Iter 11: norm_f = 166100, mu=979.996, |x|=0.00324622, |J|=3017.33
    --- Outer Iter 12: norm_f = 166100, mu=293.999, |x|=0.00324625, |J|=2839.55
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    Least squares message = Objective function out-of-bounds! STOP
  2*Delta(log(L)) = 332201 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
  (avg per circuit paths=200, magnitude=2.055, target=2.036)
  After adapting paths, num failures = 0, 10.0% of allowed paths used.
    --- Outer Iter 0: norm_f = 166100, mu=293.999, |x|=0.00324625, |J|=2839.55
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
    --- Outer Iter 1: norm_f = 8859.56, mu=100352, |x|=0.157859, |J|=3102.33
    --- Outer Iter 2: norm_f = 392.573, mu=33450.5, |x|=0.243907, |J|=3037.75
    --- Outer Iter 3: norm_f = 56.3503, mu=11150.2, |x|=0.265337, |J|=3024.63
    --- Outer Iter 4: norm_f = 33.251, mu=3716.73, |x|=0.268011, |J|=3018.92
    --- Outer Iter 5: norm_f = 31.8752, mu=1238.91, |x|=0.268108, |J|=3017.49
    --- Outer Iter 6: norm_f = 31.864, mu=412.97, |x|=0.268133, |J|=3017.34
    --- Outer Iter 7: norm_f = 8859.56, mu=100352, |x|=0.157859, |J|=3017.34
    Least squares message = Relative change, |dx|/|x|, is at most 1e-08
  2*Delta(log(L)) = 17719.1 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
  Term-states Converged!
  Completed in 0.6s
  Final optimization took 0.6s
    -- Performing 'go0' gauge optimization on GateSetTomography estimate --
MISFIT nSigma =  1928.7314162397222

@coreyostrove (Contributor)

Fantastic work tracking this down, @rileyjmurray!

@rileyjmurray (Contributor, Author)

Thanks, @coreyostrove! I feel pretty good about this one. Who knows how many past calls to LM suffered from this bug. It's wild to think about, since LM was already performing very well, and now it'll perform even better :)

(Contributor)

Agreed, nice work and as always, the detailed writeups are incredibly appreciated!

pygsti/objectivefns/objectivefns.py (review thread resolved)
pygsti/objectivefns/objectivefns.py (outdated; review thread resolved)
pygsti/objectivefns/objectivefns.py (outdated; review thread resolved)
@sserita sserita added this to the 0.9.13.1 milestone Jan 17, 2025
@rileyjmurray (Contributor, Author)

@sserita once tests pass we should be good to merge :)

@sserita (Contributor) left a comment

Thanks as always for the great work, and tracking down that bug in the LM code!

@sserita sserita merged commit d5a3421 into develop Jan 20, 2025
4 checks passed
@sserita sserita deleted the simplify-TimeIndependentMDCObjectiveFunction branch January 20, 2025 22:47