Simplify the TimeIndependentMDCObjectiveFunction class #515
Conversation
I encourage anyone who's reading this comment to skip to the **Conclusion** heading.

**Investigating possible changes in numerical behavior**

During the 12/17 dev meeting Erik raised the question of how numerics might differ when we focus on "terms" and compute "lsvec" as a function of terms.

**Changes to TimeIndependentMDCObjectiveFunction.dterms**

The screenshot-of-screenshots below shows the new code and old code.

**Changes to TimeIndependentMDCObjectiveFunction.dlsvec**

This one is trickier to explain. Let's start with the new code and old code, and dig into the default implementation first. The current code computes dlsvec as

`dprobs * (raw_dterms * pt5_over_raw_lsvec)[:, None]`

while the new code computes dlsvec as

`(dprobs * raw_dterms[:, None]) * pt5_over_raw_lsvec[:, None]`

The only differences that could arise here come from the non-associativity of floating-point multiplication. Such differences can certainly happen, but there is no fundamental reason to prefer the current approach over the new one. Overall conclusion: no cause for worry when we end up calling RawObjectiveFunction.dlsvec. A small numpy sketch below illustrates the point.
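For concreteness, here's a minimal numpy sketch of the two groupings (the array shapes are assumptions for illustration; in pyGSTi these arrays come from the objective function's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n_elements, n_params = 6, 4

# Stand-ins for the real quantities (shapes assumed for illustration):
dprobs = rng.standard_normal((n_elements, n_params))          # Jacobian of probabilities
raw_dterms = rng.standard_normal(n_elements)                  # elementwise d(terms)/d(probs)
pt5_over_raw_lsvec = 0.5 / rng.uniform(0.5, 2.0, n_elements)  # elementwise 0.5 / lsvec

old_dlsvec = dprobs * (raw_dterms * pt5_over_raw_lsvec)[:, None]
new_dlsvec = (dprobs * raw_dterms[:, None]) * pt5_over_raw_lsvec[:, None]

# Mathematically identical; bit-for-bit equality can fail only because
# floating-point multiplication is not associative.
assert np.allclose(old_dlsvec, new_dlsvec)
print(np.max(np.abs(old_dlsvec - new_dlsvec)))  # tiny (ULP-scale) discrepancy, if any
```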
So ... what about RawChi2Function.dlsvec? The implementation there is nontrivial, unfortunately. It relies on two private helper functions.

**Changes to TimeIndependentMDCObjectiveFunction.terms**

There are no numerical changes here.

**Changes to TimeIndependentMDCObjectiveFunction.lsvec**

Here's the new and old code. The default implementations of lsvec and terms are related by an elementwise square root: terms equals lsvec squared, so lsvec is only determined up to a sign choice in each entry. Based on this, the new and old outputs of lsvec can differ only in those signs.

Now let's return to why there's the stupid extra argument in the snippet below. Here is the problem: there are unit tests that are written in a way which does not respect the freedom of sign choice for lsvec. I tried to adapt the tests to acknowledge the sign freedom a few weeks ago, but I got nowhere. It was much simpler to bite the bullet in the new lsvec implementation and have it get signs from raw_objfn.lsvec by default. This has two downsides, both of which are minor.

```python
def lsvec(self, paramvec=None, oob_check=False):
    return TimeIndependentMDCObjective.lsvec(self, paramvec, oob_check, False)
```

**Conclusion**

These changes should not be problematic from a numerical perspective. I've taken extra steps to ensure minimal outward behavior change (i.e., keeping the possibility that TimeIndependentMDCStore.lsvec returns a vector with negative components). On the off chance that we do see an uptick in numerical issues with GST model fitting, we have this detailed investigation to refer back to.

@sserita, @coreyostrove, please put me out of my misery.
Thanks for the huge effort @rileyjmurray, and the very detailed post explaining the differences. That helped a lot.
I tried to have maximal mercy, but I couldn't stop myself from making two minor comment-related requests; add those in and it'll be a quick approve from me.
```diff
@@ -237,7 +237,7 @@ def test_stdgst_prunedpath(self):
         target_model.sim = pygsti.forwardsims.TermForwardSimulator(mode='pruned', max_order=3, desired_perr=0.01,
                                                                    allowed_perr=0.1, max_paths_per_outcome=1000,
                                                                    perr_heuristic='meanscaled', max_term_stages=5)
-        target_model.from_vector(1e-10 * np.ones(target_model.num_params))  # to seed term calc (starting with perfect zeros causes trouble)
+        target_model.from_vector(1e-9 * np.ones(target_model.num_params))  # to seed term calc (starting with perfect zeros causes trouble)
```
Why this change?
Uhhh... if I didn't make that change, then the test failed nastily. The optimizer just didn't converge.
I can compare side by side and see what the threshold is for the old code vs. the new code.
@sserita, I'm very glad you asked this question. It turns out that the non-convergence I observed stemmed from a bug in the LM optimizer; specifically, from how the LM optimizer decided when to update `best_x`.

The implementations on develop (i.e., both the default SimplerLMOptimizer and the backup CustomLMOptimizer) would only update `best_x` if `norm_f < min_norm_f` and a boolean flag called `new_x_is_known_in_bounds` was True. But `new_x_is_known_in_bounds` is really a sufficient condition for the iterate being in bounds, not a necessary one. It makes more sense to do this: if `norm_f < min_norm_f`, check whether `new_x_is_known_in_bounds` is True. If it is, take the flag's word for it and update `best_x[:] = x`. Otherwise, explicitly check whether `x` is in bounds, and update `best_x[:] = x` if and only if that's the case.
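A minimal sketch of the corrected update logic, using the names above (`x_in_bounds` is a hypothetical stand-in for whatever explicit feasibility check the optimizer performs):

```python
def maybe_update_best(x, norm_f, min_norm_f, best_x,
                      new_x_is_known_in_bounds, x_in_bounds):
    """Update best_x in place iff x improves on min_norm_f AND is feasible.

    Returns the (possibly updated) min_norm_f.
    """
    if norm_f < min_norm_f:
        if new_x_is_known_in_bounds:
            # The flag is a *sufficient* condition for feasibility: trust it.
            best_x[:] = x
            min_norm_f = norm_f
        elif x_in_bounds(x):
            # The flag being False does NOT mean x is out of bounds, so fall
            # back to an explicit bounds check before accepting x.
            best_x[:] = x
            min_norm_f = norm_f
    return min_norm_f
```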
With that optimizer fix in place, the test passes without the change that prompted your comment.
See below for my investigation, written to a future pyGSTi developer who might come across this.
**Investigation**

Stefan's comment concerned a change I made in `test/test_packages/drivers/test_calcmethods1Q.py::test_stdgst_prunedpath`. The change replaced a scalar that was hard-coded to 1e-10 with one hard-coded to 1e-9. Call that scalar `perturb`, so the old value is `perturb=1e-10`.

I ran this test on develop with different values of `perturb`; those runs passed when `perturb >= 5e-13` and failed when `perturb <= 2.5e-13`. Then I ran the test on the current branch, using the commit at the time Stefan left his comment. Those runs passed when `perturb >= 2e-10` and failed when `perturb <= 1.125e-10`.

I started writing up these observations when I noticed something funny. The test log made it clear that the LM optimizer was terminating outer iterations with a much larger value of `norm_f` than the minimum value seen across the outer iterations (specific logs are shown below). This made me suspect that `best_x` wasn't being set properly. Sure enough, I found the issue described above, where a sufficient condition for an iterate being in-bounds (`new_x_is_known_in_bounds`) was effectively used as a necessary condition. After updating the termination criteria I was able to get this problematic unit test to pass with `perturb==1e-10`. In fact, it passed with `perturb==1e-12`; I didn't bother checking smaller values.
**develop, passing with 5e-13**
Last iteration:
```
--- dlogl GST ---
Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
--- Outer Iter 0: norm_f = 166109, mu=1, |x|=2.44949e-12, |J|=2835.48
--- Outer Iter 1: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=2835.51
--- Outer Iter 2: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3040.87
--- Outer Iter 3: norm_f = 95.8116, mu=168.158, |x|=0.276211, |J|=3017.42
--- Outer Iter 4: norm_f = 31.8847, mu=56.0528, |x|=0.268233, |J|=3017.41
--- Outer Iter 5: norm_f = 31.864, mu=18.6843, |x|=0.268133, |J|=3017.33
--- Outer Iter 6: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=3017.33
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Least squares message = Objective function out-of-bounds! STOP
2*Delta(log(L)) = 332207 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
(avg per circuit paths=200, magnitude=2.055, target=2.036)
After adapting paths, num failures = 0, 10.0% of allowed paths used.
--- Outer Iter 0: norm_f = 166104, mu=979.996, |x|=0.00324549, |J|=2835.51
--- Outer Iter 1: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3040.87
--- Outer Iter 2: norm_f = 95.8116, mu=168.158, |x|=0.276211, |J|=3017.42
--- Outer Iter 3: norm_f = 31.8847, mu=56.0528, |x|=0.268233, |J|=3017.41
--- Outer Iter 4: norm_f = 31.864, mu=18.6843, |x|=0.268133, |J|=3017.33
--- Outer Iter 5: norm_f = 608.149, mu=326.665, |x|=0.255707, |J|=3017.33
--- Outer Iter 6: norm_f = 607.403, mu=1.52598e+07, |x|=0.255282, |J|=3040.98
--- Outer Iter 7: norm_f = 591.59, mu=5.08659e+06, |x|=0.255303, |J|=3040.85
--- Outer Iter 8: norm_f = 548.357, mu=1.69553e+06, |x|=0.255395, |J|=3040.46
--- Outer Iter 9: norm_f = 448.118, mu=565177, |x|=0.255827, |J|=3039.43
--- Outer Iter 10: norm_f = 289.143, mu=188392, |x|=0.257372, |J|=3037.08
--- Outer Iter 11: norm_f = 150.619, mu=62797.4, |x|=0.260537, |J|=3032.8
--- Outer Iter 12: norm_f = 68.2405, mu=20932.5, |x|=0.26454, |J|=3026.4
--- Outer Iter 13: norm_f = 36.1194, mu=6977.49, |x|=0.267382, |J|=3020.41
--- Outer Iter 14: norm_f = 31.9729, mu=2325.83, |x|=0.268035, |J|=3017.84
--- Outer Iter 15: norm_f = 31.8644, mu=775.277, |x|=0.268122, |J|=3017.37
--- Outer Iter 16: norm_f = 31.864, mu=258.426, |x|=0.268135, |J|=3017.33
Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
Term-states Converged!
Completed in 0.7s
Final optimization took 0.7s
```
**develop, passing with 2e-10**
Last iteration:
```
--- dlogl GST ---
Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
--- Outer Iter 0: norm_f = 166109, mu=1, |x|=9.79796e-10, |J|=2835.48
--- Outer Iter 1: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
--- Outer Iter 2: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
--- Outer Iter 3: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
--- Outer Iter 4: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
--- Outer Iter 5: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
--- Outer Iter 6: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
--- Outer Iter 7: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
--- Outer Iter 8: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
--- Outer Iter 9: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
--- Outer Iter 10: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=3017.33
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Least squares message = Objective function out-of-bounds! STOP
2*Delta(log(L)) = 332181 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
(avg per circuit paths=200, magnitude=2.055, target=2.036)
After adapting paths, num failures = 0, 10.0% of allowed paths used.
--- Outer Iter 0: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
--- Outer Iter 1: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
--- Outer Iter 2: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
--- Outer Iter 3: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
--- Outer Iter 4: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
--- Outer Iter 5: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
--- Outer Iter 6: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
--- Outer Iter 7: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
--- Outer Iter 8: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
--- Outer Iter 9: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3017.33
--- Outer Iter 10: norm_f = 24207.2, mu=111502, |x|=0.0993976, |J|=3245.77
--- Outer Iter 11: norm_f = 1743.06, mu=37167.3, |x|=0.216206, |J|=3057.74
--- Outer Iter 12: norm_f = 177.863, mu=12389.1, |x|=0.26331, |J|=3032.07
--- Outer Iter 13: norm_f = 47.4228, mu=4129.7, |x|=0.269905, |J|=3020.44
--- Outer Iter 14: norm_f = 31.995, mu=1376.57, |x|=0.268271, |J|=3017.63
--- Outer Iter 15: norm_f = 31.8641, mu=458.855, |x|=0.268137, |J|=3017.34
--- Outer Iter 16: norm_f = 31.864, mu=137.657, |x|=0.268136, |J|=3017.33
Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
Term-states Converged!
Completed in 0.6s
Final optimization took 0.6s
```
**This PR, passing with 2e-10**
Last iteration:
```
--- dlogl GST ---
Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
--- Outer Iter 0: norm_f = 166109, mu=1, |x|=9.79796e-10, |J|=2835.48
--- Outer Iter 1: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=2851.83
--- Outer Iter 2: norm_f = 26560.4, mu=334505, |x|=0.094016, |J|=3265.36
--- Outer Iter 3: norm_f = 3024.04, mu=117205, |x|=0.198562, |J|=3065.56
--- Outer Iter 4: norm_f = 223.51, mu=39068.3, |x|=0.250841, |J|=3037.55
--- Outer Iter 5: norm_f = 61.1352, mu=13022.8, |x|=0.264528, |J|=3026.48
--- Outer Iter 6: norm_f = 34.14, mu=4340.92, |x|=0.267594, |J|=3019.84
--- Outer Iter 7: norm_f = 31.9082, mu=1446.97, |x|=0.268025, |J|=3017.7
--- Outer Iter 8: norm_f = 31.8645, mu=482.324, |x|=0.26812, |J|=3017.36
--- Outer Iter 9: norm_f = 31.864, mu=160.775, |x|=0.268135, |J|=3017.33
--- Outer Iter 10: norm_f = 166090, mu=979.996, |x|=0.00324842, |J|=3017.33
--- Outer Iter 11: norm_f = 166090, mu=293.999, |x|=0.0032486, |J|=2851.52
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Least squares message = Objective function out-of-bounds! STOP
2*Delta(log(L)) = 332180 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
(avg per circuit paths=200, magnitude=2.055, target=2.036)
After adapting paths, num failures = 0, 10.0% of allowed paths used.
--- Outer Iter 0: norm_f = 166090, mu=293.999, |x|=0.0032486, |J|=2851.52
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
--- Outer Iter 1: norm_f = 696.596, mu=100352, |x|=0.273829, |J|=3033.45
--- Outer Iter 2: norm_f = 145.292, mu=33450.5, |x|=0.26568, |J|=3028.08
--- Outer Iter 3: norm_f = 50.8609, mu=11150.2, |x|=0.267615, |J|=3021.55
--- Outer Iter 4: norm_f = 33.0403, mu=3716.73, |x|=0.268405, |J|=3017.96
--- Outer Iter 5: norm_f = 31.8829, mu=1238.91, |x|=0.268202, |J|=3017.31
--- Outer Iter 6: norm_f = 31.8642, mu=412.97, |x|=0.268145, |J|=3017.31
--- Outer Iter 7: norm_f = 31.864, mu=123.891, |x|=0.268137, |J|=3017.33
--- Outer Iter 8: norm_f = 696.596, mu=100352, |x|=0.273829, |J|=3017.33
--- Outer Iter 9: norm_f = 696.594, mu=30105.5, |x|=0.273829, |J|=3033.45
--- Outer Iter 10: norm_f = 80.7071, mu=10035.2, |x|=0.269167, |J|=3023.24
--- Outer Iter 11: norm_f = 34.0435, mu=3345.05, |x|=0.268725, |J|=3018.22
--- Outer Iter 12: norm_f = 31.886, mu=1115.02, |x|=0.268213, |J|=3017.33
--- Outer Iter 13: norm_f = 31.8642, mu=371.673, |x|=0.268145, |J|=3017.32
--- Outer Iter 14: norm_f = 31.864, mu=111.502, |x|=0.268137, |J|=3017.33
Least squares message = Both actual and predicted relative reductions in the sum of squares are at most 1e-06
2*Delta(log(L)) = 63.7279 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0.0168613)
Term-states Converged!
Completed in 0.6s
Final optimization took 0.6s
```
**This PR, failing with 1e-10**
Last iteration:
```
--- dlogl GST ---
Initial Term-stage model has 0 failures and uses 10.0% of allowed paths (ok=True).
--- Outer Iter 0: norm_f = 166109, mu=1, |x|=4.89898e-10, |J|=2835.48
--- Outer Iter 1: norm_f = 166100, mu=979.996, |x|=0.00324622, |J|=2839.6
--- Outer Iter 2: norm_f = 52952, mu=334505, |x|=0.0493353, |J|=3716.22
--- Outer Iter 3: norm_f = 8281.27, mu=203584, |x|=0.159582, |J|=3101.88
--- Outer Iter 4: norm_f = 631.08, mu=67861.3, |x|=0.234876, |J|=3044.93
--- Outer Iter 5: norm_f = 97.1942, mu=22620.4, |x|=0.260353, |J|=3031.15
--- Outer Iter 6: norm_f = 40.7621, mu=7540.14, |x|=0.266602, |J|=3022.47
--- Outer Iter 7: norm_f = 32.2466, mu=2513.38, |x|=0.26786, |J|=3018.4
--- Outer Iter 8: norm_f = 31.8699, mu=837.794, |x|=0.268084, |J|=3017.45
--- Outer Iter 9: norm_f = 31.864, mu=279.265, |x|=0.26813, |J|=3017.34
--- Outer Iter 10: norm_f = 31.864, mu=83.7794, |x|=0.268136, |J|=3017.33
--- Outer Iter 11: norm_f = 166100, mu=979.996, |x|=0.00324622, |J|=3017.33
--- Outer Iter 12: norm_f = 166100, mu=293.999, |x|=0.00324625, |J|=2839.55
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Least squares message = Objective function out-of-bounds! STOP
2*Delta(log(L)) = 332201 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Pruned path-integral: kept 13212 paths (10.0%) w/magnitude 135.6 (target=134.4, #circuits=66, #failed=0)
(avg per circuit paths=200, magnitude=2.055, target=2.036)
After adapting paths, num failures = 0, 10.0% of allowed paths used.
--- Outer Iter 0: norm_f = 166100, mu=293.999, |x|=0.00324625, |J|=2839.55
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
Paths are insufficient! (Using meanscaled heuristic with error bound of 0.1)
--- Outer Iter 1: norm_f = 8859.56, mu=100352, |x|=0.157859, |J|=3102.33
--- Outer Iter 2: norm_f = 392.573, mu=33450.5, |x|=0.243907, |J|=3037.75
--- Outer Iter 3: norm_f = 56.3503, mu=11150.2, |x|=0.265337, |J|=3024.63
--- Outer Iter 4: norm_f = 33.251, mu=3716.73, |x|=0.268011, |J|=3018.92
--- Outer Iter 5: norm_f = 31.8752, mu=1238.91, |x|=0.268108, |J|=3017.49
--- Outer Iter 6: norm_f = 31.864, mu=412.97, |x|=0.268133, |J|=3017.34
--- Outer Iter 7: norm_f = 8859.56, mu=100352, |x|=0.157859, |J|=3017.34
Least squares message = Relative change, |dx|/|x|, is at most 1e-08
2*Delta(log(L)) = 17719.1 (66 data params - 24 (approx) model params = expected mean of 42; p-value = 0)
Term-states Converged!
Completed in 0.6s
Final optimization took 0.6s
-- Performing 'go0' gauge optimization on GateSetTomography estimate --
MISFIT nSigma = 1928.7314162397222
```
Fantastic work tracking this down, @rileyjmurray!
Thanks, @coreyostrove! I feel pretty good about this one. Who knows how many past calls to LM suffered from this bug. It's wild to think about, since LM was already performing very well, and now it'll perform even better :)
Agreed, nice work and as always, the detailed writeups are incredibly appreciated!
@sserita once tests pass we should be good to merge :)
Thanks as always for the great work, and tracking down that bug in the LM code!
TimeIndependentMDCObjectiveFunction is one of the most complicated classes I've encountered in pyGSTi. Right now ...
This PR simplifies TimeIndependentMDCObjectiveFunction so that
These changes should materially improve the maintainability of TimeIndependentMDCObjectiveFunction, as well as that of its 8 subclasses, which collectively span ~500 lines. The outward behavior of this class should be the same as before, up to floating-point rounding error.
For the curious, I went down this rabbit hole when working on PR #506.
Incidental changes related to MPI testing: I moved some lightweight MPI tests from `test/test_packages` to `test/unit`. This move was prompted by this PR because it touches code that involves a lot of MPI. While I was at it, I renamed an MPI performance-profiling file so that it doesn't get misinterpreted as a test file.