Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient method for check_parallel_jacobian #1395

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

dallan-keylogic
Copy link
Contributor

@dallan-keylogic dallan-keylogic commented Apr 10, 2024

Depends on #1429

Summary/Motivation:

Presently in check_parallel_jacobian, many arithmetic operations are being carried out in Python code, featuring a triple for loop for some reason. This new method is simpler and takes advantage of more vectorized operations. However, it doesn't knock out "negligible coefficients" like the old method did, and so some of the answers it produces are different given the current criterion for parallel constraints:

diff <= tolerance or diff <= tolerance * max(unorm, vnorm)

We should revisit this criterion---just using a relative tolerance might be enough.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

# List to store pairs of parallel components
parallel = []

for row, col, val in zip(*find(upper_tri), strict=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to test this, I get an error due to this keyword argument passed to zip. I assume strict should be passed to triu, but I'm not sure exactly what else is going on here.

Copy link
Contributor Author

@dallan-keylogic dallan-keylogic Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. No, strict was an argument for zip. It throws an error if the iterators passed to zip are not the same length. However, it appears it was only added in Python 3.10, so I can't use it.

For reference find returns a tuple of three arrays, one containing row indices, one containing column indices, and one containing matrix values for all the nonzero entries of upper_tri. * unpacks the tuple into arguments for zip. It's a neater way of writing find_tuple = find(upper_tri) and
zip(find_tuple[0], find_tuple[1], find_tuple[2]).


# Take product of all rows/columns with all rows/columns by taking outer
# product of matrix with itself
outer = mat @ mat.transpose()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation was designed to avoid taking every possible dot product by first sorting vectors by their nonzero structure. I would have expected this to be slow due to unnecessary dot products, but in some testing, it doesn't seem slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sparse matrix multiplication carried out by Scipy. The key here is that 1) Sorting through which products to take should be carried out in the sparse matrix multiplication routine, not manually and
2) The looping required to do this sorting and multiplication is done in compiled C++ code, not Python code, and (should) be much faster than doing it in Python code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scipy matrix product is taking every possible combination of non-zero dot products, while the previous implementation takes only dot products between vectors with the same sparsity structure. This is fewer dot products, although, in practice, probably not by a wide margin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't entirely follow. Scipy's matrix multiply, as near as I can tell, does the same thing. It implements the SMMP algorithm which occurs in two steps: the first to determine the nonzero structure of the matrix, the second to actually compute the values (as near as I can tell). That's basically what you're doing, except 1) it isn't filtering out entries as being "too small to contribute" and 2) it's doing it in C++ instead of Python, which is much faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if two rows are (1, 2, 3) and (1, 2, 0), they cannot possibly be parallel. The previous implementation will not compute this dot product, while the new implementation will.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're ruling out rows being parallel based on one having an element for the third index (3) and the other having zero for the third index? That can fail if the first row was (1, 2, 3e-8) instead of (1, 2, 3): (1, 2, 3e-8) is still effectively parallel to (1, 2, 0) even if they structurally differ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this after filtering out small entries.

@Robbybp
Copy link
Member

Robbybp commented Apr 11, 2024

I just profiled this with the following script, which uses my branch: https://github.com/Robbybp/idaes-pse/tree/parallel_constraints-timing. This relies on the distill_DAE.py and distill.dat files from the Pyomo examples directory.

from pyomo.environ import *                                                                                                                                                          
from pyomo.dae import *                                                                                                                                                              
from distill_DAE import model                                                                                                                                                        
                                                                                                                                                                                     
instance = model.create_instance('distill.dat')                                                                                                                                      
                                                                                                                                                                                     
# Discretize using Finite Difference Approach                                                                                                                                        
discretizer = TransformationFactory('dae.finite_difference')                                                                                                                         
discretizer.apply_to(instance, nfe=1000, scheme='BACKWARD')                                                                                                                          
                                                                                                                                                                                     
from idaes.core.util.model_diagnostics import check_parallel_jacobian, check_parallel_jacobian_old                                                                                   
from pyomo.common.timing import TicTocTimer, HierarchicalTimer                                                                                                                       
                                                                                                                                                                                     
timer = TicTocTimer()                                                                                                                                                                
htimer = HierarchicalTimer()                                                                                                                                                         
                                                                                                                                                                                     
timer.tic()                                                                                                                                                                          
htimer.start("old")                                                                                                                                                                  
check_parallel_jacobian_old(instance, timer=htimer)                                                                                                                                  
htimer.stop("old")                                                                                                                                                                   
timer.toc("old")                                                                                                                                                                     
htimer.start("new")                                                                                                                                                                  
check_parallel_jacobian(instance, timer=htimer)                                                                                                                                      
htimer.stop("new")                                                                                                                                                                   
timer.toc("new")                                                                                                                                                                     
                                                                                                                                                                                     
print()                                                                                                                                                                              
print(htimer)                                                                                                                                                                        

I see a profile that looks like this:

[    0.00] Resetting the tic/toc delta timer
[+  33.05] old
[+  31.19] new

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    31.189    31.189   48.6
     --------------------------------------------------
     check-dotprods        1     0.273     0.273    0.9
     get-jacobian          1    28.324    28.324   90.8
     matmul                1     0.012     0.012    0.0
     norms                 1     2.508     2.508    8.0
     other               n/a     0.073       n/a    0.2
     ==================================================
old                        1    33.048    33.048   51.4
     --------------------------------------------------
     get-jacobian          1    28.638    28.638   86.7
     norm             100068     0.343     0.000    1.0
     sort-by-nz            1     0.227     0.227    0.7
     vectors               1     3.682     3.682   11.1
     other               n/a     0.158       n/a    0.5
     ==================================================
=======================================================

The new implementation is indeed slightly faster, although both are dominated by the cost of getting the Jacobian.

I can make some small performance improvements to the old implementation, giving the following profile:

[    0.00] Resetting the tic/toc delta timer
[+  31.44] old                         
[+  30.82] new
                                             
Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    30.817    30.817   49.5
     --------------------------------------------------
     check-dotprods        1     0.268     0.268    0.9               
     get-jacobian          1    27.966    27.966   90.7
     matmul                1     0.011     0.011    0.0
     norms                 1     2.501     2.501    8.1                                                                                                                              
     other               n/a     0.071       n/a    0.2
     ==================================================
old                        1    31.443    31.443   50.5
     --------------------------------------------------
     get-jacobian          1    28.255    28.255   89.9        
     norm             100068     0.330     0.000    1.1
     sort-by-nz            1     2.667     2.667    8.5
     vectors               1     0.029     0.029    0.1
     other               n/a     0.162       n/a    0.5
     ==================================================
=======================================================

Nothing too major. Both implementations seem to be dominated by the cost of extracting vectors from the Jacobian. I have no real problem with the new implementation. Do you have any performance benchmarks that motivated it?

@dallan-keylogic
Copy link
Contributor Author

dallan-keylogic commented Apr 11, 2024

Thank you for profiling it @Robbybp . The main motivation was trying to read the old code, finding it unnecessarily complex, seeing a triple for loop in Python, and deciding to do an implementation myself. For a while, I did work in Octave, in which there was a huge performance difference between using built-ins and writing for-loops. I guess there isn't as big difference in Python, at least in this instance.

I'm curious about the time spent in norms, though. Is that dominated by lookup times, or by list creation? I just pushed a version that preallocates a Numpy array then fills it with norms---could you see how that performs?

Edit: Just realized you posted the script you used to profile it. Will check myself then.

@dallan-keylogic
Copy link
Contributor Author

On a different note, there's the failing test. I am also having problems with test_display_near_parallel_variables, which I know is changed in #1380 .

The original test expects only (v1, v2), (v1,v4), and (v2, v4) as parallel. The new method additionally returns (v1, v3), (v4, v3), and (v2, v3). Here's the transposed Jacobian matrix, which has these vectors as rows:

matrix([[ 1.00000e+00,  1.00000e+00,  1.00000e+08],
        [-1.00000e+00,  0.00000e+00,  1.00000e+10],
        [ 9.99990e-01,  1.00001e+00,  1.00000e+08],
        [ 0.00000e+00, -1.00000e-08, -1.00000e-06]])

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

@Robbybp
Copy link
Member

Robbybp commented Apr 11, 2024

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.

@dallan-keylogic
Copy link
Contributor Author

This is what I get with my latest version. Indeed, creating norms was the bottleneck, probably because it wasn't preallocated.

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    58.482    58.482   50.3
     --------------------------------------------------
     check-dotprods        1     4.389     4.389    7.5
     get-jacobian          1    53.934    53.934   92.2
     matmul                1     0.022     0.022    0.0
     norms                 1     0.002     0.002    0.0
     other               n/a     0.135       n/a    0.2
     ==================================================
old                        1    57.727    57.727   49.7
     --------------------------------------------------
     get-jacobian          1    51.437    51.437   89.1
     norm             100068     0.623     0.000    1.1
     sort-by-nz            1     5.356     5.356    9.3
     vectors               1     0.048     0.048    0.1
     other               n/a     0.263       n/a    0.5
     ==================================================
=======================================================

We have 4.548 s for the new method after subtracting get-jacobian and 6.290 s for the old method.

@dallan-keylogic
Copy link
Contributor Author

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.

I ended up just screening out elements based on a norm_tolerance argument in my latest version. When it's called by the diagnostics toolbox, it's going to be given the same tolerance we're using for jacobian_rows and jacobian_columns so it gets flagged there.

@ksbeattie ksbeattie added the Priority:Normal Normal Priority Issue or PR label Apr 11, 2024
@dallan-keylogic
Copy link
Contributor Author

I did some (probably excessive) optimization, and now we have:

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1     0.211     0.211    3.6
     --------------------------------------------------
     check-dotprods        1     0.145     0.145   68.5
     get-jacobian          1     0.000     0.000    0.0
     matmul                1     0.016     0.016    7.4
     norms                 1     0.002     0.002    0.8
     other               n/a     0.049       n/a   23.3
     ==================================================
old                        1     5.593     5.593   96.4
     --------------------------------------------------
     get-jacobian          1     0.000     0.000    0.0
     norm             100068     0.576     0.000   10.3
     sort-by-nz            1     4.773     4.773   85.3
     vectors               1     0.052     0.052    0.9
     other               n/a     0.192       n/a    3.4
     ==================================================
=======================================================

Note that I precalculated the Jacobian and passed it in so that it didn't take so long to run and we didn't have to subtract it out.

@dallan-keylogic
Copy link
Contributor Author

The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.

@lbianchi-lbl
Copy link
Contributor

The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.

Something to keep in mind that might or might not be relevant is that the versions of NumPy (and possibly SciPy as well) running on Python 3.8 will be older than for Python 3.9+, as NumPy dropped support for 3.8 a while ago. So this difference in sorting (or lack thereof) might be due to the NumPy version rather than the Python version.

@andrewlee94
Copy link
Member

andrewlee94 commented Apr 12, 2024

A related note: Numpy is in the course of releasing v2.0 which may have impacts both here and in our code in general. The Pyomo team is tracking things on their end, but we should be aware of this coming change.

@Robbybp
Copy link
Member

Robbybp commented Jun 20, 2024

This PR still displays way more parallel constraints than I expect on my example, which has been merged into the examples repo. Here is code to reproduce:

from idaes_examples.mod.diagnostics.gas_solid_contactors.example import (
    create_square_model_with_new_variable_and_constraint
)
from idaes.core.util.model_diagnostics import DiagnosticsToolbox

m = create_square_model_with_new_variable_and_constraint()
dt = DiagnosticsToolbox(m)

dt.report_numerical_issues()
dt.display_near_parallel_constraints()

On main, this gives 11 parallel constraints. Here is the relevant output:

====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
4 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 11 pairs of constraints are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()

====================================================================================
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================

Here is the output of report_numerical_issues with this branch:

====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
5 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 1012 pairs of constraints are parallel (to tolerance 1.0E-08)
    WARNING: 462 pairs of variables are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()
    display_near_parallel_variables()

====================================================================================

Note that 1012 constraints are considered parallel. I would like this to be fixed before merging.

@dallan-keylogic
Copy link
Contributor Author

It isn't clear to me that there's anything there to "fix". The Jacobian is singular to machine precision. Something is clearly wrong with the example numerically. I can go through and calculate the angle between those constraints or variables by hand, but if they're getting flagged it's less than 0.01 degrees.

Do you have scaling for your example? In order to get the diagnostics toolbox to recognize them, you need to do a Pyomo scaling transformation and create a toolbox with the scaled model.

@dallan-keylogic
Copy link
Contributor Author

Okay, if we tighten dt.config.parallel_component_tolerance from 1e-8 to 1e-15, we get the expeced output:

dt.config.parallel_component_tolerance = 1e-15
dt.display_near_parallel_constraints()
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================

The problem seems to be that we're presently testing 1-<u,v>/(||u||*||v||) against a tolerance of 1e-8. If there's a variable in one of them that has a scaling factor off by a factor of 10^4, the scaling error is squared in the inner product, resulting in an error of 10^8, which then shows up in the parallel constraints. (This is the same issue that you run into when solving the normal equations or performing the SVD. If you form certain intermediate matrices, you lose half your digits of precision, going from 1e-16 to 1e-8). Presently, we're just giving cautions for badly scaled vars/constraints with magnitude 10^4, so this method is much more sensitive than our other methods. I'll make the default tolerance 1e-15 to resolve the inconsistency.

This method really should only be used on a model that has already had scaling issues resolved, and the example, being a legacy model, has severe scaling issues. However, this change is good enough to get by now. Maybe later we can add autoscaling to the example when @andrewlee94 finishes with that.

@Robbybp
Copy link
Member

Robbybp commented Jun 20, 2024

I'm not sure how I feel about further reducing the tolerance. Then Jacobian rows (1.0, 1.0) and (1.0, 1.0+1e-14) will not be considered parallel, which doesn't seem like what we want.

This method really should only be used on a model that has already had scaling issues resolved

I'm not sure I agree with this.

@Robbybp
Copy link
Member

Robbybp commented Jun 20, 2024

Then Jacobian rows (1.0, 1.0) and (1.0, 1.0+1e-14) will not be considered parallel

This is not correct. I see your point that our current measure of "distance-from-parallel" scales with epsilon**2 for two vectors (1.0, 1.0) and (1.0, 1.0+epsilon). Maybe we should use sqrt(unorm*vnorm - uvnorm) as the measure? It would be nice for the tolerance to be the same order of magnitude as a perturbation that will cause two vectors to no longer be parallel.

@dallan-keylogic
Copy link
Contributor Author

Maybe we should use sqrt(unorm*vnorm - uvnorm) as the measure?

Is the goal here to make the measure of colinearity extensive? Then we run into problems if we have rows/columns that are small in norm. We could also make sqrt(1 - <u,v>/(||u||*||v||) ) the measure of colinearity, which is intensive. However, I'm not sure whether using an epsilon smaller than 1e-8 would be meaningful in that case unless the vectors were parallel to machine precision due to floating point error in the calculation. There are ways of computing the angle in a more precise way (check out this StackExchange post and the PDF it links to) but they'll take longer to calculate (we'd probably revert to your version of the code, but take out the part that automatically declares two vectors "not parallel" if they're structurally different) and be overkill for our purposes.

@dallan-keylogic
Copy link
Contributor Author

One of the tests we're now failing is this one:

        @pytest.mark.component
        def test_display_near_parallel_constraints(self):
            model = ConcreteModel()
            model.v1 = Var(initialize=1e-8)
            model.v2 = Var()
            model.v3 = Var()

            model.c1 = Constraint(expr=model.v1 == model.v2)
            model.c2 = Constraint(expr=model.v1 == 1e-8 * model.v3)
            model.c3 = Constraint(expr=1e8 * model.v1 + 1e10 * model.v2 == 1e-6 * model.v3)
            model.c4 = Constraint(expr=-model.v1 == -0.99999 * model.v2)

            dt = DiagnosticsToolbox(model=model)

            stream = StringIO()
            dt.display_near_parallel_constraints(stream)

            expected = """====================================================================================
    The following pairs of constraints are nearly parallel:

        c1, c4

    ====================================================================================
    """

It turns out they're parallel only to a tolerance of 1e-11

(Pdb) from numpy.linalg import norm
(Pdb) v1 = np.array([1, -1])
(Pdb) v2 = np.array([-1, 0.99999])
(Pdb) norm(v1)
1.4142135623730951
(Pdb) norm(v2)
1.414206491322961
(Pdb) v1.dot(v2)
-1.99999
(Pdb) v1.dot(v2)/(norm(v1)*norm(v2))
-0.9999999999874997
(Pdb) out = 1- np.abs(v1.dot(v2)/(norm(v1)*norm(v2)))
(Pdb) out
1.2500334101162025e-11
(Pdb)

So the solution to @Robbybp 's example might just be to turn down the tolerance there (to 1e-15) and we can leave the default value looser. Is this solution satisfactory?

Copy link
Member

@Robbybp Robbybp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking about what to do with the tolerance issue, but here are a few comments in the meantime.

# The expression (1 - abs(values) < tolerance) returns an array with values
# of ones and zeros, depending on whether the condition is fulfilled or not.
# We then find indices where it is filled using np.nonzero.
parallel_1D = np.nonzero(1 - abs(values) < tolerance)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for checking 1 - abs(u.dot(v)) / (unorm * vnorm) instead of unorm*vnorm - abs(u.dot(v))? The previous implementation, as well as Gurobi's implementation (https://github.com/Gurobi/gurobi-modelanalyzer/blob/a8a34f70e09f3e439b1b7f308b68010a2dfac5b3/src/gurobi_modelanalyzer/results_analyzer.py#L1241), use the latter quantity, which is fewer operations (less round-off error). Division in particular seems to be risky if the denominator is small, and with the default zero_norm_tolerance here, it can approach 1e-16.

Copy link
Contributor Author

@dallan-keylogic dallan-keylogic Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the previous implementation and the Gurobi implementation checked against a semi-normalized epsilon:

                #
                # Normalize the tolerance in the test; want the same
                # result for vectors u and v and 1000u and 1000v
                #
                if abs(abs(dotprod) - norm1 * norm2) > partol * norm1:
                    continue

Note that this formulation doesn't scale the tolerance with norm2 for some reason. Also, possibly for reasons of efficiency, they use 1-norms instead of 2-norms. That's a false economy, though, because it's no longer guaranteed that abs(dotprod) <= norm1 * norm2 (Cauchy-Schwartz inequality). The equation we're using comes from the formula for angle between two vectors: u.dot(v) = unorm*vnorm*cos(theta) along with Taylor expansions around theta=0 and theta=pi.

We could implement the check like unorm*vnorm - abs(u.dot(v)) < tolerance*unorm*vnorm, but that has the disadvantage of being harder to vectorize. Floating point division, even division by some number on the order of 1e-16, is well defined and can be computed accurately. From Wikipedia:

There are no cancellation or absorption problems with multiplication or division, though small errors may accumulate as operations are performed in succession.

It's addition and subtraction that can cause problems with catastrophic cancellation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advocate for unorm*vnorm - abs(u.dot(v)) < tolerance*max(1, unorm*vnorm), where all norms are 2-norms. Point taken on division, but the above is still fewer operations than the current implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention with the max there? Even relatively orthogonal vectors will be considered "parallel" if the rows have small norm: [1e-6, 0] and [1e-6, 1e-6)] are considered parallel by this metric.

I don't think avoiding vectorized operations is going to be a major improvement in performance, especially since we'll be adding two multiplications (unorm*vnorm and tol*unorm*vnorm) for each pair for which u.dot(v) is nonzero.

idaes/core/util/model_diagnostics.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 90.82569% with 10 lines in your changes missing coverage. Please review.

Project coverage is 76.95%. Comparing base (96237b9) to head (10250a5).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
idaes/models/unit_models/separator.py 72.22% 4 Missing and 1 partial ⚠️
idaes/core/base/control_volume1d.py 75.00% 3 Missing and 1 partial ⚠️
idaes/models/unit_models/feed_flash.py 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   76.99%   76.95%   -0.05%     
==========================================
  Files         382      382              
  Lines       61993    61923      -70     
  Branches    10146    10164      +18     
==========================================
- Hits        47733    47654      -79     
- Misses      11852    11864      +12     
+ Partials     2408     2405       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Robbybp
Copy link
Member

Robbybp commented Jun 29, 2024

@dallan-keylogic I opened a PR into this branch, dallan-keylogic#4, that attempts to implement the original algorithm in a vectorized manner. Take a look when you have a chance.

@andrewlee94
Copy link
Member

@Robbybp @dallan-keylogic We don't seem to be making any progress towards converging on a solution to this - we have lots of ideas but nothing seems to stand out as a clear winner yet. Could someone summarise what ideas have been proposed, along with the pros and cons of each so we can start to work out what the path forward should be.

Also, if the underlying issue is that these checks are inherently sensitive to scaling, then maybe we should just wait until we have the new scaling tools ready so that we can test on actual scaled models.

@Robbybp
Copy link
Member

Robbybp commented Jul 11, 2024

@andrewlee94 I propose we do the following:

  1. Merge scaling tools ( Add new AutoScaler and CustomScalerBase classes #1429 )
  2. I update my example to (a) use scaling tools and (b) still not solve the numerically singular optimization problem.
  3. Merge this PR

@dallan-keylogic
Copy link
Contributor Author

@Robbybp , the scaling tools have been merged, so you can try to update your example to use those tools.

@Robbybp
Copy link
Member

Robbybp commented Oct 10, 2024

Thanks, I will try to carve out some time to update the example and get this merged

@ksbeattie
Copy link
Member

Thanks, I will try to carve out some time to update the example and get this merged

@Robbybp, we've set some dates for the Nov release, namely Nov 14th for the release candidate. Hopefully you can get to this by then?

@Robbybp
Copy link
Member

Robbybp commented Oct 25, 2024

@dallan-keylogic I seem to remember that you said auto-scaling the model from my example led to the vectorized method producing a reasonable output. If this is correct, do you have the code you used to scale the model?

@Robbybp
Copy link
Member

Robbybp commented Oct 25, 2024

I'm running this with AutoScaler.scale_model, and still getting 66 parallel constraints and 8 parallel variables. Here's the code I'm running:

from idaes_examples.mod.diagnostics.gas_solid_contactors.example import (
    create_square_model_with_new_variable_and_constraint
)
from idaes.core.util.model_diagnostics import DiagnosticsToolbox
from idaes.core.scaling import AutoScaler
from idaes.core.util.scaling import get_jacobian
from pyomo.common.timing import TicTocTimer
from pyomo.contrib.pynumero.interfaces.pyomo_nlp import PyomoNLP
import scipy.sparse as sps
import logging

logging.getLogger("pyomo").setLevel(logging.CRITICAL)
logging.getLogger("idaes").setLevel(logging.CRITICAL)

timer = TicTocTimer()
timer.tic()

m = create_square_model_with_new_variable_and_constraint()
timer.toc("make model")
AutoScaler().scale_model(m)
timer.toc("scale model")
dt = DiagnosticsToolbox(m)
dt.report_numerical_issues(scaled=True)
timer.toc("report numerical issues")
dt.display_near_parallel_constraints(scaled=True)
dt.display_near_parallel_variables(scaled=True)
timer.toc("display near parallel")

def display_vectors(jac, con1, con2, transpose=False):
    comps = [con1, con2]
    if transpose:
        jac = jac.transpose().tocsr()
        # transpose indicates we are using variables...
        indices = nlp.get_primal_indices(comps)
    else:
        jac = jac.tocsr()
        indices = nlp.get_equality_constraint_indices(comps)
    for i, comp in enumerate(comps):
        print(f"{comp.name}")
        print(f"{jac[indices[i], :]}")
    u = jac[indices[0], :]
    v = jac[indices[1], :]
    dotprod = abs(u.dot(v.transpose()).data[0])
    norm1 = sps.linalg.norm(u, ord="fro")
    norm2 = sps.linalg.norm(v, ord="fro")
    normprod = norm1 * norm2
    ratio = dotprod / normprod
    print(f"dotprod  = {dotprod}")
    print(f"||u||    = {norm1}")
    print(f"||v||    = {norm2}")
    print(f"normprod = {normprod}")
    print(f"ratio    = {ratio}")

jac, nlp = get_jacobian(m, scaled=True)
con1 = m.fs.MB.solid_phase_heat_transfer[300.0,1.0]
con2 = m.fs.MB.gas_phase_heat_transfer[300.0,1.0]
print()
display_vectors(jac, con1, con2)

var1 = m.fs.MB.gas_phase.heat[0.0,1.0]
var2 = m.fs.MB.gas_phase.energy_accumulation[0.0,1.0,"Vap"]
print()
display_vectors(jac, var1, var2, transpose=True)

And the relevant part of the output:

[+  23.02] scale model
====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
3 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 66 pairs of constraints are parallel (to tolerance 1.0E-10)
    WARNING: 8 pairs of variables are parallel (to tolerance 1.0E-10)

------------------------------------------------------------------------------------
5 Cautions

    ...

------------------------------------------------------------------------------------
Suggested next steps:

    ...

====================================================================================
[+   4.20] report numerical issues
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]
    fs.MB.solid_phase_heat_transfer[0.0,0.6], fs.MB.gas_phase_heat_transfer[0.0,0.6]
    ...
    fs.MB.solid_phase_heat_transfer[300.0,1.0], fs.MB.gas_phase_heat_transfer[300.0,1.0]

====================================================================================
====================================================================================
The following pairs of variables are nearly parallel:

    fs.MB.solid_phase.heat[0.0,0.7], fs.MB.solid_phase.energy_accumulation[0.0,0.7,Sol]
    fs.MB.solid_phase.heat[0.0,0.8], fs.MB.solid_phase.energy_accumulation[0.0,0.8,Sol]
    fs.MB.solid_phase.heat[0.0,0.9], fs.MB.solid_phase.energy_accumulation[0.0,0.9,Sol]
    fs.MB.gas_phase.heat[0.0,0.6], fs.MB.gas_phase.energy_accumulation[0.0,0.6,Vap]
    fs.MB.gas_phase.heat[0.0,0.7], fs.MB.gas_phase.energy_accumulation[0.0,0.7,Vap]
    fs.MB.gas_phase.heat[0.0,0.8], fs.MB.gas_phase.energy_accumulation[0.0,0.8,Vap]
    fs.MB.gas_phase.heat[0.0,0.9], fs.MB.gas_phase.energy_accumulation[0.0,0.9,Vap]
    fs.MB.gas_phase.heat[0.0,1.0], fs.MB.gas_phase.energy_accumulation[0.0,1.0,Vap]

====================================================================================
[+   4.43] display near parallel

fs.MB.solid_phase_heat_transfer[300.0,1.0]
  (0, 199)	-0.9999999999998741
  (0, 1242)	2.8977677529252563e-07
  (0, 2651)	2.8977677529252563e-07
  (0, 7274)	2.897767752616698e-07
fs.MB.gas_phase_heat_transfer[300.0,1.0]
  (0, 199)	0.9999999999998741
  (0, 1242)	-2.8977677529252563e-07
  (0, 2651)	-2.8977677529252563e-07
  (0, 3507)	2.897767752616698e-07
dotprod  = 0.9999999999999161
||u||    = 1.0
||v||    = 1.0
normprod = 1.0
ratio    = 0.9999999999999161

fs.MB.gas_phase.heat[0.0,1.0]
  (0, 1352)	2.897767752616698e-07
  (0, 2726)	-0.5749980449253075
fs.MB.gas_phase.energy_accumulation[0.0,1.0,Vap]
  (0, 2726)	0.09017618863215587
dotprod  = 0.051851132162305365
||u||    = 0.5749980449253805
||v||    = 0.09017618863215587
normprod = 0.05185113216231195
ratio    = 0.999999999999873

@Robbybp
Copy link
Member

Robbybp commented Oct 25, 2024

I see these same results when using scale_variables_by_magnitude and scale_constraints_by_jacobian_norm (maybe that is what scale_model is doing under the hood)

@andrewlee94
Copy link
Member

@Robbybp Yes, scale_model just calls scale_variables_by_magnitude and scale_constraints_by_jacobian_norm - it is intended to be the easy point of entry for new users and run the basic scaling routine in one command (whilst leaving more advanced users the option to call the individual methods as they desire).

@Robbybp
Copy link
Member

Robbybp commented Nov 7, 2024

If we can't update the example to give a small explanation (the 11 pairs of constraints that are identical), I propose that we add a parallel_jacobian_method option that allows users to select between the two implementations. The option would accept something like a ParallelConstraintMethod enum. I have no preference about which method should be the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:Normal Normal Priority Issue or PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants