Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#4395 closed enhancement (fixed)

Importance of residual scaling for the proper operation of the Kinsol solver

Reported by: casella Owned by: wbraun
Priority: high Milestone: 1.13.0
Component: Run-time Version:
Keywords: Cc:

Description

The Kinsol solver provides means to improve the numerical robustness by allowing to set scaling factors for the unknowns and for the residuals. In the current OMC implementation, the former is used, but the latter is not.

I have prepared some simple test cases that demonstrate how this can actually be a problem, since the use of SIunits, which is the standard in Modelica, can often lead to such situations. For example, a steam power plant model can contain variables whose order of magnitude is 1e-6 (the turbine admittance, which is a manipulated variable for the power/frequency controller), and others whose order of magnitude is 1e9 (the turbine power, which is a controlled variable).

The test model (attached) represents the series connection of N nonlinear springs, with linearly distributed stiffness coefficients to add some asymmetry, just in case. The endpoint displacement is fixed to 1.

The model is more or less nonlinear depending on d_0. If d_0 >> d_tot the model is only slightly nonlinear around the solution, while it becomes severly nonlinear if d_0 << d_tot. The default is d_0 = 10*d_tot, so the model is only moderately nonlinear: the coefficients in the Jacobian change by at most 10% between the initial guess (which is trivially zero) and the solution.

I have set tearing off to get some more equations to the solver, since the most interesting use case for kinsol is large-scale systems without tearing. The unknowns are the individual relative displacements of the springs s[N], the absolute displacements of each spring endpoint d[N+1], and the force acting on the springs (one element of vector F, the others are aliases).

It is also possible to set k_max = k_min to make the analysis simpler, because then basically all elements of vector s[N] become equal and d=linspace(0,1,N+1), so one only needs to track s[1] and F[1] to understand what happens, as everything else trivially depends on that. On the other hand, by doing this one may get some behaviour which crucially depends of the high (and somewhat artificial) level of simmetry of the problem.

The base class Test is such that the scaling of variables and residuals should be decent without the need of setting the nominal attributes.

Test has only 10 spring elements and the solver converges quickly, as expected.

Test2 has 100 spring elements; the convergence at the first attempt is somewhat difficult. This is strange; as the nonlinearity is very mild, I would have expected a faster convergence rate. But that is based on my understanding of Newton's method with just scalar unknown; maybe the high number of dimensions adds more sensitivity to nonlinear behaviour? Or is it because Kinsol is a lot more than a damped quasi-Newton method?

Test3 has 1000 spring elements; now the solver fails on the first attempt, and quickly converges on the second. It could definitely be interesting to understand what happens between the first and second attempt, but note that you can get the same behaviour with the smaller Test4.

Test4 has 100 spring elements and d_0 >> d_tot, so it is only marginally nonlinear; the elements Jacobian change only on the 4th decimal digit. And yet, the first attempt does not converge, and only the second does.

Test5 has d_0 = 1e7*d_tot, so it is really practically linear, the elements of the Jacobian should change on their 7th digit (maybe we should double-check that, just in case). And yet, kinsol does not converge on the first attempt, as in Test4, which is now really weird.

In Test6, I change the stiffness parameters to a very large values, thus introducing a scaling problem on one unknown (F[99]) and on the set of equations

 s[i]*(1 + abs(s[i])/s_0)*k[i]*N = F[i];

whose residual have now order of magnitude 1e8. It is apparent that the solver has a lot more difficulty to converge now. But what is really interesting is that introducing the variable scaling (but not the residual scaling) in Test7 via nominal attributes actually makes the problem worse, as the solver completely fails then (!)

My conclusion is thus that residual scaling is absolutely essential to handle models written in SI units with kinsol, as they can have very badly scaled residuals. Variable scaling is not enough.

I'd encourage to implement some residual scaling mechanism, so that we can see what happens with this simple test case before venturing into the real-life, large-scale models. One option is to obtain the scaling factor for each residual row as the scalar product between the vector of scaling factors for the unknowns and the vector of the absolute values of the corresponding row of the Jacobian.

One should also set very small values for k_min and k_max and explore what happens if one unknown and N residuals are << 1 instead of being >> 1

Attachments (5)

test-kinsol.2.mos (2.8 KB) - added by casella 7 years ago.
test-kinsol.mos (2.8 KB) - added by casella 7 years ago.
Self-contained test script
test-kinsol-v2.mos (1.6 KB) - added by casella 7 years ago.
test-kinsol-v3.mos (1.6 KB) - added by casella 7 years ago.
test-kinsol-v4.mos (4.8 KB) - added by casella 7 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 7 years ago by wbraun

  • Owner changed from somebody to wbraun
  • Status changed from new to assigned

comment:2 Changed 7 years ago by casella

In PR #1610, scaling was added to the residuals, but the result is still not satisfactory.

I have updated the test script, please check Test8 and Test9. Except for the scaling of some variables F and K, the two models are identical. I would expect that, once the appropriate nominal values are set for the badly scaled variables, the performance of the nonlinear solver is comparable in the two cases.

If you now run the .mos script, you'll see that the first model converges in a total of 13 iterations, while the second fails after 24 iterations, with fnorm constantly growing until the solver gives up.

I guess the implementation of scaling is either incomplete or still has some bug.

If we want to fully support the declarative EOOM paradigm, there should be no (substantial) penalty in writing physical equations which are badly scaled due to SI units, as long as the appropriate auxiliary information (i.e., nominal values) is provided. Thus, Test8 and Test9 should both run successfully and more or less with the same number of iterations.

Changed 7 years ago by casella

Changed 7 years ago by casella

Self-contained test script

comment:3 Changed 7 years ago by casella

I have checked the behaviour of the solver after PR 1790, but I'm still not convinced we're on the right track.

Please check the attached test-kinsol-v2.mos test script.

Test2 is a nicely scaled model: the values of fnorm during iterations hover for a while in the interval 0.4-160, then eventually they go all the way down to 3e-12, when Kinsol exits with code 0 (all fine).

Test3 is the same model, with badly scaled values, and with the right nominal values that should allow to scale it back to the values of Test2, thus obtaining the same qualitative solver behaviour, which is our goal. When you run it, fnorm starts at 1.8e17 (!!) which is obviously not scaled at all, it takes a lot of linesearch iterations during each nonlinear iteration, with absurdly small values of lambda, all the way down to 1e-20, and eventually the solver exits with fnorm = 842 and error code 2 (small step termination). Note that the kinsol manual states, "in the section Nonlinear iteration stopping criteria", that

Only the first condition (small residual) is considered a successful completion of kinsol. The second condition (small step) may indicate that the iteration is stalled near a point for which the residual is still unacceptable.

So, accepting this solution for omc is also questionable, IMHO.

The documentation of Kinsol specifies clearly that scaled norms of the residuals are to be used, the scaled norm of z being defined as the norm of D_f z.

I don't think that tweaking the minimum value of lambda to cope with a badly scaled residual as you did is a good solution. The main reason for that is that lambda is a scalar, while the norm of the residuals is a sum over all residuals: if some of them are large and some of them are small, not weighting them *individually* with the D_F matrix means that some equations will be solved with a very high accuracy, others with a much smaller one, and unnecessarily so. As the scale of the problem grows, this problem is just going to get worse, and we'll soon need to solve problems with tens or hundred thousands unknowns.

IMHO the only way to deal with this issue in a proper way is to compute D_F and let the solver use it appropriately. The i,i element of matrix D_F can be computed as the scalar product of the vector obtained by taking the absolute values of the i-th row of the Jacobian with the vector of the unknowns nominal values.

Can you implement that, please?

comment:4 Changed 7 years ago by casella

I have updated the test to avoid alias variable elimination, so there are more badly scaled residuals in the system and the difference between Test2 and Test3 is even more marked.

comment:5 follow-up: Changed 7 years ago by wbraun

  • Resolution set to fixed
  • Status changed from assigned to closed

In PR #1793 the scaling issues are fixed and the examples Test2 and Test3 from 'test-kinsol-v2.mos​' are solved with the same amount of iterations.

comment:6 in reply to: ↑ 5 Changed 7 years ago by casella

Replying to wbraun:

In PR #1793 the scaling issues are fixed and the examples Test2 and Test3 from 'test-kinsol-v2.mos​' are solved with the same amount of iterations.

This is true. However, if you now remove the declaration F(each nominal = 1e9) (see Test4 in the updated test-kinsol-v2.mos attachment), the very same number of iterations is needed. In fact, even the value of fnorm at each iteration is exactly the same down to the last digit. This is quite weird, I would have expected to see much worse performance when the forces are not scaled. So, for this specific example it would now seem that introducing scaling values for the variables is not necessary, or in fact that this is totally irrelevant.

In fact, the residuals for the spring equations s[i]*(1 + abs(s[i])/s_0)*k[i]*N = F[i] are indeed scaled correctly due to the presence of the large factors k[i], even without giving a scaling factor to F. On the other hand, during the various iterations the equations F[i] = F[i+1], which are badly scaled if no nominal attribute is given to F, but are trivial, are always solved to produce an exactly zero residual, so it might be well be that the scaling of the large variables F and of the residuals of those equality equations is irrelevant for this specific test case.

I have application models of hydraulic actuation systems where scaling proved to be extremely beneficial to attain convergence, so I believe that in general this is the case, and that the behaviour shown by the test case reported here is an exception. Unfortunately I cannot publish those models which are confidential. I'll try to come up with another simple test case which doesn't have this kind of symmetry, so we can actually demonstrate that there is indeed a benefit in introducing the nominal values.

Last edited 7 years ago by casella (previous) (diff)

Changed 7 years ago by casella

comment:7 follow-up: Changed 7 years ago by casella

  • Resolution fixed deleted
  • Status changed from closed to reopened

There is still a problem that is already visible with this test case, see Test5 in test-kinsol-v2.mos. At the end of the iterations, fnorm = 8.05e-14, which is less than ftol = 1e-12. However, when the residuals of the solution are checked, we get:

LOG_NLS           | info    | | scaled Euclidean norm of F(u) = 2.588407e-006
|                 | warning | | False positive solution. FNorm is not small enough.
...
LOG_NLS           | warning | nonlinear system 141 fails: at t=0
...
assert            | debug   | Solving non-linear system 141 failed at time=0.

As we discussed, I am convinced that you shouldn't use a plain Euclidean norm to check if the residuals are small, but that you should rather use a weighted Euclidean norm, with the same scaling factor that were used for the solution.

The rationale is that the same model may contain equations with residuals of vastly different orders of magnitude, e.g. a power balance on a big power plant turbine shaft (order of magnitude 1e9 W) and Kirchoff's current law relative to the current flowing through 14-nm transistor (order of magnitude 1e-9 A). Weighting them the same in the Euclidean norm makes no sense: a residual of 1e-12 W in the first case is ridiculously small and may not be reachable due to rounding errors, while a residual of 1e-12 A in the second case means a 0.1% relative error, which is indeed very large.

If you use a weighted norm, then a tolerance of 1e-12 would mean 1e-3 W for the first equation residual (which is just fine, since the overall power is 1GW), and of 1e-19 A in the second (which is again just fine, given the small absolute values of those currents).

Another way to put this is that the perfomance of the solver (in particular the termination criterion) should be invariant w.r.t. the choice of units. This is only attained if you evaluate the weighted residual norm.

comment:8 Changed 7 years ago by casella

I have come up with a modified test which has less symmetry and still shows a lot of issues, see the attached test-kinsol-v3.mos.

Now the nicely scaled case Test2 works fine, while the badly scaled case with nominal values Test4 does not: Kinsol terminates with error codes different from 0, even though the solution is then accepted anyway by the omc runtime.

There are several unexpected things happening with these tests:

  • Test2 and Test4 do not have the same performance
  • Test3 (without nominal values) and Test4 (with nominal values) do exactly the same iterations. This may still be due to some symmetry, but I find this harder to believe
  • The xScaling values of Test3 (all ones) and Test4 (ones for d and s, 1e-9 for F_a, F_b) are different as expected, while the fScaling values are the same for the two cases, which is strange; I would expect the scaling value of the residual F_a[i] + F_b[i] - F_a[i+1] - F_b[i+1] to be 1 in Test3 and 1e-9 in Test4, do I miss something here?
  • fnorm cannot get below 1e-6 in both Test3 and Test4. This is expected in Test3, but not in Test4, where residuals should be scaled correctly by the nominal values, with fnorm getting below 1e-12 as in Test2. As 1e-6*1e-9 = 1e-15, which is close to machine epsilon, I suspect there is still something wrong with the scaling of some residuals related to F_a, F_b
  • Both in Test3 and Test4 the value of fnorm after the last iteration and the scaled Euclidean norm of f(u) differ by a factor of about 1e5, while I would expect them to be the same, or at least to have the same order of magnitude, if the latter is evaluated with the same weight of the former, as it should

As a final remark, you may probably debug this test case taking N = 2 or N = 3, as all unexpected behaviour is due to structural issues, not to the size of the problem.

Changed 7 years ago by casella

comment:9 in reply to: ↑ 7 Changed 7 years ago by wbraun

Replying to casella:

There is still a problem that is already visible with this test case, see Test5 in test-kinsol-v2.mos. At the end of the iterations, fnorm = 8.05e-14, which is less than ftol = 1e-12. However, when the residuals of the solution are checked, we get:
[...]

This is fixed in PR #1800.

comment:10 Changed 7 years ago by casella

I have extensively tested the solver after the last fixes and the behaviour is now as expected, see the attached test-kinsol-v4.mos. The performance of the well-scaled problems is the same of the badly-scaled one if the nominal values are set, while the solver gets in real trouble if scaling is turned off via -noScaling

In fact, with -noScaling simulations should fail, since Kinsol reports a failure after a hundred or more iterations in all cases. However, the generated code still checks the residuals with scaling turned on (despite -noScaling) and accepts the solution, while it shouldn't.

@wibraun, if you can please fix the final check so that it takes -noScaling into account, then you can close this ticket.

Changed 7 years ago by casella

comment:11 Changed 7 years ago by casella

  • Component changed from *unknown* to Run-time

comment:12 Changed 6 years ago by casella

  • Milestone changed from 1.12.0 to 1.13.0

Milestone changed to 1.13.0 since 1.12.0 was released

comment:13 Changed 6 years ago by wbraun

  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed output in PR2677.

comment:14 Changed 6 years ago by casella

Thanks Willi!

Note: See TracTickets for help on using tickets.