Opened 4 years ago

Last modified 3 years ago

#6228 reopened defect

Updated SUNDIALS KINSOL failing on Cpp runtime

Reported by: Andreas Heuermann Owned by: Andreas Heuermann
Priority: blocker Milestone: 1.19.0
Component: Cpp Run-time Version: v1.17.0-dev
Keywords: sundials, kinsol Cc: Niklas Worschech

Description

With the update of SUNDIALS to 5.4.0 we got some new failing test from MSL with the Cpp runtime.

Let's look at the example Modelica.Mechanics.MultiBody.Examples.Loops.Fourbar1 from MSL 3.2.1:

loadModel(Modelica,{"3.2.1"}); getErrorString();
setCommandLineOptions("-d=newInst --simCodeTarget=Cpp"); getErrorString();
simulate(Modelica.Mechanics.MultiBody.Examples.Loops.Fourbar1); getErrorString();

KINSOL will fail with KIN_LSETUP_FAIL -11 The linear solver’s setup function failed in an unrecoverable manner.
The Cpp runtime before the SUNDIALS upgrade PR passed this test.

Change History (20)

comment:1 by Andreas Heuermann, 4 years ago

Solving the same model with the C runtime and KINSOL reveals that the C runtime will fail with the same error.
Testing with the C runtime on a commit before the PR was merged I get the same error.

So I would rule out a wrong implementation and we just note that KINSOL can't solve the non-linear system with the settings we have chosen.


What exactly will the Cpp runtime do?
At the moment the Cpp runtimes failes to solve the loop, but it was able to solve it before, so let's compare.

It will try with a bunch of different strategies for the non-linear solver and with different linear solver modules:

We are trying the non-linear methods KIN_NONE (basic Newton iteration) and KIN_LINESEARCH (Newton with globalization).

And then we have several linear solver modules tried in this order (and yes, always starting at 1).

  1. Try Dense
  2. Try Dense with scaling
  3. Try SPGMR (Scaled, Preconditioned, Generalized Minimum Residual iterative linear solver)
  4. Try SPGMR with scaling
  5. Try SPBCG (Scaled, Preconditioned, Generalized Minimum Residual iterative linear solver)
  6. Try SPBCG with scaling.

For the old version the loop gets solved with:

Try Dense --> failing
Try Dense with scaling --> failing
Try SPGMR --> success

And for every other step the dense solver is enough to solve the problem.


What are the differences to the older SUNDIALS version?

We could be missing preconditioning for the iterative linear solvers. In previous versions of SUNDIALS the call to different linear modules was different.

But I'm sure that SPGMR is not working as it should and that's the reason for this new error.

comment:2 by Francesco Casella, 4 years ago

@AnHeuermann, I already saw this kind of error like "failure to set linear solver" in other contexts. It was often related to systems that turned out to be singular, as if setting up the solver in that case could be problematic. But maybe it's something else.

Do you have any clue what that -11 error code actually means and what are the typical root causes of it? Unfortunately I don't.

Could it be that due to wrong initial guesses the initial Jacobian is singular and this prevents to set up the solver correctly?

comment:3 by Andreas Heuermann, 4 years ago

Okay, the good news: I solved it.
The bad news: I'm not sure why it is working now.

Problem 1:
I played around with initialization function for the linear iterative solver SPBCGS SUNLinSol_SPBCGS. In version v2.6.2 (was using function KINSpgmr) you would provide the maximum number of Krylov subspaces maxl to it, which we set to the dimension of the non-linear system _dimSys.
In version v.5.4.0 maxl is the number of linear iterations allowed. Is this the same as the Krylov subspaces? Probalby, I have to dig into the definition of the method to find out.

The problem was, I changed it from _dimSys to the default value 5. Changing this back solved the problem.

Problem 2:
When using SUNLinearSolver_SPBCGS together with KINSOL the user has to choose right preconditioning PRE_RIGHT, but we had it at PREC_NONE. But the strange thing is, that PREC_NONE works and PREC_RIGHT doesn't. I guess I'm still missing something for PREC_RIGHT and that it would indeed work better if used correctly. But I can't see what I'm missing and it's working at the moment.

Anyway, I'll create a PR for it.

comment:4 by Andreas Heuermann, 4 years ago

comment:5 by Francesco Casella, 4 years ago

The PR was successfully merged on master, please also port it on the 1.16.1 branch!

comment:6 by Andreas Heuermann, 4 years ago

I think 1.16.1 doesn't contain any SUNDIALS changes, so this should not be ported.

comment:7 by Adrian Pop, 4 years ago

Milestone: 1.16.11.17.0

comment:8 by Adrian Pop, 4 years ago

Really no porting of this to the maintenance branch.

in reply to:  6 comment:9 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

I think 1.16.1 doesn't contain any SUNDIALS changes, so this should not be ported.

Of course, sorry, my fault.

comment:10 by Andreas Heuermann, 4 years ago

Resolution: fixed
Status: newclosed

When running Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas from MSL 3.2.1 I get the following error:

record SimulationResult
    resultFile = "",
    simulationOptions = "startTime = 0.0, stopTime = 1.01, numberOfIntervals = 500, tolerance = 1e-06, method = 'dassl', fileNamePrefix = 'Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas', options = '', outputFormat = 'mat', variableFilter = '.*', cflags = '', simflags = ''",
    messages = "Simulation execution failed for model: Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas

[CVODE ERROR]  CVode
  The right-hand side routine failed at the first call.


SUNDIALS_ERROR: CVode() failed with flag = -9

ERROR  : solver: SimManager: Simulation stopped with errors before t = 1.010000
ERROR  : solver: SimManager: Internal error

This was running before the SUNDIALS upgrade.
I will investigate this.

Last edited 4 years ago by Andreas Heuermann (previous) (diff)

comment:11 by Andreas Heuermann, 4 years ago

Resolution: fixed
Status: closedreopened

comment:12 by Andreas Heuermann, 4 years ago

Cc: Niklas Worschech added

Apparently there is a wrongly generated C++ function

/*_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTXRetType */ 
void Functions::_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTX(double T_, BaseArray<double>& X_, bool exclEnthForm_, int refChoice_, double h_off_, double _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERT_, BaseArray<double>& _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERX_, double _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERh_5Foff_,_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTXRetType & output )

in OMCppModelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGasFunctions.cpp.

It will do something like this:

[...]
int tmp51;
[...]
if (!tmp51) throw ModelicaSimulationError(MODEL_ARRAY_FUNCTION,"Internal error");
[...]

But commit ba83d81e5f didn't change the code generation for the C++ code. This wrong code is already there with the previous commit but no error is thrown. So I guess the function isn't evaluated at the older version.

But it is in fact my commit that causes the model to fail for the C++ runtime. But I don't understand how and why.

@niklwors Any idea what of the changes in ba83d81e5f could cause this?


Testing with the C runtime and CVODE works fine, but that was to be expected, since I don't see any changes to the CVODE implementation that are relevant to the error.

comment:13 by Andreas Heuermann, 4 years ago

My test script

loadModel(Modelica, {"3.2.1"}); getErrorString();

setCommandLineOptions("-d=newInst,-frontEndUnitCheck"); getErrorString();
setCommandLineOptions("--simCodeTarget=Cpp"); getErrorString();

simulate(Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas); getErrorString();

comment:14 by Andreas Heuermann, 4 years ago

Btw. when removing the wrong line

if (!tmp51) throw ModelicaSimulationError(MODEL_ARRAY_FUNCTION,"Internal error");

everything works just fine with the current master (newer SUNDIALS). So I would say it's a problem that existed already and only came to show now.

comment:15 by Andreas Heuermann, 4 years ago

So I debugged a bit more together with @niklwors and we concluded that there is a bug in the templates that is independent of this SUNDIALS upgrade. It is pure luck if the model will simulate or not.

It boils down to this:
If you run if often enough (or your memory is set to 0 in enough places) the example will fail when int tm51 is initialized with 0.

The error is somewhere in funStatement in OMCompiler/Compiler/Template/CodegenCpp.tpl. For the C runtime the template to generate code is very similar and there this problem is not existent, so I guess it was fixed there already. I'll open a new ticket for this and fix it there.

comment:16 by Andreas Heuermann, 4 years ago

Open problems:

  • Modelica_3.2.1_cpp Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMR_Inverter_MultiPhase has wrong simulation results, see feedback.y
  • From MSL trunk we have problems, e.g. Modelica.Electrical.Machines.Examples.InductionMachines.IMC_InverterDrive is CVodeGetIntegratorStats failing at the moment.

Problems of:

  • Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas


will be addressed in ticket: #6241

in reply to:  16 comment:17 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

Open problems:

  • Modelica_3.2.1_cpp Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMR_Inverter_MultiPhase has wrong simulation results, see feedback.y

These are not really wrong results. If you look at the reference values, you see that the result is obvioulsy zero with some numerical noise superimposed, with amplitude around 1e-9. The OMC solver manages to keep the noise it below 5e-8 most of the time (which is apparently good enough not to trigger a test failure), except for a short interval where it gets to about 5e-7 for some reason. It is still purely numerical noise, that transient has no physical meaning, but it is three order of magnitude larger than what Dymola (which was used to produce the reference results) computes, which triggers the failure.

The problem is related to the fact that we have active error control on state variables, but not on other variables, particularly those related to the state derivatives, that tend to be noisy and have very small offset values close to steady state.

We had similar problems with the zero currents in 3-phase machine models, see MSL issues #3476 and #3472. In that case, we solved the problem by removing those references outright. We could also do it here, but I guess we should figure out why the solver accepts solution with such large errors.

Some time ago we fixed a similar issue with our in-house solver, see #6106. Maybe this is something similar, or related.

comment:18 by Francesco Casella, 4 years ago

Milestone: 1.17.01.18.0

Rescheduled to 1.18.0

comment:19 by Francesco Casella, 3 years ago

Milestone: 1.18.0

Ticket retargeted after milestone closed

comment:20 by Francesco Casella, 3 years ago

Milestone: 1.19.0

1.18.0 blocker tickets moved to 1.19.0

Note: See TracTickets for help on using tickets.