Opened 4 years ago
Last modified 3 years ago
#6228 reopened defect
Updated SUNDIALS KINSOL failing on Cpp runtime
Reported by: | Andreas Heuermann | Owned by: | Andreas Heuermann |
---|---|---|---|
Priority: | blocker | Milestone: | 1.19.0 |
Component: | Cpp Run-time | Version: | v1.17.0-dev |
Keywords: | sundials, kinsol | Cc: | Niklas Worschech |
Description
With the update of SUNDIALS to 5.4.0 we got some new failing test from MSL with the Cpp runtime.
Let's look at the example Modelica.Mechanics.MultiBody.Examples.Loops.Fourbar1 from MSL 3.2.1:
loadModel(Modelica,{"3.2.1"}); getErrorString(); setCommandLineOptions("-d=newInst --simCodeTarget=Cpp"); getErrorString(); simulate(Modelica.Mechanics.MultiBody.Examples.Loops.Fourbar1); getErrorString();
KINSOL will fail with KIN_LSETUP_FAIL -11 The linear solver’s setup function failed in an unrecoverable manner.
The Cpp runtime before the SUNDIALS upgrade PR passed this test.
Change History (20)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
@AnHeuermann, I already saw this kind of error like "failure to set linear solver" in other contexts. It was often related to systems that turned out to be singular, as if setting up the solver in that case could be problematic. But maybe it's something else.
Do you have any clue what that -11 error code actually means and what are the typical root causes of it? Unfortunately I don't.
Could it be that due to wrong initial guesses the initial Jacobian is singular and this prevents to set up the solver correctly?
comment:3 by , 4 years ago
Okay, the good news: I solved it.
The bad news: I'm not sure why it is working now.
Problem 1:
I played around with initialization function for the linear iterative solver SPBCGS SUNLinSol_SPBCGS
. In version v2.6.2 (was using function KINSpgmr
) you would provide the maximum number of Krylov subspaces maxl
to it, which we set to the dimension of the non-linear system _dimSys
.
In version v.5.4.0 maxl
is the number of linear iterations allowed. Is this the same as the Krylov subspaces? Probalby, I have to dig into the definition of the method to find out.
The problem was, I changed it from _dimSys
to the default value 5
. Changing this back solved the problem.
Problem 2:
When using SUNLinearSolver_SPBCGS
together with KINSOL the user has to choose right preconditioning PRE_RIGHT
, but we had it at PREC_NONE
. But the strange thing is, that PREC_NONE
works and PREC_RIGHT
doesn't. I guess I'm still missing something for PREC_RIGHT
and that it would indeed work better if used correctly. But I can't see what I'm missing and it's working at the moment.
Anyway, I'll create a PR for it.
comment:5 by , 4 years ago
The PR was successfully merged on master, please also port it on the 1.16.1 branch!
follow-up: 9 comment:6 by , 4 years ago
I think 1.16.1 doesn't contain any SUNDIALS changes, so this should not be ported.
comment:7 by , 4 years ago
Milestone: | 1.16.1 → 1.17.0 |
---|
comment:9 by , 4 years ago
Replying to AnHeuermann:
I think 1.16.1 doesn't contain any SUNDIALS changes, so this should not be ported.
Of course, sorry, my fault.
comment:10 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
When running Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas
from MSL 3.2.1
I get the following error:
record SimulationResult resultFile = "", simulationOptions = "startTime = 0.0, stopTime = 1.01, numberOfIntervals = 500, tolerance = 1e-06, method = 'dassl', fileNamePrefix = 'Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas', options = '', outputFormat = 'mat', variableFilter = '.*', cflags = '', simflags = ''", messages = "Simulation execution failed for model: Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas [CVODE ERROR] CVode The right-hand side routine failed at the first call. SUNDIALS_ERROR: CVode() failed with flag = -9 ERROR : solver: SimManager: Simulation stopped with errors before t = 1.010000 ERROR : solver: SimManager: Internal error
This was running before the SUNDIALS upgrade.
I will investigate this.
comment:11 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:12 by , 4 years ago
Cc: | added |
---|
Apparently there is a wrongly generated C++ function
/*_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTXRetType */ void Functions::_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTX(double T_, BaseArray<double>& X_, bool exclEnthForm_, int refChoice_, double h_off_, double _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERT_, BaseArray<double>& _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERX_, double _omcQ_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5FTX_24funDERh_5Foff_,_omcQ_24DER_24Modelica_24PMedia_24PExamples_24PTests_24PMediaTestModels_24PIdealGases_24PSimpleNaturalGas_24Pvolume_24PMedium_24Ph_5F_5FTXRetType & output )
in OMCppModelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGasFunctions.cpp
.
It will do something like this:
[...] int tmp51; [...] if (!tmp51) throw ModelicaSimulationError(MODEL_ARRAY_FUNCTION,"Internal error"); [...]
But commit ba83d81e5f didn't change the code generation for the C++ code. This wrong code is already there with the previous commit but no error is thrown. So I guess the function isn't evaluated at the older version.
But it is in fact my commit that causes the model to fail for the C++ runtime. But I don't understand how and why.
@niklwors Any idea what of the changes in ba83d81e5f could cause this?
Testing with the C runtime and CVODE works fine, but that was to be expected, since I don't see any changes to the CVODE implementation that are relevant to the error.
comment:13 by , 4 years ago
My test script
loadModel(Modelica, {"3.2.1"}); getErrorString(); setCommandLineOptions("-d=newInst,-frontEndUnitCheck"); getErrorString(); setCommandLineOptions("--simCodeTarget=Cpp"); getErrorString(); simulate(Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas); getErrorString();
comment:14 by , 4 years ago
Btw. when removing the wrong line
if (!tmp51) throw ModelicaSimulationError(MODEL_ARRAY_FUNCTION,"Internal error");
everything works just fine with the current master (newer SUNDIALS). So I would say it's a problem that existed already and only came to show now.
comment:15 by , 4 years ago
So I debugged a bit more together with @niklwors and we concluded that there is a bug in the templates that is independent of this SUNDIALS upgrade. It is pure luck if the model will simulate or not.
It boils down to this:
If you run if often enough (or your memory is set to 0 in enough places) the example will fail when int tm51
is initialized with 0
.
The error is somewhere in funStatement
in OMCompiler/Compiler/Template/CodegenCpp.tpl
. For the C runtime the template to generate code is very similar and there this problem is not existent, so I guess it was fixed there already. I'll open a new ticket for this and fix it there.
follow-up: 17 comment:16 by , 4 years ago
Open problems:
- Modelica_3.2.1_cpp Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMR_Inverter_MultiPhase has wrong simulation results, see feedback.y
- From MSL trunk we have problems, e.g. Modelica.Electrical.Machines.Examples.InductionMachines.IMC_InverterDrive is CVodeGetIntegratorStats failing at the moment.
Problems of:
- Modelica.Media.Examples.Tests.MediaTestModels.IdealGases.SimpleNaturalGas
will be addressed in ticket: #6241
comment:17 by , 4 years ago
Replying to AnHeuermann:
Open problems:
- Modelica_3.2.1_cpp Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMR_Inverter_MultiPhase has wrong simulation results, see feedback.y
These are not really wrong results. If you look at the reference values, you see that the result is obvioulsy zero with some numerical noise superimposed, with amplitude around 1e-9. The OMC solver manages to keep the noise it below 5e-8 most of the time (which is apparently good enough not to trigger a test failure), except for a short interval where it gets to about 5e-7 for some reason. It is still purely numerical noise, that transient has no physical meaning, but it is three order of magnitude larger than what Dymola (which was used to produce the reference results) computes, which triggers the failure.
The problem is related to the fact that we have active error control on state variables, but not on other variables, particularly those related to the state derivatives, that tend to be noisy and have very small offset values close to steady state.
We had similar problems with the zero currents in 3-phase machine models, see MSL issues #3476 and #3472. In that case, we solved the problem by removing those references outright. We could also do it here, but I guess we should figure out why the solver accepts solution with such large errors.
Some time ago we fixed a similar issue with our in-house solver, see #6106. Maybe this is something similar, or related.
Solving the same model with the C runtime and KINSOL reveals that the C runtime will fail with the same error.
Testing with the C runtime on a commit before the PR was merged I get the same error.
So I would rule out a wrong implementation and we just note that KINSOL can't solve the non-linear system with the settings we have chosen.
What exactly will the Cpp runtime do?
At the moment the Cpp runtimes failes to solve the loop, but it was able to solve it before, so let's compare.
It will try with a bunch of different strategies for the non-linear solver and with different linear solver modules:
We are trying the non-linear methods
KIN_NONE
(basic Newton iteration) andKIN_LINESEARCH
(Newton with globalization).And then we have several linear solver modules tried in this order (and yes, always starting at 1).
For the old version the loop gets solved with:
And for every other step the dense solver is enough to solve the problem.
What are the differences to the older SUNDIALS version?
We could be missing preconditioning for the iterative linear solvers. In previous versions of SUNDIALS the call to different linear modules was different.
But I'm sure that SPGMR is not working as it should and that's the reason for this new error.