Opened 4 years ago
Last modified 3 years ago
#6409 assigned defect
Simulation is not stopped when an error happens
Reported by: | Owned by: | Andreas Heuermann | |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Run-time | Version: | 1.18.0-dev |
Keywords: | Cc: |
Description
My "Network" simulation, in the output tab, gives these errors:
Error solving nonlinear system 6721 at time 40.5601 nonlinear system 6721 fails: at t=40.5601 Fatal error on line 1118, column 11: tag mismatch
but the process Network.exe is not stopped after this, it continues running. This is not so clear in the interface (see screenshot) because the progress bar remains stuck at 0%.
OMEdit allows to start a new try of the simulation (so it does not recognize that the process is still running), but the compilation an raises an error because it can't overwrite Network.exe.
C:\PROGRA~1\OPENMO~2.0-D\tools\msys\mingw64\bin\ld: cannot open output file Network.exe: Permission denied clang: error: unable to remove file: Permission denied clang: error: linker command failed with exit code 1 (use -v to see invocation) mingw32-make: *** [Network.makefile:39: omc_main_target] Error 1 mingw32-make: Leaving directory 'C:/Users/simon/AppData/Local/Temp/OPENMO~1/OMEdit/Net~1.NET' Compilation process failed. Exited with code 2.
OpenModelica v1.18.0-dev-132-gc048acb712 (64-bit)
Windows 10 64-bit
Attachments (1)
Change History (26)
by , 4 years ago
Attachment: | Immagine.png added |
---|
comment:1 by , 4 years ago
Milestone: | NeedsInput → 1.18.0 |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 4 years ago
It's stuck at 0% because the final time is around 7000 seconds.
Setting it to 50, the simulation is stuck at 68%.
comment:3 by , 4 years ago
Owner: | changed from | to
---|
@AnHeuermann, this tag mismatch error looks familiar, you fixed something similar a recently. Any idea what this could be?
@simone, try selecting "kinsol" as nonlinear solver in the simulation options.
follow-up: 5 comment:4 by , 4 years ago
I tried the same model with Kinsol and, when the simulation fails, Network.exe is stopped correctly.
Just to clarify: the model that I run has some problems. I opened the ticket because, if I forget to manually kill the simulation, it continues running indefinitely, consuming an increasing amount of RAM.
I tried the today's build too (OpenModelica v1.18.0-dev-133-gabdf40bd71) and the behaviour is the same.
comment:5 by , 4 years ago
Replying to Simone Bosotti <simone1.bosotti@…>:
I tried the same model with Kinsol and, when the simulation fails, Network.exe is stopped correctly.
OK, so we have a problem with the default solver. In the meantime, please use kinsol.
follow-up: 9 comment:6 by , 4 years ago
@Simone Can you provide us with the test example?
If not can you try to run your model from command line and see if the simulation terminates after the error is thrown.
I'm not sure if it is a problem of the C runtime or with OMEdit and where the fatal error is thrown.
You can create a mos file test.mos
that is similar to
loadModel(Modelica, {"3.2.3"}); getErrorString(); loadFile("pathToYourModelFile.mo"); getErrorString(); simulate(Network, simflags="-lv=LOG_STATS"); getErrorString();
and run it from CMD
>"C:\Program Files\OpenModelica1.18.0-dev-64bit\bin\omc.exe" test.mos
comment:7 by , 4 years ago
@Simone. And can you add the log flag LOG_NLS_V
to a simulation run and add the output to this ticket?
You can add it in the Simulation Setup->Simulation Flags->Logging and choose LOG_NLS_V.
comment:8 by , 4 years ago
It seems the error message
Error solving nonlinear system 6721 at time 40.5601
is from function void printErrorEqSyst(equationSystemError err, EQUATION_INFO eq, double time)
.
This function is used in the Newton and the Hybrid solver, but not in Kinsol. When the simulation is in the specific case for that error message the loop re-trying with newton should give up. I can say more when I have a working minimal example or see the logs.
comment:9 by , 4 years ago
Replying to AnHeuermann:
If not can you try to run your model from command line and see if the simulation terminates after the error is thrown.
In command line I see only
true "" true true ""
And the exe is not stopped. After manually killing the exe from task manager, the only warnings I get are:
"Warning: The model contains alias variables with redundant start and/or conflicting nominal values. It is recommended to resolve the conflicts, because otherwise the system could be hard to solve. To print the conflicting alias sets and the chosen candidates please use -d=aliasConflicts. Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac150. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac151. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac152. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac153. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac154. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: There are nonlinear iteration variables with default zero start attribute found in NLSJac190. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). Warning: The initial conditions are not fully specified. For more information set -d=initialization. In OMEdit Tools->Options->Simulation->Show additional information from the initialization process, in OMNotebook call setCommandLineOptions(\"-d=initialization\"). "
It generated also a 250 MB log file in about 2 minutes of executions. It contains the same lines repeated thousands of times:
assert | warning | The following assertion has been violated at time 35.860495 assert | debug | Model error: Argument of sqrt(station.TC4.compressor.b ^ 2.0 - 4.0 * station.TC4.compressor.a * station.TC4.compressor.c) was -0.0222076 should be >= 0
-0.0222076 is not always the same, but it assumes only a few different values.
And can you add the log flag LOG_NLS_V to a simulation run and add the output to this ticket?
The log is huge (120+ MB) and I don't know if I can share it.
follow-up: 11 comment:10 by , 4 years ago
Yes, the log files are quite huge for decent models with LOG_NLS_V enabled.
It seems that the simulation is in some endless-loop. An assert is triggered, that is caught somewhere and after a few number of re-tries the simulation should stop. Apparently that is not the case.
I have some guesses where this happens but am not sure, so I would need some example to test that.
Can you search for
Error solving nonlinear system
and post what ever comes directly after that message? I'm that something like
### No Solution! ### after XXX restarts"
is in the logs. I would need the next few lines to see in which case the simulation is.
comment:11 by , 4 years ago
Can you search for
Error solving nonlinear systemand post what ever comes directly after that message? I'm that something like
### No Solution! ### after XXX restarts"is in the logs. I would need the next few lines to see in which case the simulation is.
Sure. Here it is:
Error solving nonlinear system 17482 at time 35.8605 ### No Solution! ### after 144 restarts nls status variables [1] station.TC3.compressor.w = 0.00000000000000000000e+00 - scaling factor internal = 1.0002976983495444e+00 - scaling factor external = 1.0000000000000000e+01 [2] station.TC5.compressor.w = 0.00000000000000000000e+00 - scaling factor internal = 1.0002976983484608e+00 - scaling factor external = 1.0000000000000000e+01 [3] station.w = -8.35545943060227273236e+02 - scaling factor internal = 3.2817675810719258e+01 - scaling factor external = 8.3554594306022727e+02 [4] station.TC3.compressor.QStar = 0.00000000000000000000e+00 - scaling factor internal = 2.7777777778024495e+00 - scaling factor external = 1.0000000000000000e+00 [5] station.TC5.compressor.QStar = 0.00000000000000000000e+00 - scaling factor internal = 2.7777777778024495e+00 - scaling factor external = 1.0000000000000000e+00 [6] station.TC5.compressor.NStar = 1.00000000000000022204e+00 - scaling factor internal = 6.5000000011373131e+03 - scaling factor external = 1.0000000000000002e+00 [7] station.TC4.compressor.QStar = -7.33961231823885107417e+00 - scaling factor internal = 1.3474331859763883e+04 - scaling factor external = 7.3396123182388511e+00 [8] station.TC4.compressor.NStar = -1.43294356283598456336e+00 - scaling factor internal = 7.1284539836064665e+04 - scaling factor external = 1.4329435628359846e+00 [9] station.TC3.compressor.NStar = 1.00000000000000022204e+00 - scaling factor internal = 6.5000000000399677e+03 - scaling factor external = 1.0000000000000002e+00 [10] station.TC3.NStar = 1.00000000000000000000e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.0000000000000000e+00 [11] station.generalDistance = 3.05191524646234704221e+01 - scaling factor internal = 9.9999999988897770e-01 - scaling factor external = 3.0519152464623470e+01 [12] station.TC4.NStar = -1.43294356283598411927e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.4329435628359841e+00 [13] station.TC5.NStar = 1.00000000000000000000e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.0000000000000000e+00 [14] station.s = 6.37957568387626849926e-01 - scaling factor internal = 6.4999999996034103e+06 - scaling factor external = 1.0000000000000000e+00 [15] station.r = -4.97265043890755364941e+00 - scaling factor internal = 2.4999999993005596e+02 - scaling factor external = 4.9726504389075536e+00 functions res[1] = 2.92978726433696677235e-17 [scaling factor = 0.0000000000000000e+00] res[2] = 4.77484718430787324905e-12 [scaling factor = 0.0000000000000000e+00] res[3] = -2.51457095146179199219e-08 [scaling factor = 0.0000000000000000e+00] res[4] = 1.56490872287542467836e-15 [scaling factor = 0.0000000000000000e+00] res[5] = 1.38780075989300090413e-17 [scaling factor = 0.0000000000000000e+00] res[6] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[7] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[8] = -7.10542735760100185871e-14 [scaling factor = 0.0000000000000000e+00] res[9] = 2.66970347102387640916e+04 [scaling factor = 0.0000000000000000e+00] res[10] = -3.55271367880050092936e-15 [scaling factor = 0.0000000000000000e+00] res[11] = 3.27231844468870320952e+01 [scaling factor = 0.0000000000000000e+00] res[12] = -7.26431608200073242188e-08 [scaling factor = 0.0000000000000000e+00] res[13] = 1.51847372163710067374e-15 [scaling factor = 0.0000000000000000e+00] res[14] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[15] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] statistics nfunc = 51 error = 1.00000000000000000000e+00 error_scaled = 1.00000000000000000000e+00 Solution status: FAILED number of iterations : 1086 number of function evaluations : 17524 number of jacobian evaluations : 1065 solution values: [ 1] station.TC3.compressor.w = 0 [ 2] station.TC5.compressor.w = 0 [ 3] station.w = -835.54594 [ 4] station.TC3.compressor.QStar = 0 [ 5] station.TC5.compressor.QStar = 0 [ 6] station.TC5.compressor.NStar = 1 [ 7] station.TC4.compressor.QStar = -7.3396123 [ 8] station.TC4.compressor.NStar = -1.4329436 [ 9] station.TC3.compressor.NStar = 1 [10] station.TC3.NStar = 1 [11] station.generalDistance = 30.519152 [12] station.TC4.NStar = -1.4329436 [13] station.TC5.NStar = 1 [14] station.s = 0.63795757 [15] station.r = -4.9726504 nonlinear system 17482 fails: at t=35.8605 [1] Real station.TC3.compressor.w(start=0, nominal=10) [2] Real station.TC5.compressor.w(start=0, nominal=10) [3] Real station.w(start=0, nominal=10) [4] Real station.TC3.compressor.QStar(start=0, nominal=1) [5] Real station.TC5.compressor.QStar(start=0, nominal=1) [6] Real station.TC5.compressor.NStar(start=1, nominal=1) [7] Real station.TC4.compressor.QStar(start=0, nominal=1) [8] Real station.TC4.compressor.NStar(start=1, nominal=1) [9] Real station.TC3.compressor.NStar(start=1, nominal=1) [10] Real station.TC3.NStar(start=1, nominal=1) [11] Real station.generalDistance(start=0, nominal=1) [12] Real station.TC4.NStar(start=1, nominal=1) [13] Real station.TC5.NStar(start=1, nominal=1) [14] Real station.s(start=0, nominal=1) [15] Real station.r(start=0, nominal=1)
comment:12 by , 4 years ago
Wow, we have a while loop spanning over 600 lines of code for solveHybrd()
. That's not very nice.
We need to re-do the nonlinear hybrid solver. It has like 6 different counters for number of retries and try and catch blocks all over the place.
I added a break after the error message. But I'm not 100% sure that this enough.
We are using the function solveHybrd
in the nonlinear homotopy solver as well. And there it is not clear to me what happens with the return value.
The homotopy solver has a similar issue: Giant unclear while loops.
See PR https://github.com/OpenModelica/OpenModelica/pull/7288
follow-up: 14 comment:13 by , 4 years ago
PR 7288 is now merged. The next nightly build (tomorrow) will have this change.
@Simone Can you try with one of the next nightly builds and check if that solves your issue?
comment:14 by , 4 years ago
Replying to AnHeuermann:
PR 7288 is now merged. The next nightly build (tomorrow) will have this change.
@Simone Can you try with one of the next nightly builds and check if that solves your issue?
I'm sorry, but nothing changed. The execution of the exe continues indefinitely.
This is the new output of the LOG_NLS_V.
Error solving nonlinear system 17518 at time 35.8605 ### No Solution! ### after 144 restarts nls status variables [1] station.TC3.compressor.w = 0.00000000000000000000e+00 - scaling factor internal = 1.0002976983484608e+00 - scaling factor external = 1.0000000000000000e+01 [2] station.TC5.compressor.w = 0.00000000000000000000e+00 - scaling factor internal = 1.0002976983484608e+00 - scaling factor external = 1.0000000000000000e+01 [3] station.w = -8.35545940552257093259e+02 - scaling factor internal = 3.2817675810719258e+01 - scaling factor external = 8.3554594055225709e+02 [4] station.TC3.compressor.QStar = 0.00000000000000000000e+00 - scaling factor internal = 2.7777777778024495e+00 - scaling factor external = 1.0000000000000000e+00 [5] station.TC5.compressor.QStar = 0.00000000000000000000e+00 - scaling factor internal = 2.7777777778024495e+00 - scaling factor external = 1.0000000000000000e+00 [6] station.TC5.compressor.NStar = 1.00000000000000022204e+00 - scaling factor internal = 6.5000000011373131e+03 - scaling factor external = 1.0000000000000002e+00 [7] station.TC4.compressor.QStar = -7.33961229611659415895e+00 - scaling factor internal = 1.3474331845211967e+04 - scaling factor external = 7.3396122961165942e+00 [8] station.TC4.compressor.NStar = -1.43294355807376572187e+00 - scaling factor internal = 7.1284539836064665e+04 - scaling factor external = 1.4329435580737657e+00 [9] station.TC3.compressor.NStar = 1.00000000000000022204e+00 - scaling factor internal = 6.5000000000399677e+03 - scaling factor external = 1.0000000000000002e+00 [10] station.TC3.NStar = 1.00000000000000000000e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.0000000000000000e+00 [11] station.generalDistance = 3.05191522650963378283e+01 - scaling factor internal = 9.9999999988897770e-01 - scaling factor external = 3.0519152265096338e+01 [12] station.TC4.NStar = -1.43294355807377704615e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.4329435580737770e+00 [13] station.TC5.NStar = 1.00000000000000000000e+00 - scaling factor internal = 1.0000000000000000e+00 - scaling factor external = 1.0000000000000000e+00 [14] station.s = 6.37957568377593875475e-01 - scaling factor internal = 6.4999999996034103e+06 - scaling factor external = 1.0000000000000000e+00 [15] station.r = -4.97265042887570807295e+00 - scaling factor internal = 2.4999999993005596e+02 - scaling factor external = 4.9726504288757081e+00 functions res[1] = 4.93434652508476323902e-17 [scaling factor = 0.0000000000000000e+00] res[2] = 4.88853402202948927879e-12 [scaling factor = 0.0000000000000000e+00] res[3] = -2.51457095146179199219e-08 [scaling factor = 0.0000000000000000e+00] res[4] = 4.62596133086742261756e-17 [scaling factor = 0.0000000000000000e+00] res[5] = -4.63451217237824686130e-16 [scaling factor = 0.0000000000000000e+00] res[6] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[7] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[8] = 2.66970344779274091707e+04 [scaling factor = 0.0000000000000000e+00] res[9] = 1.42108547152020037174e-14 [scaling factor = 0.0000000000000000e+00] res[10] = -8.88178419700125232339e-16 [scaling factor = 0.0000000000000000e+00] res[11] = 3.27231841520902904108e+01 [scaling factor = 0.0000000000000000e+00] res[12] = -7.26431608200073242188e-08 [scaling factor = 0.0000000000000000e+00] res[13] = -3.04479229760059189515e-16 [scaling factor = 0.0000000000000000e+00] res[14] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] res[15] = 0.00000000000000000000e+00 [scaling factor = 0.0000000000000000e+00] statistics nfunc = 51 error = 1.00000000000000000000e+00 error_scaled = 1.00000000000000000000e+00 Solution status: FAILED number of iterations : 1086 number of function evaluations : 17524 number of jacobian evaluations : 1065 solution values: [ 1] station.TC3.compressor.w = 0 [ 2] station.TC5.compressor.w = 0 [ 3] station.w = -835.54594 [ 4] station.TC3.compressor.QStar = 0 [ 5] station.TC5.compressor.QStar = 0 [ 6] station.TC5.compressor.NStar = 1 [ 7] station.TC4.compressor.QStar = -7.3396123 [ 8] station.TC4.compressor.NStar = -1.4329436 [ 9] station.TC3.compressor.NStar = 1 [10] station.TC3.NStar = 1 [11] station.generalDistance = 30.519152 [12] station.TC4.NStar = -1.4329436 [13] station.TC5.NStar = 1 [14] station.s = 0.63795757 [15] station.r = -4.9726504 nonlinear system 17518 fails: at t=35.8605 [1] Real station.TC3.compressor.w(start=0, nominal=10) [2] Real station.TC5.compressor.w(start=0, nominal=10) [3] Real station.w(start=0, nominal=10) [4] Real station.TC3.compressor.QStar(start=0, nominal=1) [5] Real station.TC5.compressor.QStar(start=0, nominal=1) [6] Real station.TC5.compressor.NStar(start=1, nominal=1) [7] Real station.TC4.compressor.QStar(start=0, nominal=1) [8] Real station.TC4.compressor.NStar(start=1, nominal=1) [9] Real station.TC3.compressor.NStar(start=1, nominal=1) [10] Real station.TC3.NStar(start=1, nominal=1) [11] Real station.generalDistance(start=0, nominal=1) [12] Real station.TC4.NStar(start=1, nominal=1) [13] Real station.TC5.NStar(start=1, nominal=1) [14] Real station.s(start=0, nominal=1) [15] Real station.r(start=0, nominal=1)
Version: OpenModelica v1.18.0-dev-146-gf50e95b736 (64-bit)
comment:16 by , 4 years ago
Component: | *unknown* → Run-time |
---|
follow-up: 18 comment:17 by , 4 years ago
I'm afraid I need to have an example to fix the error. I don't see why it won't stop and need to jump through it with a debugger.
@Simone Is it possible for you to share the example with us?
If it can't be made public you could send it to us via mail (zip file with password or gpg encrypted with our public OpenModelica key).
comment:18 by , 4 years ago
Replying to AnHeuermann:
@Simone Is it possible for you to share the example with us?
If it can't be made public you could send it to us via mail (zip file with password or gpg encrypted with our public OpenModelica key).
Ok, I've asked and I can send the file via email.
Since I don't need encryption, it's enough to send it as attachment to openmodelica@ ida.liu.se ?
follow-up: 21 comment:20 by , 4 years ago
@Simone I wrote you an E-Mail last week. I'll need an additional library to run your example to solve this ticket.
comment:21 by , 4 years ago
Replying to AnHeuermann:
@Simone I wrote you an E-Mail last week. I'll need an additional library to run your example to solve this ticket.
I replied to that one week ago. I retried now, let me know if it's ok.
comment:22 by , 4 years ago
So now it's the brute force approach. Just throw an error to stop the simulation.
See https://github.com/OpenModelica/OpenModelica/pull/7321
comment:23 by , 4 years ago
Problem is in DASSL and not the hybrid solver. I'll need to come up with a better solution.
Replying to Simone Bosotti <simone1.bosotti@…>:
This is strange. If the simulation has already reached time = 40.560, why is the process bar still at 0%? What is the final time? Can you set it to, say 50, and see what happens?
I'll assign this to somebody when the root cause becomes clearer.
This is a known issue, see #5974. We should eventually take care of that.