Opened 12 years ago
Closed 10 years ago
#1967 closed defect (fixed)
Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6
Reported by: | Christian Schubert | Owned by: | Willi Braun |
---|---|---|---|
Priority: | blocker | Milestone: | 1.9.1 |
Component: | Run-time | Version: | trunk |
Keywords: | simulation takes forever | Cc: | Lennart Ochel, Willi Braun, Adrian Pop, Martin Sjölund |
Description
Hi,
this is one of the last models from MultiBody (not counting Surfaces and UserDefinedGravity) that has a serious issue: The simulation crashes.
According to the executable's debug output (-lv ...), its last words are:
99999999990000e-001 = t (f)
events | info | Result := t
events | info | Relation LESS: -0.00000000000000000000e+000 < 0.00000000000000000000e+000 = f (f)
events | info | Result := f
events | info | Relation GREATER: 9.09090909090909280000e-001 > 9.30000000000000050000e-001 = f (f)
events | info | Result := f
Attachments (1)
Change History (32)
by , 12 years ago
comment:1 by , 12 years ago
Cc: | added |
---|
follow-up: 3 comment:2 by , 12 years ago
If I run the system with +d=solveinitialsystem, I get no error and a balanced initial set of equations (2085 eqns and vars).
Nevertheless, the simulation crashes. Running the exe with -lv LOG_INIT gives
init | info | ### START INITIALIZATION ###
init | info | updating start-values
| | | | | filter.x[2](start=0)
| | | | | filter.x[1](start=0)
init | info | initialization method: symbolic [solves the initialization problem symbolically]
init | info | optimization method: nelder_mead_ex [Nelder-Mead method with global homotopy]
When the initial system is balanced, why does it still try to run nelder_mead?
comment:3 by , 12 years ago
The optimization method (e.g. nelder_mead_ex) is meaningless if the initialization method is set to >>symbolic<<.
Replying to cschubert:
If I run the system with +d=solveinitialsystem, I get no error and a balanced initial set of equations (2085 eqns and vars).
Nevertheless, the simulation crashes. Running the exe with -lv LOG_INIT gives
init | info | ### START INITIALIZATION ###
init | info | updating start-values
| | | | | filter.x[2](start=0)
| | | | | filter.x[1](start=0)
init | info | initialization method: symbolic [solves the initialization problem symbolically]
init | info | optimization method: nelder_mead_ex [Nelder-Mead method with global homotopy]

When the initial system is balanced, why does it still try to run nelder_mead?
comment:4 by , 11 years ago
Keywords: | takes forever added; crashes multibody removed |
---|---|
Priority: | high → blocker |
Currently the EngineV6 models take an almost infinite time to run.
As this is a model that many people will try out to judge whether OpenModelica is worth looking into, it deserves the highest priority for being fixed, especially since it used to work until 24 February 2013.
comment:5 by , 11 years ago
Cc: | added |
---|
comment:6 by , 11 years ago
Cc: | added |
---|
comment:7 by , 11 years ago
Cc: | added |
---|
comment:8 by , 11 years ago
Probably this is also a Linux/Windows or compiler-specific issue. On Windows I get:
DASSL-- ALL ELEMENTS OF RTOL AND ATOL ARE ZERO
stdout | warning | The code has encountered trouble from which it cannot recover.
stdout | warning | can't continue. time = 0.000000
stdout | info | model terminate | Integrator failed. | Simulation terminated at time 0
LOG_STATS | info | ### STATISTICS ###
LOG_STATS | info | timer
| | | | | 0.0101461s [ 17.3%] pre-initialization
| | | | | 0.0271451s [ 46.4%] initialization
| | | | | 0.0182034s [ 31.1%] steps
| | | | | 0.0110415s [ 18.9%] creating output-file
| | | | | 1.81117e-006s [ 0.0%] event-handling
| | | | | 1.20744e-005s [ 0.0%] overhead
| | | | | -0.00801078s [-13.7%] simulation
| | | | | 0.0585393s [100.0%] total
LOG_STATS | info | events
| | | | | 0 state events
| | | | | 0 time events
LOG_STATS | info | solver
| | | | | 0 steps taken
| | | | | 0 calls of functionODE
| | | | | 0 evaluations of jacobian
| | | | | 0 error test failures
| | | | | 0 convergence test failures
LOG_STATS | info | ### END STATISTICS ###
On Linux the test succeeds (result files are equal):
LOG_STATS | info | ### STATISTICS ###
LOG_STATS | info | timer
| | | | | 0.00783451s [ 0.0%] pre-initialization
| | | | | 0.0294539s [ 0.0%] initialization
| | | | | 0.823272s [ 0.2%] steps
| | | | | 0.237919s [ 0.1%] creating output-file
| | | | | 3.3278s [ 0.8%] event-handling
| | | | | 0.00339252s [ 0.0%] overhead
| | | | | 398.787s [ 98.9%] simulation
| | | | | 403.217s [100.0%] total
LOG_STATS | info | events
| | | | | 539 state events
| | | | | 0 time events
LOG_STATS | info | solver
| | | | | 15164 steps taken
| | | | | 22262 calls of functionODE
| | | | | 4295 evaluations of jacobian
| | | | | 379 error test failures
| | | | | 0 convergence test failures
LOG_STATS | info | ### END STATISTICS ###
Initialization seems to be okay. No warnings or errors occur during the initialization process. Also, all states and derivatives are the same on Linux and Windows.
comment:9 by , 11 years ago
Is EngineV6 working now? The new release should be able to simulate it properly. This is one of the models people try first.
comment:10 by , 11 years ago
I had a look at the model and want to post my progress.
I am comparing 1.9.0 beta 4 (1st Feb 2013) against the current trunk.
Execution times of the executable for different stopTime values on my Core i3:
stopTime | beta4 | trunk
---|---|---
0.1 | 2.71 | 7.38
0.2 | 6.02 | 22.18
0.3 | 11.17 | 44.33
0.4 | 17.35 | 72.26
0.5 | 25.73 | ?
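For reference, the slowdown factors implied by the table (trunk time divided by beta4 time) grow with the simulated time span:

$$\frac{7.38}{2.71}\approx 2.7,\qquad \frac{22.18}{6.02}\approx 3.7,\qquad \frac{44.33}{11.17}\approx 4.0,\qquad \frac{72.26}{17.35}\approx 4.2$$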
So you can see that simulation time increased massively. However, running with +lv=LOG_SOLVER shows that the same time steps are still being taken.
Thus, it seems that:
a) the same results are being calculated
b) the increase must come from evaluating the ODE function.
I'll see if I can find out more.
(PS: I did NOT enable +d=doLinearTearing)
comment:11 by , 11 years ago
I looked into it a little bit more, using the Visual Studio 2012 Performance Analyzer.
Apparently, the current trunk spends roughly 90% of its time in lapack_win32_MT.dll.
In the 1.9.0 beta4 release, almost no time is spent on the linear system (_omc_dgesv_ accounts for 6% of overall time spent). Here the main task is _getAnalyticalJacobianNewton.
Therefore, I would guess that it has to do with LAPACK. Looking at the changesets from the period in which the test broke, the following revision might be connected to this behaviour:

Revision: 15293
Author: wbraun
Date: Sunday, 24 February 2013 12:25:24
Message:
- switched to lapack library instead using internal copy
- fixed some compiling warning (clang)
----
Modified : /trunk/SimulationRuntime/c/openmodelica_func.h
Modified : /trunk/SimulationRuntime/c/simulation/solver/dassl.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/linearSolverLapack.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/linearSolverLapack.h
Modified : /trunk/SimulationRuntime/c/simulation/solver/model_help.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/solver_main.c
Willi, could you investigate that matter a little bit further?
comment:12 by , 11 years ago
I ran the built-in profiler on it :)
Modelica.Mechanics.MultiBody.Frames.resolve2: 469434 calls, 1132.219405731 sec, 98.10% execution time
comment:13 by , 11 years ago
Weird. This function has InlineAfterIndexReduction = true; it should not be part of the generated code, should it? Did someone break the profiler? :)
comment:14 by , 11 years ago
OK. I think the profiler counted linear blocks as index=-1, which is the last function in the list. Should be slow linear solving, then :(
comment:16 by , 11 years ago
In old omc: linear system 5391 (size 452): 67294x, 88.5s, 89.63%
So something changed in MSL or the back-end that causes the linear system to become bigger.
comment:17 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
We found the reason for the larger linear system.
The equation
bearing.e = Modelica.Math.Vectors.normalizeWithAssert({1.0, 0.0, 0.0})
is not being simplified, although the function has the annotation Inline=true.
normalizeWithAssert, however, consists of two statements:

assert(length(v) > 0);
result := v/length(v);

and is therefore not inlined.
Volker is looking into that subject and will try to fix it.
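For reference, once the call can be inlined (or evaluated), the right-hand side folds to a constant, which is exactly what removes these equations from the linear system:

$$\mathtt{bearing.e} = \frac{\{1,0,0\}}{\lVert \{1,0,0\} \rVert} = \frac{\{1,0,0\}}{1} = \{1,0,0\}$$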
comment:18 by , 11 years ago
"The time complexity of DGESV for an NxN matrix is O(N3)", so the time increase seems reasonable :(
comment:19 by , 11 years ago
It is strange that the function is not constant-evaluated in the FrontEnd, as its input is constant. I'll have a quick look.
comment:20 by , 11 years ago
It is not constant in the front-end: it only becomes constant after symbolic manipulation. We could either evaluate the expression in an optimisation module or do inlining.
comment:21 by , 11 years ago
OK, it is not constant; it is actually a parameter that is given as input, so the FrontEnd does the correct thing (as it is not a parameter with Evaluate=true).
comment:22 by , 11 years ago
After Volker's fix in r17525, the size of the linear system decreased to 403 (instead of 452 in the old omc). This required allowing the inlining of functions with one assignment and one assert.
It seems that this has helped a little. Nevertheless, EngineV6 still requires 7m13s compared to 1m59s on 24 February 2013. For EngineV6_analytic it is 2m7s (new) against 52s (old).
Martin, could you run the profiling on your computer again, please, so that we can see how much time is spent in the linear system now?
Any other ideas?
comment:23 by , 11 years ago
Still dominated by that big linear function:
- EngineV6: Linear function (size 404) 67064x, 184.431686947s, 93.20%
- EngineV6_analytic: Linear function (size 382) 22189x, 53.532767600s, 93.61%
comment:25 by , 10 years ago
This is still pretty much the same. Dymola's EngineV6 in MSL 3.2.1 has a linear system of size 327 and 6 torn non-linear systems (10-1). OpenModelica has a linear system of size 404 and 6 torn non-linear systems (13-1).
On my laptop, Dymola reports:
CPU-time for integration : 11 seconds
CPU-time for one GRID interval: 22 milli-seconds
Number of result points : 1167
Number of GRID points : 501
Number of (successful) steps : 14639
Number of F-evaluations : 39453
Number of H-evaluations : 17677
Number of Jacobian-evaluations: 4103
Number of (model) time events : 0
Number of (U) time events : 0
Number of state events : 333
Number of step events : 0
Minimum integration stepsize : 3.01e-09
Maximum integration stepsize : 0.00126
Maximum integration order : 5
OpenModelica LOG_STATS:
0.603972s reading init.xml
0.578528s reading info.xml
0.0289822s pre-initialization
0.0155034s [ 0.0%] initialization
1.25805s [ 0.2%] steps
0.318949s [ 0.1%] creating output-file
4.46913s [ 0.8%] event-handling
0.00940738s [ 0.0%] overhead
523.043s [ 98.8%] simulation
529.143s [100.0%] total
events
544 state events
0 time events
solver: DASSL
15169 steps taken
21628 calls of functionODE
4300 evaluations of jacobian
379 error test failures
0 convergence test failures
The difference in integration time (48x) should be more than just the size of the linear system: (404/327)^3 is only a factor of 1.89.
So I think Dymola possibly calls the linear solver fewer times than we do, or uses a sparse solver. I set the tolerance to be the same (1e-6). The total number of steps is similar, as is the number of state events.
comment:26 by , 10 years ago
Actually, Dymola reduces the linear system from 327 unknowns to just 36:

Sizes of linear systems of equations: {2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 327}
Sizes after manipulation of the linear systems: {2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 36}

However, (404/36)^3 is not the expected speed-up, as tearing comes with additional overhead, i.e. calculating the matrices A and b of the reduced linear system A*x = b, as sketched below.
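For intuition, a schematic (Schur-complement) view of tearing, not necessarily the exact transformation Dymola performs: partition the system so that the block $A_{11}$ is cheap to solve (e.g. triangular), eliminate $x_1$, and only a small dense system in the tearing variables $x_2$ remains:

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad\Rightarrow\quad \underbrace{\left(A_{22} - A_{21} A_{11}^{-1} A_{12}\right)}_{A}\, x_2 = \underbrace{b_2 - A_{21} A_{11}^{-1} b_1}_{b}$$

Here $A\,x_2 = b$ is the reduced 36-unknown system; assembling $A$ and $b$ in every step is the overhead referred to above.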
comment:27 by , 10 years ago
I still think a sparse solver could also do a good job here though. The density is something like 0.8%.
comment:28 by , 10 years ago
This is undoubtedly true.
In the end, tearing is just one of many methods for solving sparse systems of equations, and it need not be the best one in every case.
In contrast to "classical" sparse approaches, tearing exploits the structure of the system (the positions of zero entries), which is already known at compile time.
comment:29 by , 10 years ago
A "good" (Modelica-specialized) sparse linear solver would also be able to take advantage of zero entries.
Because you can register the sparsity pattern in advance (so you have the indexes in the sparse matrix and the data of sparse matrix separate and just update the sparse matrix in every step).
The key is that you know you want to calculate this matrix many times but with different values in there.
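A minimal sketch of that pattern with UMFPACK (which the next comment suggests trying); this is not OpenModelica runtime code, just the standard UMFPACK calling sequence on a small 5x5 example system. The symbolic analysis of the fixed sparsity pattern is done once; per step, only the numeric factorization and solve are repeated:

```c
#include <stdio.h>
#include <umfpack.h>

/* Fixed sparsity pattern in compressed-sparse-column form (Ap, Ai).
   Only the values Ax and the right-hand side b change per step. */
static int    Ap[] = {0, 2, 5, 9, 10, 12};
static int    Ai[] = {0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4};
static double Ax[] = {2.0, 3.0, 3.0, -1.0, 4.0, 4.0, -3.0, 1.0, 2.0, 2.0, 6.0, 1.0};
static double b[]  = {8.0, 45.0, -3.0, 3.0, 19.0};

int main(void)
{
  double x[5];
  void *Symbolic, *Numeric;

  /* Once, at initialization: analyse the fixed pattern. */
  umfpack_di_symbolic(5, 5, Ap, Ai, Ax, &Symbolic, NULL, NULL);

  /* Every integrator step: refactorize with the new values, then solve. */
  umfpack_di_numeric(Ap, Ai, Ax, Symbolic, &Numeric, NULL, NULL);
  umfpack_di_solve(UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, NULL, NULL);
  umfpack_di_free_numeric(&Numeric);

  for (int i = 0; i < 5; i++)
    printf("x[%d] = %g\n", i, x[i]);

  umfpack_di_free_symbolic(&Symbolic);
  return 0;
}
```

(Compile and link against UMFPACK, e.g. with -lumfpack.)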
comment:30 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | assigned → accepted |
Perhaps we should activate doLinearTearing by default for 1.9.1; even if some other examples might then fail, it would at least improve the performance of this example:
LOG_STATS | info | ### STATISTICS ###
LOG_STATS | info | timer
| | | | | 0.715282s reading init.xml
| | | | | 0.693358s reading info.xml
| | | | | 0.00478853s pre-initialization
| | | | | 0.00209845s [ 0.0%] initialization
| | | | | 0.350783s [ 0.3%] steps
| | | | | 0.0229478s [ 0.0%] creating output-file
| | | | | 0.738157s [ 0.6%] event-handling
| | | | | 0.00381507s [ 0.0%] overhead
| | | | | 129.405s [ 99.1%] simulation
| | | | | 130.528s [100.0%] total
LOG_STATS | info | events
| | | | | 549 state events
| | | | | 0 time events
LOG_STATS | info | solver: DASSL
| | | | | 15174 steps taken
| | | | | 21633 calls of functionODE
| | | | | 4305 evaluations of jacobian
| | | | | 379 error test failures
| | | | | 0 convergence test failures
LOG_STATS | info | ### END STATISTICS ###
Furthermore, you are right that it is also worth testing sparse solvers like UMFPACK. We could start with version 5.1, which is available under the GNU LGPL license. I will also have a student who is interested in working in this direction. So I'll try to take care of this ticket, now with more pressure.
comment:31 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
In r21588 I activated doLinearTearing by default, and now I have the following performance measurement:
wbraun@###$ ./Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6 -lv LOG_STATS
LOG_STATS | info | ### STATISTICS ###
LOG_STATS | info | timer
| | | | | 0.749886s reading init.xml
| | | | | 0.774778s reading info.xml
| | | | | 0.00468146s pre-initialization
| | | | | 0.0016162s [ 0.0%] initialization
| | | | | 0.034482s [ 0.2%] steps
| | | | | 0.0222729s [ 0.2%] creating output-file
| | | | | 0.102994s [ 0.7%] event-handling
| | | | | 0.00603958s [ 0.0%] overhead
| | | | | 14.6146s [ 98.8%] simulation
| | | | | 14.7867s [100.0%] total
LOG_STATS | info | events
| | | | | 551 state events
| | | | | 0 time events
LOG_STATS | info | solver: DASSL
| | | | | 15176 steps taken
| | | | | 21635 calls of functionODE
| | | | | 4307 evaluations of jacobian
| | | | | 379 error test failures
| | | | | 0 convergence test failures
LOG_STATS | info | ### END STATISTICS ###
So I'll close that ticket.
I ran the model with Visual Studio to debug it a bit more.
It fails with a "Stack Overflow", and Visual Studio gives the following call stack (see the attachment).
As the function _initialization is part of the call stack, I think there might be an issue there.