Opened 12 years ago

Closed 10 years ago

#1967 closed defect (fixed)

Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6

Reported by: cschubert Owned by: wbraun
Priority: blocker Milestone: 1.9.1
Component: Run-time Version: trunk
Keywords: simulation takes forever Cc: lochel, wbraun, adrpo, sjoelund.se

Description

Hi,

this is one of the last models from MultiBody (not counting Surfaces and UserDefinedGravity) that has a serious issue: the simulation crashes.
According to the executable's debug output (-lv ...), its last words are:

99999999990000e-001 = t (f)
events        | info    | Result := t
events        | info    | Relation LESS: -0.00000000000000000000e+000 < 0.00000000000000000000e+000 = f (f)
events        | info    | Result := f
events        | info    | Relation GREATER:  9.09090909090909280000e-001 > 9.30000000000000050000e-001 = f (f)
events        | info    | Result := f

Attachments (1)

log.txt (3.7 KB) - added by cschubert 12 years ago.


Change History (32)

Changed 12 years ago by cschubert

comment:1 Changed 12 years ago by lochel

  • Cc lochel added

I ran the model with Visual Studio to debug it a bit more.
It fails with a "Stack Overflow", and Visual Studio gives the following call stack:

 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_chkstk()  Line 99	Asm
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!functionDAE(DATA * data)  Line 34574 + 0x9 bytes	C++
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_updateDiscreteSystem()  + 0x30 bytes	C
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_reportResidualValue()  + 0x731 bytes	C
>	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_initialization()  + 0x3df bytes	C
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_solver_main()  + 0x1db bytes	C
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!_rt_accumulate()  + 0x10 bytes	C
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!callSolver()  + 0xc3e bytes	C++
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!startNonInteractiveSimulation()  + 0x68d bytes	C++
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!__main_SimulationRuntime()  + 0x4d bytes	C++
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!main(int argc, char * * argv)  Line 37204 + 0x14 bytes	C++
 	Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6.exe!__tmainCRTStartup()  Line 555 + 0x17 bytes	C
 	kernel32.dll!74c133aa() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!77179ef2() 	
 	ntdll.dll!77179ec5() 	

As the function _initialization is part of the call stack, I think the issue might be in the initialization.

comment:2 follow-up: Changed 12 years ago by cschubert

If I run the system with +d=solveinitialsystem, I get no error and a balanced initial set of equations (2085 eqns and vars).
Nevertheless, the simulation crashes. Running the exe with -lv LOG_INIT gives

init          | info    | ### START INITIALIZATION ###
init          | info    | updating start-values
|             | |       | | filter.x[2](start=0)
|             | |       | | filter.x[1](start=0)
init          | info    | initialization method: symbolic        [solves the initialization problem symbolically]
init          | info    | optimization method:   nelder_mead_ex  [Nelder-Mead method with global homotopy]

When the initial system is balanced, why does it still try to run nelder_mead?

comment:3 in reply to: ↑ 2 Changed 12 years ago by lochel

The optimization method (e.g. nelder_mead_ex) is meaningless if the initialization method is set to >>symbolic<<.

Replying to cschubert:

When the initial system is balanced, why does it still try to run nelder_mead?

comment:4 Changed 11 years ago by cschubert

  • Keywords takes forever added; crashes multibody removed
  • Priority changed from high to blocker

Currently the EngineV6 models take an almost infinite time to run.

As this is a model that many people will try out to judge whether OpenModelica is worth looking into, it deserves the highest priority for being fixed.

Especially since it used to work until 24 February 2013.

comment:5 Changed 11 years ago by wbraun

  • Cc wbraun added

comment:6 Changed 11 years ago by adrpo

  • Cc adrpo added

comment:7 Changed 11 years ago by sjoelund.se

  • Cc sjoelund.se added

comment:8 Changed 11 years ago by lochel

This is probably also a Linux/Windows or compiler-specific issue. On Windows I get:

 DASSL--  ALL ELEMENTS OF RTOL AND ATOL ARE ZERO
stdout            | warning | The code has encountered trouble from which it cannot recover.
stdout            | warning | can't continue. time = 0.000000
stdout            | info    | model terminate | Integrator failed. | Simulation terminated at time 0
LOG_STATS         | info    | ### STATISTICS ###
LOG_STATS         | info    | timer
|                 | |       | |    0.0101461s [ 17.3%] pre-initialization
|                 | |       | |    0.0271451s [ 46.4%] initialization
|                 | |       | |    0.0182034s [ 31.1%] steps
|                 | |       | |    0.0110415s [ 18.9%] creating output-file
|                 | |       | | 1.81117e-006s [  0.0%] event-handling
|                 | |       | | 1.20744e-005s [  0.0%] overhead
|                 | |       | |  -0.00801078s [-13.7%] simulation
|                 | |       | |    0.0585393s [100.0%] total
LOG_STATS         | info    | events
|                 | |       | |     0 state events
|                 | |       | |     0 time events
LOG_STATS         | info    | solver
|                 | |       | |     0 steps taken
|                 | |       | |     0 calls of functionODE
|                 | |       | |     0 evaluations of jacobian
|                 | |       | |     0 error test failures
|                 | |       | |     0 convergence test failures
LOG_STATS         | info    | ### END STATISTICS ###

On Linux the test succeeds (result files are equal):

LOG_STATS         | info    | ### STATISTICS ###
LOG_STATS         | info    | timer
|                 | |       | |   0.00783451s [  0.0%] pre-initialization
|                 | |       | |    0.0294539s [  0.0%] initialization
|                 | |       | |     0.823272s [  0.2%] steps
|                 | |       | |     0.237919s [  0.1%] creating output-file
|                 | |       | |       3.3278s [  0.8%] event-handling
|                 | |       | |   0.00339252s [  0.0%] overhead
|                 | |       | |      398.787s [ 98.9%] simulation
|                 | |       | |      403.217s [100.0%] total
LOG_STATS         | info    | events
|                 | |       | |   539 state events
|                 | |       | |     0 time events
LOG_STATS         | info    | solver
|                 | |       | | 15164 steps taken
|                 | |       | | 22262 calls of functionODE
|                 | |       | |  4295 evaluations of jacobian
|                 | |       | |   379 error test failures
|                 | |       | |     0 convergence test failures
LOG_STATS         | info    | ### END STATISTICS ###

Initialization seems to be okay. No warnings or errors occur during the initialization process. Also, all states and derivatives are the same on Linux and Windows.

Last edited 11 years ago by lochel

comment:9 Changed 11 years ago by cschubert

Is EngineV6 working now? The new release should be able to simulate it properly. This is one of the models people try first.

comment:10 Changed 11 years ago by cschubert

I had a look at the model and want to post my progress.
I am comparing 1.9.0 beta 4 (1 Feb 2013) against the current trunk.
Execution times of the executable for different stopTime values, on my Core i3:

stopTime;beta4;trunk
0.1;2.71;7.38
0.2;6.02;22.18
0.3;11.17;44.33
0.4;17.35;72.26
0.5;25.73;?

So you can see that the simulation time has increased massively. However, running with +lv=LOG_SOLVER shows that the same timesteps are still taken.

Thus, it seems that:
a) the same results are being calculated
b) the increase must come from evaluating the ODE function.

I'll see if I can find out more.

(PS: I did NOT enable +d=doLinearTearing)

comment:11 Changed 11 years ago by cschubert

I looked into it a little bit more using the Visual Studio 2012 Performance Analyzer.

Apparently, the current trunk spends roughly 90% of its time in lapack_win32_MT.dll.
In the 1.9.0 beta4 release, almost no time is spent on the linear system (_omc_dgesv_ accounts for 6% of overall time spent). Here the main task is _getAnalyticalJacobianNewton.

Therefore, I would guess that it has to do with Lapack. Looking at the changesets during the timespan where the test broke, it might be that

Revision: 15293
Author: wbraun
Date: Sonntag, 24. Februar 2013 12:25:24
Message:
 - switched to lapack library instead using internal copy
 - fixed some compiling warning (clang)

----
Modified : /trunk/SimulationRuntime/c/openmodelica_func.h
Modified : /trunk/SimulationRuntime/c/simulation/solver/dassl.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/linearSolverLapack.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/linearSolverLapack.h
Modified : /trunk/SimulationRuntime/c/simulation/solver/model_help.c
Modified : /trunk/SimulationRuntime/c/simulation/solver/solver_main.c

might be connected to that behaviour.

Willi, could you investigate that matter a little bit further?

comment:12 Changed 11 years ago by sjoelund.se

I ran the built-in profiler on it :)

Modelica.Mechanics.MultiBody.Frames.resolve2: 469434 calls, 1132.219405731 sec, 98.10% execution time.

Execution time is measured in full seconds per global step.

Last edited 11 years ago by sjoelund.se

comment:13 Changed 11 years ago by sjoelund.se

Weird. This function has InlineAfterIndexReduction = true; it should not be part of the generated code, should it? Did someone break the profiler? :)

Last edited 11 years ago by sjoelund.se

comment:14 Changed 11 years ago by sjoelund.se

OK. I think the profiler counted linear blocks as index=-1, which is the last function in the list. Should be slow linear solving, then :(

comment:15 Changed 11 years ago by sjoelund.se

Linear function (index 5115, size 775): 67061x, 1132.7s, 97.99%

comment:16 Changed 11 years ago by sjoelund.se

In old omc: linear system 5391 (size 452): 67294x, 88.5s, 89.63%

So something changed in MSL or the back-end that causes the linear system to be bigger.

comment:17 Changed 11 years ago by cschubert

  • Owner changed from wbraun to vwaurich
  • Status changed from new to assigned

We found the reason for the larger linear system.
The equation

bearing.e = Modelica.Math.Vectors.normalizeWithAssert({1.0, 0.0, 0.0})

is not being simplified, although the function has the annotation Inline=true.

normalizeWithAssert consists of two lines, however:

assert(length(v) > 0);
result := v/length(v);

and is therefore not inlined.

Volker is looking into that subject and will try to fix it.

comment:18 Changed 11 years ago by sjoelund.se

"The time complexity of DGESV for an NxN matrix is O(N^3)", so the time increase seems reasonable :(

comment:19 Changed 11 years ago by adrpo

It is strange that the function is not constant-evaluated in the front-end, as its input is constant. I'll have a quick look.

comment:20 Changed 11 years ago by sjoelund.se

It is not constant in the front-end: it is constant only after symbolic manipulation. We could either evaluate the expression in an optimisation module or do inlining.

comment:21 Changed 11 years ago by adrpo

OK, it is not constant; it is actually a parameter that is given as an input, so the front-end does the correct thing (as it is not a parameter with Evaluate=true).

comment:22 Changed 11 years ago by cschubert

After Volker's fix (r17525), the size of the linear system decreased to 403 (instead of 452 in the old omc). This required allowing the inlining of functions with one assignment and one assert.

It seems that this has helped a little. Nevertheless, EngineV6 still requires 7m13s compared to 1m59s on 24 February 2013. For EngineV6_analytic it is 2m7s (new) against 52s (old).

Martin, could you run the profiling on your computer again, please? So that we can see how much time is spent in the linear system now.

Any other ideas?

comment:23 Changed 11 years ago by sjoelund.se

Still dominated by that big linear function:

  • EngineV6: Linear function (size 404) 67064x, 184.431686947s, 93.20%
  • EngineV6_analytic: Linear function (size 382) 22189x, 53.532767600s, 93.61%

comment:24 Changed 11 years ago by sjoelund.se

  • Milestone changed from 1.9.0 to 1.9.1

Postponed until 1.9.1

comment:25 Changed 10 years ago by sjoelund.se

This is still pretty much the same. Dymola EngineV6 in MSL 3.2.1 has linear system size 327, and 6 torn non-linear systems (10-1). OpenModelica has linear system size 404 and 6 torn non-linear systems (13-1).

On my laptop, Dymola:

   CPU-time for integration      : 11 seconds
   CPU-time for one GRID interval: 22 milli-seconds
   Number of result points       : 1167
   Number of GRID   points       : 501
   Number of (successful) steps  : 14639
   Number of F-evaluations       : 39453
   Number of H-evaluations       : 17677
   Number of Jacobian-evaluations: 4103
   Number of (model) time events : 0
   Number of (U) time events     : 0
   Number of state    events     : 333
   Number of step     events     : 0
   Minimum integration stepsize  : 3.01e-09
   Maximum integration stepsize  : 0.00126
   Maximum integration order     : 5

OpenModelica LOG_STATS:

     0.603972s          reading init.xml
     0.578528s          reading info.xml
    0.0289822s          pre-initialization
    0.0155034s [  0.0%] initialization
      1.25805s [  0.2%] steps
     0.318949s [  0.1%] creating output-file
      4.46913s [  0.8%] event-handling
   0.00940738s [  0.0%] overhead
      523.043s [ 98.8%] simulation
      529.143s [100.0%] total
 events
   544 state events
     0 time events
 solver: DASSL
 15169 steps taken
 21628 calls of functionODE
  4300 evaluations of jacobian
   379 error test failures
     0 convergence test failures

The difference in integration time (48x) should be due to more than just the size of the linear system: (404/327)^3 is only 1.89x slower.
So I think Dymola possibly calls the linear solver fewer times than we do, or uses a sparse solver. I set the tolerance to be the same (1e-6). The total number of steps is similar, as is the number of state events.

comment:26 Changed 10 years ago by cschubert

Actually, Dymola reduces the linear system from 327 unknowns to just 36:

Sizes of linear systems of equations: {2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 2, 2, 8, 327}
Sizes after manipulation of the linear systems: {2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 2, 0, 36}

However, (404/36)^3 is not the expected speed-up, as tearing brings additional overhead, i.e. calculating the matrix A and vector b of the reduced linear system A*x=b.

comment:27 Changed 10 years ago by sjoelund.se

I still think a sparse solver could also do a good job here though. The density is something like 0.8%.

comment:28 Changed 10 years ago by cschubert

This is undoubtedly true.
In the end, tearing is just one of many methods for solving sparse systems of equations, and it need not be the best one in every case.

In contrast to "classical" sparse approaches, tearing makes use of the structure of the system (the position of zero entries), which is already known at compile time.

comment:29 Changed 10 years ago by sjoelund.se

A "good" (Modelica-specialized) sparse linear solver would also be able to take advantage of zero entries, because the sparsity pattern can be registered in advance: the indexes of the sparse matrix are kept separate from its data, and only the data is updated in every step.

The key is that you know you will calculate this matrix many times, but with different values each time.

comment:30 Changed 10 years ago by wbraun

  • Owner changed from vwaurich to wbraun
  • Status changed from assigned to accepted

Perhaps we should activate doLinearTearing by default for 1.9.1; even if some other examples might fail, it would at least improve the performance of this example:

LOG_STATS         | info    | ### STATISTICS ###
LOG_STATS         | info    | timer
|                 | |       | |     0.715282s          reading init.xml
|                 | |       | |     0.693358s          reading info.xml
|                 | |       | |   0.00478853s          pre-initialization
|                 | |       | |   0.00209845s [  0.0%] initialization
|                 | |       | |     0.350783s [  0.3%] steps
|                 | |       | |    0.0229478s [  0.0%] creating output-file
|                 | |       | |     0.738157s [  0.6%] event-handling
|                 | |       | |   0.00381507s [  0.0%] overhead
|                 | |       | |      129.405s [ 99.1%] simulation
|                 | |       | |      130.528s [100.0%] total
LOG_STATS         | info    | events
|                 | |       | |   549 state events
|                 | |       | |     0 time events
LOG_STATS         | info    | solver: DASSL
|                 | |       | | 15174 steps taken
|                 | |       | | 21633 calls of functionODE
|                 | |       | |  4305 evaluations of jacobian
|                 | |       | |   379 error test failures
|                 | |       | |     0 convergence test failures
LOG_STATS         | info    | ### END STATISTICS ###

Furthermore, you are right that it is also worth testing sparse solvers like UMFPACK. We could start with version 5.1, which is available under the GNU LGPL license. I will also have a student who is interested in working in this direction. So I'll try to take care of this ticket now, with more pressure.

comment:31 Changed 10 years ago by wbraun

  • Resolution set to fixed
  • Status changed from accepted to closed

In r21588 I activated doLinearTearing by default, and now I get the following performance measurement:

wbraun@###$./Modelica.Mechanics.MultiBody.Examples.Loops.EngineV6 -lv LOG_STATS
LOG_STATS         | info    | ### STATISTICS ###
LOG_STATS         | info    | timer
|                 | |       | |     0.749886s          reading init.xml
|                 | |       | |     0.774778s          reading info.xml
|                 | |       | |   0.00468146s          pre-initialization
|                 | |       | |    0.0016162s [  0.0%] initialization
|                 | |       | |     0.034482s [  0.2%] steps
|                 | |       | |    0.0222729s [  0.2%] creating output-file
|                 | |       | |     0.102994s [  0.7%] event-handling
|                 | |       | |   0.00603958s [  0.0%] overhead
|                 | |       | |      14.6146s [ 98.8%] simulation
|                 | |       | |      14.7867s [100.0%] total
LOG_STATS         | info    | events
|                 | |       | |   551 state events
|                 | |       | |     0 time events
LOG_STATS         | info    | solver: DASSL
|                 | |       | | 15176 steps taken
|                 | |       | | 21635 calls of functionODE
|                 | |       | |  4307 evaluations of jacobian
|                 | |       | |   379 error test failures
|                 | |       | |     0 convergence test failures
LOG_STATS         | info    | ### END STATISTICS ###

So I'll close this ticket.
