Opened 6 years ago

Last modified 3 years ago

#5353 new defect

Enhance the LOG_STATS output to tell apart time spent by the solver and by the generated code

Reported by: Francesco Casella Owned by: Lennart Ochel
Priority: high Milestone:
Component: Run-time Version:
Keywords: Cc: Karim Adbdelhak

Description

At the heart of the process of optimizing simulation performance is the ability to tell apart how much time is spent by the solver (DASSL, IDA, etc.), and how much time is spent by omc-generated code to compute residuals, jacobians, variables, zero-crossing functions etc.

The LOG_STATS output should contain a bit more detailed information about this.

Change History (20)

comment:1 by Martin Sjölund, 6 years ago

We do keep functionODE/Jacobian stats when doing profiling/always. I'll see if I can add the residuals and zero-crossing functions. I'm sure input/output/algebraics could also be added if necessary.

comment:2 by Martin Sjölund, 6 years ago

Note that we already have a LOG_STATS_V which contains a bit more information. I have extended it in my local copy:

[martin@mega tmp]$ ./Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum -lv LOG_STATS_V,LOG_STATS
LOG_SUCCESS       | info    | The initialization finished successfully without homotopy method.
LOG_STATS         | info    | ### STATISTICS ###
|                 | |       | | timer
|                 | |       | | |    0.0279313s          reading init.xml
|                 | |       | | |   0.00226606s          reading info.xml
|                 | |       | | |   0.00063764s [  4.7%] pre-initialization
|                 | |       | | |  0.000219568s [  1.6%] initialization
|                 | |       | | |   1.9243e-05s [  0.1%] steps
|                 | |       | | |   0.00279653s [ 20.5%] creating output-file
|                 | |       | | |   1.0715e-05s [  0.1%] event-handling
|                 | |       | | |  0.000225546s [  1.7%] overhead
|                 | |       | | |   0.00972814s [ 71.3%] simulation
|                 | |       | | |    0.0136374s [100.0%] total
|                 | |       | | events
|                 | |       | | |     0 state events
|                 | |       | | |     0 time events
|                 | |       | | solver: dassl
|                 | |       | | |   716 steps taken
|                 | |       | | |   921 calls of functionODE
|                 | |       | | |    34 evaluations of jacobian
|                 | |       | | |    15 error test failures
|                 | |       | | |     0 convergence test failures
LOG_STATS_V       | info    | function calls
|                 | |       | |  1557 calls of functionODE
|                 | |       | | |    0.0039522s [ 29.0%]
|                 | |       | |  1057 calls of functionODE_residual
|                 | |       | | |    0.0032001s [ 23.5%]
|                 | |       | |   500 calls of functionAlgebraics
|                 | |       | | |   0.00264221s [ 19.4%]
|                 | |       | |    34 evaluations of jacobian
|                 | |       | | |  0.000334096s [  2.4%]
|                 | |       | |     2 calls of updateDiscreteSystem
|                 | |       | |     0 calls of functionZeroCrossingsEquations
|                 | |       | |   501 calls of functionZeroCrossings
|                 | |       | | |    6.392e-06s [  0.0%]
LOG_STATS_V       | info    | linear systems
|                 | |       | | Linear system 688 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics:
|                 | |       | | |  number of calls                : 1
|                 | |       | | |  average time per call          : 5.2088e-05
|                 | |       | | |  time of jacobian evaluations   : 3.988e-06
|                 | |       | | |  total time                     : 5.2088e-05
|                 | |       | | Linear system 930 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics:
|                 | |       | | |  number of calls                : 1559
|                 | |       | | |  average time per call          : 2.05356e-06
|                 | |       | | |  time of jacobian evaluations   : 0.00104905
|                 | |       | | |  total time                     : 0.00320149
LOG_STATS_V       | info    | non-linear systems
LOG_SUCCESS       | info    | The simulation finished successfully.

Note that the number of functionODE calls reported for LOG_STATS (which gets the information from the solver) is different than in LOG_STATS_V (which gets the number as measured by functionODE).

comment:3 by Martin Sjölund, 6 years ago

https://github.com/OpenModelica/OMCompiler/pull/2997

Is there anything you feel needs to be added?

comment:4 by Francesco Casella, 6 years ago

Now there's a lot more information, but I still find it difficult to tell how much time is spent by the solver and how much is spent by the omc generated code.

If we want to try to improve the performance, telling these two apart is essential, because at the end of the day, the former is not our responsibility (except regarding the optimizations we may use to compile the solver source code), while the latter definitely is. Could you please add this imformation?

It would also be nice if the timers were also added to DAEmode, I understand there's currently no profiling there.

Last edited 6 years ago by Francesco Casella (previous) (diff)

comment:5 by Martin Sjölund, 6 years ago

The problem is that all of these timers overlap. There is no timer for the solver since then you would need add logic for every single solver (including creating wrappers for every callback function to stop this particular timer).
I'll see if I could do it for dassl.

I have no clue how DAEmode works though.

comment:6 by Martin Sjölund, 6 years ago

true
record SimulationResult
    resultFile = "/home/marsj/tmp/Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum_res.mat",
    simulationOptions = "startTime = 0.0, stopTime = 3.0, numberOfIntervals = 500, tolerance = 1e-06, method = 'ida', fileNamePrefix = 'Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum', options = '', outputFormat = 'mat', variableFilter = '.*', cflags = '', simflags = '-lv LOG_STATS,LOG_STATS_V'",
    messages = "LOG_SUCCESS       | info    | The initialization finished successfully without homotopy method.
LOG_STATS         | info    | ### STATISTICS ###
|                 | |       | | timer
|                 | |       | | |    0.0917078s          reading init.xml
|                 | |       | | |   0.00344319s          reading info.xml
|                 | |       | | |   0.00110111s [  1.8%] pre-initialization
|                 | |       | | |  0.000726535s [  1.2%] initialization
|                 | |       | | |   3.3796e-05s [  0.1%] steps
|                 | |       | | |    0.0268731s [ 44.4%] solver (excl. callbacks)
|                 | |       | | |   0.00592067s [  9.8%] creating output-file
|                 | |       | | |            0s [  0.0%] event-handling
|                 | |       | | |   0.00214035s [  3.5%] overhead
|                 | |       | | |    0.0229237s [ 37.8%] simulation
|                 | |       | | |    0.0606009s [100.0%] total
|                 | |       | | events
|                 | |       | | |     0 state events
|                 | |       | | |     0 time events
|                 | |       | | solver: ida
|                 | |       | | |  2338 steps taken
|                 | |       | | |  3593 calls of functionODE
|                 | |       | | |   176 evaluations of jacobian
|                 | |       | | |    81 error test failures
|                 | |       | | |     0 convergence test failures
LOG_STATS_V       | info    | function calls
|                 | |       | |  6082 calls of functionDAE
|                 | |       | | |    0.0138641s [ 22.9%]
|                 | |       | |   180 evaluations of jacobian
|                 | |       | | |   0.00995321s [ 16.4%]
|                 | |       | |     2 calls of updateDiscreteSystem
|                 | |       | |     0 calls of functionZeroCrossingsEquations
|                 | |       | |   501 calls of functionZeroCrossings
|                 | |       | | |  6.42056e-05s [  0.1%]
LOG_STATS_V       | info    | linear systems
|                 | |       | | Linear system 691 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics:
|                 | |       | | |  number of calls                : 1
|                 | |       | | |  average time per call          : 9.8888e-05
|                 | |       | | |  time of jacobian evaluations   : 6.16e-06
|                 | |       | | |  total time                     : 9.8888e-05
LOG_STATS_V       | info    | non-linear systems
LOG_SUCCESS       | info    | The simulation finished successfully.
",
    timeFrontend = 1.418430537,
    timeBackend = 0.730873369,
    timeSimCode = 0.149293907,
    timeTemplates = 0.106439983,
    timeCompile = 2.144501323,
    timeSimulation = 0.1918890489999999,
    timeTotal = 4.635116575
end SimulationResult;

So... I guess the Jacobian calculation calls functionDAE so those numbers overlap? Otherwise there is something terribly wrong there :)

in reply to:  5 comment:7 by Francesco Casella, 6 years ago

Cc: Karim Adbdelhak added

Replying to sjoelund.se:

The problem is that all of these timers overlap. There is no timer for the solver since then you would need add logic for every single solver (including creating wrappers for every callback function to stop this particular timer).

I guess for this purposes taking the difference between the total time and the time spent by the generated code would be ok.

I have no clue how DAEmode works though.

Ach. We may involve Karim in the discussion, then.

comment:8 by Martin Sjölund, 6 years ago

No, you can't take the difference between the total time and time spent in generated code because the generated code timers overlap. It's quite messy. I added the logic for the IDA, dassl, euler, RK solvers though. But it's tricky and I'm not sure if some timers overlap still (well, the Jacobian method seems to call functionDAE but other than that it seems OK).

For example, for DoublePendulum you can see that dassl spends 6.6% in the solver and the rest in our code. And that ida spends 44.4% in the solver.

in reply to:  8 comment:9 by Francesco Casella, 6 years ago

Replying to sjoelund.se:

No, you can't take the difference between the total time and time spent in generated code because the generated code timers overlap.

I see.

It's quite messy. I added the logic for the IDA, dassl, euler, RK solvers though. But it's tricky and I'm not sure if some timers overlap still (well, the Jacobian method seems to call functionDAE but other than that it seems OK).

For example, for DoublePendulum you can see that dassl spends 6.6% in the solver and the rest in our code. And that ida spends 44.4% in the solver.

This is surprising. As I understand IDA is basically an evolution of DASSL, so it should take more or less the same time, possibly a bit less, since I expect the evolution to be faster than the original :)

Maybe we are compiling IDA with much slower optimization options than we do for DASSL?

Or are we just missing some callback functions in DASSL?

comment:10 by Martin Sjölund, 6 years ago

Well, we are using DASSL in ODE mode, so that probably has something to do with it.

comment:11 by Martin Sjölund, 6 years ago

IDA

6082 calls of functionDAE
0.0101015s [ 20.8%]
180 evaluations of jacobian
0.00855972s [ 17.6%]

DASSL

1585 calls of functionODE
0.006175s [ 21.6%]
1085 calls of functionODE_residual
0.00523746s [ 18.4%]
500 calls of functionAlgebraics
0.00335483s [ 11.8%]
36 evaluations of jacobian
0.000872114s [  3.1%]

The system is quite small for DoublePendulum so I guess it's dense enough for ODE mode to beat it.

in reply to:  10 comment:12 by Francesco Casella, 6 years ago

Replying to sjoelund.se:

Well, we are using DASSL in ODE mode, so that probably has something to do with it.

Aha, sorry, so you mean IDA in daeMode

in reply to:  11 comment:13 by Francesco Casella, 6 years ago

Replying to sjoelund.se:

The system is quite small for DoublePendulum so I guess it's dense enough for ODE mode to beat it.

Absolutely, the double pendulum has only four states and no serious algebraic loops, so there's no use for sparse daeMode here.

comment:14 by Martin Sjölund, 6 years ago

Yes, IDA in ODE mode is very similar to dassl performance. Actually, total times are virtually identical (including number of callbacks). The only exception is the time reported inside the solver is 6.4% for dassl and 12.8% for ida.

in reply to:  14 comment:15 by Francesco Casella, 6 years ago

Replying to sjoelund.se:

Yes, IDA in ODE mode is very similar to dassl performance. Actually, total times are virtually identical (including number of callbacks).

OK, that's what I expected.

The only exception is the time reported inside the solver is 6.4% for dassl and 12.8% for ida.

Maybe we should compile IDA with -O3?

comment:16 by Martin Sjölund, 6 years ago

DASSL and IDA were both compiled -O2 in this case. -O3 usually has little impact relative to O2.

comment:17 by Francesco Casella, 5 years ago

Milestone: 1.14.01.16.0

Releasing 1.14.0 which is stable and has many improvements w.r.t. 1.13.2. This issue is rescheduled to 1.16.0

comment:18 by Francesco Casella, 4 years ago

Milestone: 1.16.01.17.0

Retargeted to 1.17.0 after 1.16.0 release

comment:19 by Francesco Casella, 4 years ago

Milestone: 1.17.01.18.0

Retargeted to 1.18.0 because of 1.17.0 timed release.

comment:20 by Francesco Casella, 3 years ago

Milestone: 1.18.0

Ticket retargeted after milestone closed

Note: See TracTickets for help on using tickets.