Opened 6 years ago
Last modified 3 years ago
#5353 new defect
Enhance the LOG_STATS output to tell apart time spent by the solver and by the generated code
Reported by: | Francesco Casella | Owned by: | Lennart Ochel |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Run-time | Version: | |
Keywords: | Cc: | Karim Adbdelhak |
Description
At the heart of the process of optimizing simulation performance is the ability to tell apart how much time is spent by the solver (DASSL, IDA, etc.), and how much time is spent by omc-generated code to compute residuals, jacobians, variables, zero-crossing functions etc.
The LOG_STATS output should contain a bit more detailed information about this.
Change History (20)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Note that we already have a LOG_STATS_V which contains a bit more information. I have extended it in my local copy:
[martin@mega tmp]$ ./Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum -lv LOG_STATS_V,LOG_STATS LOG_SUCCESS | info | The initialization finished successfully without homotopy method. LOG_STATS | info | ### STATISTICS ### | | | | | timer | | | | | | 0.0279313s reading init.xml | | | | | | 0.00226606s reading info.xml | | | | | | 0.00063764s [ 4.7%] pre-initialization | | | | | | 0.000219568s [ 1.6%] initialization | | | | | | 1.9243e-05s [ 0.1%] steps | | | | | | 0.00279653s [ 20.5%] creating output-file | | | | | | 1.0715e-05s [ 0.1%] event-handling | | | | | | 0.000225546s [ 1.7%] overhead | | | | | | 0.00972814s [ 71.3%] simulation | | | | | | 0.0136374s [100.0%] total | | | | | events | | | | | | 0 state events | | | | | | 0 time events | | | | | solver: dassl | | | | | | 716 steps taken | | | | | | 921 calls of functionODE | | | | | | 34 evaluations of jacobian | | | | | | 15 error test failures | | | | | | 0 convergence test failures LOG_STATS_V | info | function calls | | | | | 1557 calls of functionODE | | | | | | 0.0039522s [ 29.0%] | | | | | 1057 calls of functionODE_residual | | | | | | 0.0032001s [ 23.5%] | | | | | 500 calls of functionAlgebraics | | | | | | 0.00264221s [ 19.4%] | | | | | 34 evaluations of jacobian | | | | | | 0.000334096s [ 2.4%] | | | | | 2 calls of updateDiscreteSystem | | | | | 0 calls of functionZeroCrossingsEquations | | | | | 501 calls of functionZeroCrossings | | | | | | 6.392e-06s [ 0.0%] LOG_STATS_V | info | linear systems | | | | | Linear system 688 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics: | | | | | | number of calls : 1 | | | | | | average time per call : 5.2088e-05 | | | | | | time of jacobian evaluations : 3.988e-06 | | | | | | total time : 5.2088e-05 | | | | | Linear system 930 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics: | | | | | | number of calls : 1559 | | | | | | average time per call : 2.05356e-06 | | | | | | time of jacobian evaluations : 0.00104905 | | | | | | total time : 0.00320149 LOG_STATS_V | info | non-linear systems LOG_SUCCESS | info | The simulation finished successfully.
Note that the number of functionODE calls reported for LOG_STATS
(which gets the information from the solver) is different than in LOG_STATS_V
(which gets the number as measured by functionODE).
comment:3 by , 6 years ago
https://github.com/OpenModelica/OMCompiler/pull/2997
Is there anything you feel needs to be added?
comment:4 by , 6 years ago
Now there's a lot more information, but I still find it difficult to tell how much time is spent by the solver and how much is spent by the omc generated code.
If we want to try to improve the performance, telling these two apart is essential, because at the end of the day, the former is not our responsibility (except regarding the optimizations we may use to compile the solver source code), while the latter definitely is. Could you please add this imformation?
It would also be nice if the timers were also added to DAEmode, I understand there's currently no profiling there.
follow-up: 7 comment:5 by , 6 years ago
The problem is that all of these timers overlap. There is no timer for the solver since then you would need add logic for every single solver (including creating wrappers for every callback function to stop this particular timer).
I'll see if I could do it for dassl.
I have no clue how DAEmode works though.
comment:6 by , 6 years ago
true record SimulationResult resultFile = "/home/marsj/tmp/Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum_res.mat", simulationOptions = "startTime = 0.0, stopTime = 3.0, numberOfIntervals = 500, tolerance = 1e-06, method = 'ida', fileNamePrefix = 'Modelica.Mechanics.MultiBody.Examples.Elementary.DoublePendulum', options = '', outputFormat = 'mat', variableFilter = '.*', cflags = '', simflags = '-lv LOG_STATS,LOG_STATS_V'", messages = "LOG_SUCCESS | info | The initialization finished successfully without homotopy method. LOG_STATS | info | ### STATISTICS ### | | | | | timer | | | | | | 0.0917078s reading init.xml | | | | | | 0.00344319s reading info.xml | | | | | | 0.00110111s [ 1.8%] pre-initialization | | | | | | 0.000726535s [ 1.2%] initialization | | | | | | 3.3796e-05s [ 0.1%] steps | | | | | | 0.0268731s [ 44.4%] solver (excl. callbacks) | | | | | | 0.00592067s [ 9.8%] creating output-file | | | | | | 0s [ 0.0%] event-handling | | | | | | 0.00214035s [ 3.5%] overhead | | | | | | 0.0229237s [ 37.8%] simulation | | | | | | 0.0606009s [100.0%] total | | | | | events | | | | | | 0 state events | | | | | | 0 time events | | | | | solver: ida | | | | | | 2338 steps taken | | | | | | 3593 calls of functionODE | | | | | | 176 evaluations of jacobian | | | | | | 81 error test failures | | | | | | 0 convergence test failures LOG_STATS_V | info | function calls | | | | | 6082 calls of functionDAE | | | | | | 0.0138641s [ 22.9%] | | | | | 180 evaluations of jacobian | | | | | | 0.00995321s [ 16.4%] | | | | | 2 calls of updateDiscreteSystem | | | | | 0 calls of functionZeroCrossingsEquations | | | | | 501 calls of functionZeroCrossings | | | | | | 6.42056e-05s [ 0.1%] LOG_STATS_V | info | linear systems | | | | | Linear system 691 with (size = 4, nonZeroElements = 14, density = 87.50 %) solver statistics: | | | | | | number of calls : 1 | | | | | | average time per call : 9.8888e-05 | | | | | | time of jacobian evaluations : 6.16e-06 | | | | | | total time : 9.8888e-05 LOG_STATS_V | info | non-linear systems LOG_SUCCESS | info | The simulation finished successfully. ", timeFrontend = 1.418430537, timeBackend = 0.730873369, timeSimCode = 0.149293907, timeTemplates = 0.106439983, timeCompile = 2.144501323, timeSimulation = 0.1918890489999999, timeTotal = 4.635116575 end SimulationResult;
So... I guess the Jacobian calculation calls functionDAE so those numbers overlap? Otherwise there is something terribly wrong there :)
comment:7 by , 6 years ago
Cc: | added |
---|
Replying to sjoelund.se:
The problem is that all of these timers overlap. There is no timer for the solver since then you would need add logic for every single solver (including creating wrappers for every callback function to stop this particular timer).
I guess for this purposes taking the difference between the total time and the time spent by the generated code would be ok.
I have no clue how DAEmode works though.
Ach. We may involve Karim in the discussion, then.
follow-up: 9 comment:8 by , 6 years ago
No, you can't take the difference between the total time and time spent in generated code because the generated code timers overlap. It's quite messy. I added the logic for the IDA, dassl, euler, RK solvers though. But it's tricky and I'm not sure if some timers overlap still (well, the Jacobian method seems to call functionDAE but other than that it seems OK).
For example, for DoublePendulum you can see that dassl spends 6.6% in the solver and the rest in our code. And that ida spends 44.4% in the solver.
comment:9 by , 6 years ago
Replying to sjoelund.se:
No, you can't take the difference between the total time and time spent in generated code because the generated code timers overlap.
I see.
It's quite messy. I added the logic for the IDA, dassl, euler, RK solvers though. But it's tricky and I'm not sure if some timers overlap still (well, the Jacobian method seems to call functionDAE but other than that it seems OK).
For example, for DoublePendulum you can see that dassl spends 6.6% in the solver and the rest in our code. And that ida spends 44.4% in the solver.
This is surprising. As I understand IDA is basically an evolution of DASSL, so it should take more or less the same time, possibly a bit less, since I expect the evolution to be faster than the original :)
Maybe we are compiling IDA with much slower optimization options than we do for DASSL?
Or are we just missing some callback functions in DASSL?
follow-up: 12 comment:10 by , 6 years ago
Well, we are using DASSL in ODE mode, so that probably has something to do with it.
follow-up: 13 comment:11 by , 6 years ago
IDA
6082 calls of functionDAE 0.0101015s [ 20.8%] 180 evaluations of jacobian 0.00855972s [ 17.6%]
DASSL
1585 calls of functionODE 0.006175s [ 21.6%] 1085 calls of functionODE_residual 0.00523746s [ 18.4%] 500 calls of functionAlgebraics 0.00335483s [ 11.8%] 36 evaluations of jacobian 0.000872114s [ 3.1%]
The system is quite small for DoublePendulum so I guess it's dense enough for ODE mode to beat it.
comment:12 by , 6 years ago
Replying to sjoelund.se:
Well, we are using DASSL in ODE mode, so that probably has something to do with it.
Aha, sorry, so you mean IDA in daeMode
comment:13 by , 6 years ago
Replying to sjoelund.se:
The system is quite small for DoublePendulum so I guess it's dense enough for ODE mode to beat it.
Absolutely, the double pendulum has only four states and no serious algebraic loops, so there's no use for sparse daeMode here.
follow-up: 15 comment:14 by , 6 years ago
Yes, IDA in ODE mode is very similar to dassl performance. Actually, total times are virtually identical (including number of callbacks). The only exception is the time reported inside the solver is 6.4% for dassl and 12.8% for ida.
comment:15 by , 6 years ago
Replying to sjoelund.se:
Yes, IDA in ODE mode is very similar to dassl performance. Actually, total times are virtually identical (including number of callbacks).
OK, that's what I expected.
The only exception is the time reported inside the solver is 6.4% for dassl and 12.8% for ida.
Maybe we should compile IDA with -O3?
comment:16 by , 6 years ago
DASSL and IDA were both compiled -O2 in this case. -O3 usually has little impact relative to O2.
comment:17 by , 5 years ago
Milestone: | 1.14.0 → 1.16.0 |
---|
Releasing 1.14.0 which is stable and has many improvements w.r.t. 1.13.2. This issue is rescheduled to 1.16.0
comment:19 by , 4 years ago
Milestone: | 1.17.0 → 1.18.0 |
---|
Retargeted to 1.18.0 because of 1.17.0 timed release.
We do keep functionODE/Jacobian stats when doing profiling/always. I'll see if I can add the residuals and zero-crossing functions. I'm sure input/output/algebraics could also be added if necessary.