Opened 4 years ago
Last modified 4 years ago
#6367 assigned defect
Avoid C compilation bottleneck in OMC
Reported by: | Francesco Casella | Owned by: | Martin Sjölund |
---|---|---|---|
Priority: | critical | Milestone: | NeedsInput |
Component: | Code Generation | Version: | 1.16.0 |
Keywords: | Cc: | Karim Adbdelhak, Andreas Heuermann, Adrian Pop, jean-philippe.tavella@…, Adeel Asghar |
Description
Comparing ScalableTestSuite and ScalableTestSuite_noopt, it seems that the simulation may be 20-30% slower using -O0 instead of -Os, but the C compilation time may be 3-4 times faster.
This issue was already discussed in #4033 and #4879. Bottom line: there is no need to completely disable optimization, thus introducing an unnecessary trade-off. What is needed is to:
- use the OMC_DISABLE_OPT macro to tell the compiler which functions should not be optimized. Basically, the ones involved in initialization (which are only executed a few times) should be marked with this macro
- split the jac.c and nls.c files, so that they can be compiled in parallel
Apparently, some code was added after #4879 was closed without the appropriate OMC_DISABLE_OPT macro, which made compilation slow again.
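To illustrate the first point, here is a minimal sketch of how a per-function "do not optimize" macro can work. The macro body below is an assumption based on the standard Clang and GCC function attributes; the actual definition in the OpenModelica runtime headers may differ. The two functions (`demo_initialize`, `demo_residual_norm2`) are hypothetical and only stand in for generated code.

```c
/* Assumption: a plausible per-compiler definition. The real OMC_DISABLE_OPT
 * is provided by the OpenModelica runtime headers and may be different. */
#ifndef OMC_DISABLE_OPT
#  if defined(__clang__)
#    define OMC_DISABLE_OPT __attribute__((optnone))
#  elif defined(__GNUC__)
#    define OMC_DISABLE_OPT __attribute__((optimize("O0")))
#  else
#    define OMC_DISABLE_OPT /* unknown compiler: no effect */
#  endif
#endif

/* Hypothetical initialization routine: it runs only a few times, so
 * spending compile time on optimizing it brings little benefit. */
OMC_DISABLE_OPT
void demo_initialize(double *x, int n)
{
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0 + (double)i;
    }
}

/* Hypothetical hot-path routine: evaluated at every step, so it is left
 * unmarked and gets whatever -O level the file is compiled with. */
double demo_residual_norm2(const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}
```

With this pattern the whole file can still be compiled with -Os, while the compiler skips its optimization passes for the marked functions only.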
The examples attached to #6271 and the larger test cases in the ScalableTestSuite can be used as benchmarks.
@sjoelund.se just opened PR 7133 and PR 7134 to improve things.
I expect the two above-mentioned test reports to show similar C-code compilation times after the PRs are merged.
Change History (4)
comment:1 by , 4 years ago
Cc: | added |
---|---|
Status: | new → assigned |
follow-up: 3 comment:2 by , 4 years ago
comment:3 by , 4 years ago
Replying to sjoelund.se:
> Also, note that while compilation speed is slower on the testsuite, it compiles files using 1 thread. If you have a 16-core machine, the simulation time will be a magnitude faster than the testsuite says.
I guess you mean "compilation time".
Yes, assuming the C source files are split conveniently. Currently, with larger models, the speedup ratio is more like 2-3 even with a 20-core machine, because of the jac.c and nls.c files.
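The reason a few large files cap the speedup can be sketched with a simple bound: the wall-clock time of a parallel build can never be shorter than the slowest single file, nor shorter than the total work divided by the number of threads. The function names and the per-file times in the usage note are hypothetical, chosen only to illustrate the effect.

```c
#include <stddef.h>

/* Lower bound on the wall-clock time needed to compile n independent files
 * on `threads` workers: the larger of the slowest single file and the total
 * work spread evenly over all workers. */
double parallel_lower_bound(const double *secs, size_t n, unsigned threads)
{
    double total = 0.0, longest = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += secs[i];
        if (secs[i] > longest) {
            longest = secs[i];
        }
    }
    double even = total / (double)threads;
    return longest > even ? longest : even;
}

/* Best possible speedup over a single-threaded build. */
double best_speedup(const double *secs, size_t n, unsigned threads)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += secs[i];
    }
    double bound = parallel_lower_bound(secs, n, threads);
    return bound > 0.0 ? total / bound : 1.0; /* empty build: no speedup */
}
```

For example, with made-up per-file times of 40, 30 and six times 5 (100 in total), `best_speedup` on 20 threads returns 2.5, because the 40-unit file alone determines the wall-clock time. Splitting the largest files is the only way to raise that ceiling.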
I'll do some testing with ScalablePowerGrids and ScalableTestSuite now that your PRs are merged in and report.
comment:4 by , 4 years ago
I ran ScalableTestGrids.Models.Type1_N_4_M_4 with version 4295d1 (before the two PRs) and with version c69a0c (after the two PRs), with -O0 and -Os, on a 20-thread Xeon server with 72 GB RAM. Here are the results:
Compiler | FE | BE | Simcode | Templates | Compile | Simulation |
---|---|---|---|---|---|---|
4295d1, -O0 | 11.44 | 30.31 | 9.22 | 6.04 | 15.44 | 4.30 |
c69a0c, -O0 | 14.58 | 34.09 | 16.29 | 11.69 | 15.18 | 4.06 |
4295d1, -Os | 11.35 | 29.58 | 9.66 | 6.04 | 94.02 | 3.85 |
c69a0c, -Os | 14.71 | 34.53 | 16.28 | 12.55 | 76.81 | 3.85 |
The new version became significantly slower during code generation. There was some hiccup with version management, so if I ran apt-get update I wouldn't get the newest nightly. I had to uninstall openmodelica and reinstall it from scratch; I'm not sure whether that is the cause.
As to C compilation, there was a reduction of roughly 20% in C-compilation time with -Os, but we are still very far from the time we can get with -O0. The bottlenecks of C compilation are the inz.c, jac.c, dae.c, and bnd.c files.
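The figures in the Compile column can be checked with two small helpers. This is only a sketch of the arithmetic; the function names are made up for illustration.

```c
/* Relative reduction, in percent, when going from `before` to `after`. */
double percent_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* How many times longer `slow` is compared with `fast`. */
double slowdown_factor(double slow, double fast)
{
    return slow / fast;
}
```

Using the table values, `percent_reduction(94.02, 76.81)` gives about 18.3, which is the -Os improvement between 4295d1 and c69a0c, and `slowdown_factor(76.81, 15.18)` gives about 5.06, which is how much longer C compilation still takes with -Os than with -O0 on c69a0c.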
Also, some of the split nls.c files show up in htop long after most other files have been compiled, when only 5 or 6 threads are still running clang. Maybe they should be split into even smaller chunks.