Opened 4 years ago

Last modified 4 years ago

#6367 assigned defect

Avoid C compilation bottleneck in OMC

Reported by: Francesco Casella Owned by: Martin Sjölund
Priority: critical Milestone: NeedsInput
Component: Code Generation Version: 1.16.0
Keywords: Cc: Karim Adbdelhak, Andreas Heuermann, Adrian Pop, jean-philippe.tavella@…, Adeel Asghar

Description

Comparing the ScalableTestSuite and ScalableTestSuite_noopt test reports, simulation appears to be 20-30% slower with -O0 than with -Os, while C compilation is 3-4 times faster.

This issue was already discussed in #4033 and #4879. Bottom line: there is no need to completely disable optimization, thus introducing an unnecessary trade-off. What is needed is

  • use the OMC_DISABLE_OPT macro to tell the compiler which functions should not be optimized. Basically, the ones involved in initialization (which are only executed a few times) should be marked with this macro
  • split the jac.c and nls.c files, so that they can be compiled in parallel
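To illustrate the first point, here is a minimal sketch of how such a macro can be defined and applied. The macro body shown is an assumption based on the standard GCC and Clang function attributes, not a copy of the actual definition in the OpenModelica runtime headers, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition, for illustration only: the real OMC_DISABLE_OPT
 * in the OpenModelica runtime may be defined differently. */
#ifndef OMC_DISABLE_OPT
#  if defined(__clang__)
#    define OMC_DISABLE_OPT __attribute__((optnone))
#  elif defined(__GNUC__)
#    define OMC_DISABLE_OPT __attribute__((optimize(0)))
#  else
#    define OMC_DISABLE_OPT /* unknown compiler: expands to nothing */
#  endif
#endif

/* Hypothetical initialization routine: it runs only a few times, so the
 * compiler is told not to spend time optimizing it, even under -Os. */
OMC_DISABLE_OPT
void example_init_start_values(double *x, size_t n)
{
  assert(x != NULL);
  for (size_t i = 0; i < n; ++i) {
    x[i] = 1.0 + 0.5 * (double)i;
  }
}

/* Hypothetical hot-path routine: called at every step, so it is left
 * unmarked and gets the optimization level given on the command line. */
double example_residual_norm2(const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) {
    sum += x[i] * x[i];
  }
  return sum;
}
```

With this arrangement the whole file can still be compiled with -Os, and only the unmarked functions pay the optimization cost.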

Apparently, some code added after #4879 was closed lacks the appropriate OMC_DISABLE_OPT macro, which made C compilation slow again.

The examples attached to #6271 and the larger test cases in the ScalableTestSuite can be used as benchmarks.

@sjoelund.se just opened PR 7133 and PR 7134 to improve things.

I expect the two above-mentioned test reports to show similar C-code compilation times after the PRs are merged.

Change History (4)

comment:1 by Francesco Casella, 4 years ago

Cc: Adeel Asghar added
Status: new → assigned

comment:2 by Martin Sjölund, 4 years ago

Also, note that while compilation speed is slower on the testsuite, it compiles files using 1 thread. If you have a 16-core machine, the simulation time will be an order of magnitude faster than the testsuite says.

comment:3 by Francesco Casella, 4 years ago (in reply to: comment:2)

Replying to sjoelund.se:

Also, note that while compilation speed is slower on the testsuite, it compiles files using 1 thread. If you have a 16-core machine, the simulation time will be an order of magnitude faster than the testsuite says.

I guess you mean "compilation time".

Yes, assuming the C source files are split conveniently. Currently, with larger models, the speedup ratio is more like 2-3 even with a 20-core machine, because of the jac.c and nls.c files.
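The reason a few large files cap the speedup can be sketched as follows: no parallel schedule can finish before its longest single file does. The function names and the per-file times in the usage note are hypothetical, chosen only to illustrate the bound.

```c
#include <assert.h>
#include <stddef.h>

/* Lower bound on wall-clock time for compiling n files on `jobs` parallel
 * compiler processes: no schedule can beat either the longest single file
 * or the total work divided evenly among the jobs. */
double parallel_compile_lower_bound(const double *secs, size_t n, unsigned jobs)
{
  assert(jobs > 0);
  double total = 0.0;
  double longest = 0.0;
  for (size_t i = 0; i < n; ++i) {
    total += secs[i];
    if (secs[i] > longest) {
      longest = secs[i];
    }
  }
  double even = total / (double)jobs;
  return longest > even ? longest : even;
}

/* Best speedup achievable over compiling the same files one after another. */
double best_case_speedup(const double *secs, size_t n, unsigned jobs)
{
  double total = 0.0;
  for (size_t i = 0; i < n; ++i) {
    total += secs[i];
  }
  double bound = parallel_compile_lower_bound(secs, n, jobs);
  return bound > 0.0 ? total / bound : 1.0;
}
```

For example, with made-up per-file times of 40, 30 and six times 5 seconds, 20 jobs cannot do better than 40 seconds, which is a speedup of 2.5 over the 100 seconds of sequential compilation.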

I'll do some testing with ScalablePowerGrids and ScalableTestSuite now that your PRs are merged in and report.

comment:4 by Francesco Casella, 4 years ago

I ran ScalableTestGrids.Models.Type1_N_4_M_4 with version 4295d1 (before the two PRs) and with version c69a0c (after the two PRs), with -O0 and -Os, on a 20-thread Xeon server with 72 GB RAM. Here are the results:

Compiler        FE     BE  Simcode  Templates  Compile  Simulation
4295d1, -O0  11.44  30.31     9.22       6.04    15.44        4.30
c69a0c, -O0  14.58  34.09    16.29      11.69    15.18        4.06
4295d1, -Os  11.35  29.58     9.66       6.04    94.02        3.85
c69a0c, -Os  14.71  34.53    16.28      12.55    76.81        3.85

The new version became significantly slower during code generation. There was some hiccup with version management: running apt-get update would not give me the newest nightly, so I had to uninstall openmodelica and reinstall it from scratch. I'm not sure if that is the cause.

As to C compilation, there was a reduction of roughly 20% in C-compilation time with -Os, but we are still very far from the time we can get with -O0. The bottlenecks of C compilation are the inz.c, jac.c, dae.c, and bnd.c files.
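For reference, the two ratios behind this statement can be recomputed from the Compile column of the table above. This is only a small arithmetic sketch with hypothetical helper names.

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`. */
double percent_reduction(double before, double after)
{
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}

/* How many times longer `slow` takes than `fast`. */
double slowdown_factor(double slow, double fast)
{
  assert(fast > 0.0);
  return slow / fast;
}

/* Figures from the Compile column:
 *   percent_reduction(94.02, 76.81)  -Os before vs. after the PRs, about 18.3
 *   slowdown_factor(76.81, 15.18)    -Os vs. -O0 after the PRs, about 5.06 */
```

So after the PRs, compiling with -Os still takes about five times as long as with -O0.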

Also, some of the split nls.c files show up in htop long after most other files have been compiled, when only 5 or 6 threads are still running clang. Maybe they should be split into even smaller chunks.
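One possible way to choose the chunks, sketched under assumptions: if the code generator had a rough per-function compile-cost estimate (for example, the number of equations, which is my assumption and not something OMC is known to track), it could assign functions to chunk files greedily, largest first, each into the currently lightest chunk. This is the generic longest-processing-time heuristic, not the splitting logic OMC actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: sorts doubles in descending order. */
static int compare_desc(const void *a, const void *b)
{
  double x = *(const double *)a;
  double y = *(const double *)b;
  return (x < y) - (x > y);
}

/* Distributes n functions with estimated costs over n_chunks chunk files.
 * `costs` is sorted in place; `chunk_load` must have room for n_chunks
 * entries and receives the total cost per chunk.
 * Returns the load of the heaviest chunk, which bounds the parallel
 * compile time of the split files. */
double balance_chunks(double *costs, size_t n, double *chunk_load, size_t n_chunks)
{
  assert(n_chunks > 0);
  qsort(costs, n, sizeof costs[0], compare_desc);
  for (size_t c = 0; c < n_chunks; ++c) {
    chunk_load[c] = 0.0;
  }
  for (size_t i = 0; i < n; ++i) {
    size_t lightest = 0;
    for (size_t c = 1; c < n_chunks; ++c) {
      if (chunk_load[c] < chunk_load[lightest]) {
        lightest = c;
      }
    }
    chunk_load[lightest] += costs[i];
  }
  double heaviest = 0.0;
  for (size_t c = 0; c < n_chunks; ++c) {
    if (chunk_load[c] > heaviest) {
      heaviest = chunk_load[c];
    }
  }
  return heaviest;
}
```

Increasing n_chunks lowers the heaviest load until it reaches the cost of the single largest function, at which point only splitting that function itself would help further.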
