Opened 8 years ago

Closed 4 years ago

#4033 closed enhancement (fixed)

Default clang flags for faster build time

Reported by: casella Owned by: adeas31
Priority: critical Milestone:
Component: Run-time Version:
Keywords: Cc: adeas31, jean-philippe.tavella@…, Andrea.Bartolini, ceraolo

Description

Compare the performance of the ScalableTestSuite when the C code is compiled with the flag -Os

https://test.openmodelica.org/libraries/history/ScalableTestSuite_Experimental/ScalableTestSuite_Experimental-2016-08-29.html

and with the flag -O0

https://test.openmodelica.org/libraries/history/ScalableTestSuite_Experimental/ScalableTestSuite_Experimental-2016-08-30.html

It is apparent that the clang optimizations cause the C code compilation time to scale as O(N^2) when using -Os, while it scales as O(N) when using -O0. For larger models, using -Os causes the C code compilation time to blow up to over one hour, which is totally unacceptable.
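
The scaling claim above can be checked from the timing columns of the two reports by fitting a straight line to log(time) versus log(N); the slope approximates the exponent. Below is a minimal Python sketch of that check. The function name `scaling_exponent` and the sample numbers are hypothetical: the timings are synthetic values generated from exact power laws, not measurements taken from the ScalableTestSuite reports.

```python
import math


def scaling_exponent(sizes, times):
    """Estimate p in time ~ N**p as the least-squares slope of
    log(time) versus log(size)."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data for illustration only (NOT real test-suite results):
# one series grows linearly with N, the other quadratically.
sizes = [100, 200, 400, 800]
linear_times = [0.01 * n for n in sizes]
quadratic_times = [1e-5 * n * n for n in sizes]

print("linear series exponent:   ", scaling_exponent(sizes, linear_times))
print("quadratic series exponent:", scaling_exponent(sizes, quadratic_times))
```

A slope near 1 would indicate linear scaling and a slope near 2 quadratic scaling. With real data, one would paste in the model sizes and C compilation times from the two reports linked above.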

The use of -Os leads to a reduction in simulation time between 25% and 50%, which is significant, but almost never pays off if the total build+simulation time is considered. In fact, the only cases where -Os pays off in terms of total time are the BreakerNetwork and HeatingSystem test cases. In the case of BreakerNetwork, this is only true because the event handling is still very inefficient (the sparse solver is not used for event processing), so the simulation time grows very quickly.

I would therefore suggest changing the default optimization flag to -O0, at least until #3964 is implemented. Once the C source code is split into more, smaller files, the situation might need to be reconsidered.

Change History (17)

comment:1 Changed 8 years ago by casella

I just realized that -Os was set explicitly in the setup strings of the LibraryExperimental Hudson job, and is by no means omc's default setting.

I'm not sure what the clang optimization default is in omc. I don't see any -O* strings in the console output, so I assume it is clang's default. However, I couldn't work out from clang's online manual what that is. Is it -O0 (no optimization)? Can someone help?

comment:2 Changed 8 years ago by sjoelund.se

clang default is -O0

comment:3 Changed 8 years ago by casella

  • Resolution set to fixed
  • Status changed from new to closed

OK. I would then suggest removing -Os from the default setup of the LibraryExperimental Hudson job, because it is clearly not a good idea, at least in terms of the time spent covering the whole library.

comment:4 Changed 7 years ago by sjoelund.se

  • Milestone 1.10.0 deleted

Milestone deleted

comment:5 Changed 4 years ago by casella

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:6 Changed 4 years ago by casella

  • Cc adeas31 added
  • Owner changed from somebody to adrpo
  • Status changed from reopened to assigned

comment:7 Changed 4 years ago by casella

I just ran an example in OMEdit, here is an excerpt of the compilation log

C:/Program Files/OpenModelica1.17.0-dev-64bit/share/omc/scripts/Compile.bat Modelica.Blocks.Examples.Filter gcc mingw64 parallel 8 0
PATH = "C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin;C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin\..\..\usr\bin;"
mingw32-make: Entering directory 'd:/Temp/OMEdit/Modelica.Blocks.Examples.Filter'
gcc  -Os -falign-functions -mstackrealign -msse2 -mfpmath=sse     -I"C:/Program Files/OpenModelica1.17.0-dev-64bit/include/omc/c" -I. -DOPENMODELICA_XML_FROM_FILE_AT_RUNTIME -DOMC_MODEL_PREFIX=Modelica_Blocks_Examples_Filter -DOMC_NUM_MIXED_SYSTEMS=0 -DOMC_NUM_LINEAR_SYSTEMS=3 -DOMC_NUM_NONLINEAR_SYSTEMS=0 -DOMC_NDELAY_EXPRESSIONS=0 -DOMC_NVAR_STRING=0  -c -o Modelica.Blocks.Examples.Filter.o Modelica.Blocks.Examples.Filter.c

gcc is clearly called with -Os, and it just takes forever to compile. Is this an OMEdit-specific setup, or is it the default one? Could we just switch to -O0 as the default choice? That is for sure the fastest one in >95% of the cases.

comment:8 follow-up: Changed 4 years ago by adrpo

Why do you have gcc? Clang should be the default in 1.17.

comment:9 follow-up: Changed 4 years ago by adrpo

(sorry @adrpo, I messed up with your comment...)

I have no idea why we use -Os by default (and is hardcoded in the Makefile).

So we could simply change it there to -O0, or just remove it, since I understand that's the default option for clang.

We should have a setting for it in Tools->Options->Simulation right below the compiler setting.

Yes, we could have it as a GUI option. Maybe we should evaluate which settings are the most useful (-Os? -O3?)

Truth is that right now you can press S and then add -O0 in "C/C++ compiler flags (Optional)".

Last edited 4 years ago by casella

comment:10 in reply to: ↑ 8 Changed 4 years ago by casella

Replying to adrpo:

Why do you have gcc? Clang should be the default in 1.17.

Sorry, I made some experiments with gcc and forgot to reset to default. This is the output with clang

C:/Program Files/OpenModelica1.17.0-dev-64bit/share/omc/scripts/Compile.bat Modelica.Blocks.Examples.Filter gcc mingw64 parallel 8 0
PATH = "C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin;C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin\..\..\usr\bin;"
mingw32-make: Entering directory 'd:/Temp/OMEdit/Modelica.Blocks.Examples.Filter'
clang  -Os -falign-functions -mstackrealign -msse2 -mfpmath=sse     -I"C:/Program Files/OpenModelica1.17.0-dev-64bit/include/omc/c" -I. -DOPENMODELICA_XML_FROM_FILE_AT_RUNTIME -DOMC_MODEL_PREFIX=Modelica_Blocks_Examples_Filter -DOMC_NUM_MIXED_SYSTEMS=0 -DOMC_NUM_LINEAR_SYSTEMS=3 -DOMC_NUM_NONLINEAR_SYSTEMS=0 -DOMC_NDELAY_EXPRESSIONS=0 -DOMC_NVAR_STRING=0  -c -o Modelica.Blocks.Examples.Filter.o Modelica.Blocks.Examples.Filter.c

and indeed we use -Os there as well.

comment:11 in reply to: ↑ 9 Changed 4 years ago by casella

Replying to adrpo:

I have no idea why we use -Os by default (and is hardcoded in the Makefile).

Neither have I. Maybe @sjoelund.se knows.

We should have a setting for it in Tools->Options->Simulation right below the compiler setting.

I guess so. And also a compiler option flag, that can be stored in a vendor-specific annotation.

Truth is that right now you can press S and then add -O0 in "C/C++ compiler flags (Optional)".

Sure. The problem is, 99% of users have no idea about this. So we should make this the default, and -Os (or -O3, or whatever) the customized choices for the pros.

comment:12 follow-up: Changed 4 years ago by sjoelund.se

Neither have I. Maybe @sjoelund.se knows.

Because it's been requested by users to optimize by default. For dynload, it is -O0 by default.

-O0 is quite a bit slower and we don't want benchmarks to say OM is slow just because people won't bother to enable optimization.

The best solution for slow compilation speeds would be to not expand everything :)

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by casella

  • Cc jean-philippe.tavella@… Andrea.Bartolini ceraolo added

Replying to sjoelund.se:

Neither have I. Maybe @sjoelund.se knows.

Because it's been requested by users to optimize by default. For dynload, it is -O0 by default.

Q1: which users? I can't recall that.

Q2: optimize what, exactly? The overall code generation and simulation time, or just the simulation time?

-O0 is quite a bit slower and we don't want benchmarks to say OM is slow just because people won't bother to enable optimization.

Comparing ScalableTestSuite and ScalableTestSuite_noopt, it seems that the simulation may be 20-30% slower with -O0, but the C compilation may be 3-4 times faster. Of course this is a multi-objective optimization problem, so there is no unique optimal combination. But I'd claim that the cases where that 20-30% is more important than the factor 3-4 are not that many.
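
To make the trade-off concrete: let c be the C compilation time with -O0 and s the simulation time with -O0. If -Os multiplies the compilation time by k, and -O0 makes the simulation a fraction p slower than -Os, then -Os wins on total time only when k*c + s/(1+p) < c + s, that is, when s/c > (k-1)(1+p)/p. A small Python sketch of this calculation follows. The function names are made up for illustration, and the model assumes the 3-4x and 20-30% figures apply uniformly to every model, which is a simplification.

```python
def break_even_ratio(compile_factor, sim_slowdown):
    """Return the ratio s/c above which -Os gives the lower total time.

    compile_factor: how many times longer compilation takes with -Os (k > 1).
    sim_slowdown:   fractional simulation slowdown of -O0 relative to -Os
                    (p > 0, e.g. 0.3 for "30% slower").
    """
    if compile_factor <= 1.0 or sim_slowdown <= 0.0:
        raise ValueError("expected compile_factor > 1 and sim_slowdown > 0")
    return (compile_factor - 1.0) * (1.0 + sim_slowdown) / sim_slowdown


def total_times(compile_o0, sim_o0, compile_factor, sim_slowdown):
    """Return (total time with -O0, total time with -Os) under the model."""
    total_o0 = compile_o0 + sim_o0
    total_os = compile_factor * compile_o0 + sim_o0 / (1.0 + sim_slowdown)
    return total_o0, total_os


# The two extremes of the ranges quoted above.
for k, p in [(3.0, 0.3), (4.0, 0.2)]:
    print(f"k={k}, p={p}: -Os pays off when s/c > {break_even_ratio(k, p):.1f}")
```

With k = 3 and p = 0.3 the threshold is about 8.7, and with k = 4 and p = 0.2 it is 18. Under these assumptions, the simulation has to run roughly 9 to 18 times longer than the -O0 compilation before -Os pays off on total time.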

In any case, benchmarks are a different story than daily OMEdit use. Besides, what we currently show in the benchmarks is that the bottleneck is almost invariably C compilation. Which is made even worse by the fact that the test cases are run on single threads, so we are not exploiting the parallel C compilation feature.

Again, the question is how do you define "slow" or "fast". What is the point of saving 0.5 s from simulation if it takes one more minute to generate the code? In my experience, most of the time people work interactively in a cycle including changing the code, recompiling, simulating, and analyzing the results. For that, fast compilation is key.

I am really concerned that when people compare OMEdit with, say, Dymola, they immediately notice one thing: with Dymola, you press "Simulate" and in most cases you get the result in a few seconds. With OMEdit, you need to wait at least half a minute before you get something. This really gives a bad impression, and also has a lot more impact on people's work than benchmark results.

This is my proposal. I think we should have one clear high-level option in the GUI:

  • optimize for compilation time
  • optimize for simulation time

and then let the end user decide what he or she wants. I would definitely put the first option as default, because that's what people are looking for most of the time, and most people don't read the manuals or the release notes, so they may not be aware of the impact of such an option on their daily life.

The best solution for slow compilation speeds would be to not expand everything :)

That we all know, but it will take a while until we get there, and we need to do something now. The fact is, even compiling models with 10,000 equations (which are not "very large" by any means) takes a while, which can be really annoying when you are recompiling all the time. And the comparison with commercial tools is not good from this point of view.

Adding some users to the loop to get their feedback.

comment:14 in reply to: ↑ 13 Changed 4 years ago by jean-philippe.tavella@…

Replying to casella:

I am really concerned that when people compare OMEdit with, say, Dymola, they immediately notice one thing: with Dymola, you press "Simulate" and in most cases you get the result in a few seconds. With OMEdit, you need to wait at least half a minute before you get something. This really gives a bad impression, and also has a lot more impact on people's work than benchmark results.

For lots of people, the first impression lasts a long time. Given that EDF is DS's biggest Dymola customer, it will be difficult to switch to OM, as we are trying to do for the future (for financial reasons).
So I think your proposal to have a clear high-level option in the GUI:

  • optimize for compilation time first
  • or optimize for simulation time first

with default value for compilation time optimization, is a good idea.

comment:15 follow-up: Changed 4 years ago by casella

  • Owner changed from adrpo to adeas31
  • Priority changed from high to critical

@jean-philippe, good to hear you agree with me :)

I would then suggest putting a combo box in the Simulation Setup | General tab, right above "C/C++ Compiler flags", with two options

  • optimize for compilation time (default)
  • optimize for simulation time

Later on we may add more specific options.

As a first implementation, this could simply add -O0 to the compilation flags for the first option, and -Os for the second option. I understand the last -O flag overrides previously applied ones, so there is no need to change the makefile at the moment. I guess this can be implemented in no time.
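
A rough Python sketch of the logic described in this comment is given below. It is not OMEdit code: the option labels, the function names, and the decision to place the user's own "C/C++ compiler flags" last (so that they still win) are all assumptions made for illustration. It relies only on the behavior mentioned above, namely that gcc and clang honor the last -O flag on the command line.

```python
# Hypothetical mapping from the proposed combo-box entries to compiler flags.
OPTIMIZATION_CHOICES = {
    "optimize for compilation time": "-O0",
    "optimize for simulation time": "-Os",
}


def build_cflags(makefile_flags, choice, user_flags=()):
    """Append the flag for the selected option after the Makefile's flags,
    followed by any flags the user typed in explicitly."""
    return list(makefile_flags) + [OPTIMIZATION_CHOICES[choice]] + list(user_flags)


def effective_opt_level(flags):
    """Return the last -O flag in the list, i.e. the one the compiler applies,
    or None when no -O flag is present."""
    level = None
    for flag in flags:
        if flag.startswith("-O"):
            level = flag
    return level


# Flags taken from the compilation log quoted earlier in this ticket.
makefile_flags = ["-Os", "-falign-functions", "-mstackrealign", "-msse2", "-mfpmath=sse"]
flags = build_cflags(makefile_flags, "optimize for compilation time")
print(" ".join(flags))
print("effective level:", effective_opt_level(flags))
```

Because the chosen flag comes after the hardcoded -Os, the Makefile can stay untouched, which is the point made above.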

Maybe -O3 is even better than -Os, but we need more testing for that.

Technically speaking this is a new feature, but I understand the implementation is straightforward, and it may actually be considered as a bugfix, in the sense that it improves the performance in most cases. @adrpo, @adeas31, could we sneak it into 1.17.0?

comment:16 in reply to: ↑ 15 Changed 4 years ago by sjoelund.se

Replying to casella:

Maybe -O3 is even better than -Os, but we need more testing for that.

There is not much difference in compilation speed between the two (or even between -O1 and -Os).

comment:17 Changed 4 years ago by casella

  • Resolution set to fixed
  • Status changed from assigned to closed

OK, @sjoelund.se did some interesting work that probably made my comments obsolete, as the issue can be solved in a better way than just by changing the default flags.

As a consequence, it may not be really necessary to have GUI support for the choice of compilation vs. simulation optimization. Or maybe we'll have it in a different form, possibly coupled with "Evaluate parameters" but there's no need to hurry once @sjoelund.se's improvements are merged in.

I'm closing this ticket, see #6367 to track further improvements.
