Opened 8 years ago
Closed 4 years ago
#4033 closed enhancement (fixed)
Default clang flags for faster build time
Reported by: | Francesco Casella | Owned by: | Adeel Asghar |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | Run-time | Version: | |
Keywords: | Cc: | Adeel Asghar, jean-philippe.tavella@…, Andrea Bartolini, massimo ceraolo |
Description
Compare the performance of the ScalableTestSuite when the C code is compiled with the flag -Os and with the flag -O0. It is apparent that the clang optimizations cause the C code compilation time to scale as O(N^2) when using -Os, while it scales as O(N) when using -O0. For larger models, using -Os causes the C code compilation time to blow up to over one hour, which is totally unacceptable.
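Telling O(N^2) from O(N) growth in measurements like these amounts to estimating the exponent p in t ≈ c·N^p from two runs of the same scalable test case at sizes N1 and N2: p = log(t2/t1) / log(N2/N1). A minimal sketch; the function name is hypothetical and the inputs are whatever compilation times one measures.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Estimate p in t ~ c * N**p from two (size, time) measurements.

    p close to 1 suggests linear growth, p close to 2 suggests quadratic growth.
    """
    if n1 <= 0 or n2 <= 0 or t1 <= 0 or t2 <= 0:
        raise ValueError("sizes and times must be positive")
    if n1 == n2:
        raise ValueError("need two different model sizes")
    return math.log(t2 / t1) / math.log(n2 / n1)
```

If doubling N roughly quadruples the compile time, p comes out near 2; if it roughly doubles the time, p comes out near 1.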
The use of -Os leads to a reduction in simulation time between 25% and 50%, which is significant, but almost never pays off if the total build+simulation time is considered. In fact, the only cases where -Os pays off if the total time is considered are the BreakerNetwork and HeatingSystem test cases. In the case of BreakerNetwork, this is only true because the event handling is still very inefficient (the sparse solver is not used for event processing), so the simulation time grows very quickly.
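The "does it pay off" argument compares total build plus simulation time rather than simulation time alone. A minimal sketch of that comparison; the function name is hypothetical and the four times are whatever one measures for a given test case.

```python
def faster_overall(build_os: float, sim_os: float,
                   build_o0: float, sim_o0: float) -> str:
    """Return the flag with the lower total (build + simulation) time."""
    total_os = build_os + sim_os
    total_o0 = build_o0 + sim_o0
    # Ties go to -O0, since it also keeps the edit-compile-simulate loop short.
    return "-Os" if total_os < total_o0 else "-O0"
```

A 25-50% simulation speed-up only wins when the simulation time it saves exceeds the extra compilation time, which is what happens for models with fast-growing simulation time such as BreakerNetwork.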
I would therefore suggest changing the default optimization flag to -O0, at least until #3964 is implemented. Once the C source code is split into several smaller files, the situation might need to be reconsidered.
Change History (17)
comment:1 by , 8 years ago
comment:3 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
OK. I would then suggest removing the -Os flag from the default setup of the LibraryExperimental Hudson job, because it is clearly not a good idea, at least in terms of the time spent covering the whole library.
comment:5 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:6 by , 4 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | reopened → assigned |
comment:7 by , 4 years ago
I just ran an example in OMEdit; here is an excerpt of the compilation log:
C:/Program Files/OpenModelica1.17.0-dev-64bit/share/omc/scripts/Compile.bat Modelica.Blocks.Examples.Filter gcc mingw64 parallel 8 0
PATH = "C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin;C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin\..\..\usr\bin;"
mingw32-make: Entering directory 'd:/Temp/OMEdit/Modelica.Blocks.Examples.Filter'
gcc -Os -falign-functions -mstackrealign -msse2 -mfpmath=sse -I"C:/Program Files/OpenModelica1.17.0-dev-64bit/include/omc/c" -I. -DOPENMODELICA_XML_FROM_FILE_AT_RUNTIME -DOMC_MODEL_PREFIX=Modelica_Blocks_Examples_Filter -DOMC_NUM_MIXED_SYSTEMS=0 -DOMC_NUM_LINEAR_SYSTEMS=3 -DOMC_NUM_NONLINEAR_SYSTEMS=0 -DOMC_NDELAY_EXPRESSIONS=0 -DOMC_NVAR_STRING=0 -c -o Modelica.Blocks.Examples.Filter.o Modelica.Blocks.Examples.Filter.c
gcc is clearly called with -Os, and it just takes forever to compile. Is this an OMEdit-specific setup, or is it the default one? Could we just switch to -O0 as the default choice? That is for sure the fastest one in >95% of the cases.
follow-up: 11 comment:9 by , 4 years ago
I have no idea why we use -Os by default (it is hardcoded in the Makefile).
We should have a setting for it in Tools->Options->Simulation right below the compiler setting.
Truth is that right now you can press S and then add -O0 in "C/C++ compiler flags (Optional)".
comment:10 by , 4 years ago
Replying to adrpo:
Why do you have gcc? Clang should be the default in 1.17.
Sorry, I made some experiments with gcc and forgot to reset to the default. This is the output with clang:
C:/Program Files/OpenModelica1.17.0-dev-64bit/share/omc/scripts/Compile.bat Modelica.Blocks.Examples.Filter gcc mingw64 parallel 8 0
PATH = "C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin;C:\PROGRA~1\OPENMO~1.0-D\tools\msys\mingw64\bin\..\..\usr\bin;"
mingw32-make: Entering directory 'd:/Temp/OMEdit/Modelica.Blocks.Examples.Filter'
clang -Os -falign-functions -mstackrealign -msse2 -mfpmath=sse -I"C:/Program Files/OpenModelica1.17.0-dev-64bit/include/omc/c" -I. -DOPENMODELICA_XML_FROM_FILE_AT_RUNTIME -DOMC_MODEL_PREFIX=Modelica_Blocks_Examples_Filter -DOMC_NUM_MIXED_SYSTEMS=0 -DOMC_NUM_LINEAR_SYSTEMS=3 -DOMC_NUM_NONLINEAR_SYSTEMS=0 -DOMC_NDELAY_EXPRESSIONS=0 -DOMC_NVAR_STRING=0 -c -o Modelica.Blocks.Examples.Filter.o Modelica.Blocks.Examples.Filter.c
and indeed we use -Os there as well.
comment:11 by , 4 years ago
Replying to adrpo:
I have no idea why we use -Os by default (it is hardcoded in the Makefile).
Neither have I. Maybe @sjoelund.se knows.
We should have a setting for it in Tools->Options->Simulation right below the compiler setting.
I guess so. And also a compiler option flag, that can be stored in a vendor-specific annotation.
Truth is that right now you can press S and then add -O0 in "C/C++ compiler flags (Optional)".
Sure. The problem is, 99% of users have no idea about this. So we should make this the default, and -Os (or -O3, or whatever) the customized choices for the pros.
follow-up: 13 comment:12 by , 4 years ago
Neither have I. Maybe @sjoelund.se knows.
Because it's been requested by users to optimize by default. For dynload, it is -O0 by default.
-O0 is quite a bit slower and we don't want benchmarks to say OM is slow just because people won't bother to enable optimization.
The best solution for slow compilation speeds would be to not expand everything :)
follow-up: 14 comment:13 by , 4 years ago
Cc: | added |
---|
Replying to sjoelund.se:
Neither have I. Maybe @sjoelund.se knows.
Because it's been requested by users to optimize by default. For dynload, it is -O0 by default.
Q1: which users? I can't recall that.
Q2: optimize what, exactly? The overall code generation and simulation time, or just the simulation time?
-O0 is quite a bit slower and we don't want benchmarks to say OM is slow just because people won't bother to enable optimization.
Comparing ScalableTestSuite and ScalableTestSuite_noopt, it seems that the simulation may be 20-30% slower with -O0, but the C compilation may be 3-4 times faster. Of course, this is a multi-objective optimization problem, so there is no unique optimal combination. But I'd claim that the cases where that 20-30% matters more than the factor of 3-4 are not that many.
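The trade-off can be made explicit with a back-of-the-envelope break-even condition. Let C and S be the C compilation and simulation times with -Os, and assume, taking the ranges quoted above at face value, that -O0 divides the compilation time by a factor k (about 3-4) and multiplies the simulation time by (1+p), with p about 0.2-0.3:

```latex
\begin{aligned}
T_{\mathrm{Os}} &= C + S, \qquad T_{\mathrm{O0}} = \frac{C}{k} + (1+p)\,S,\\
T_{\mathrm{O0}} < T_{\mathrm{Os}}
  &\iff p\,S < C\left(1 - \frac{1}{k}\right)
  \iff \frac{S}{C} < \frac{1 - 1/k}{p}.
\end{aligned}
```

With k between 3 and 4 and p between 0.2 and 0.3, the right-hand side lies between about 2.2 and 3.75, so under these assumptions -O0 gives the shorter total time unless the simulation alone takes more than roughly two to four times as long as the -Os compilation.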
In any case, benchmarks are a different story than daily OMEdit use. Besides, what we currently show in the benchmarks is that the bottleneck is almost invariably C compilation. This is made even worse by the fact that the test cases are run on a single thread, so we are not exploiting the parallel C compilation feature.
Again, the question is how do you define "slow" or "fast". What is the point of saving 0.5 s from simulation if it takes one more minute to generate the code? In my experience, most of the time people work interactively in a cycle including changing the code, recompiling, simulating, and analyzing the results. For that, fast compilation is key.
I am really concerned that when people compare OMEdit with, say, Dymola, they immediately notice one thing: with Dymola, you press "Simulate" and in most cases you get the result in a few seconds. With OMEdit, you need to wait at least half a minute before you get something. This really gives a bad impression, and also has a lot more impact on people's work than benchmark results.
This is my proposal. I think we should have one clear high-level option in the GUI:
- optimize for compilation time
- optimize for simulation time
and then let the end user decide what he or she wants. I would definitely put the first option as default, because that's what people are looking for most of the time, and most people don't read the manuals or the release notes, so they may not be aware of the impact of such an option on their daily life.
The best solution for slow compilation speeds would be to not expand everything :)
That we all know, but it will take a while until we get there, and we need to do something now. The fact is, even compiling models with 10,000 equations (which are by no means "very large") takes a while, which can be really annoying when you are recompiling all the time. And the comparison with commercial tools is not good from this point of view.
Adding some users to the loop to get their feedback.
comment:14 by , 4 years ago
Replying to casella:
I am really concerned that when people compare OMEdit with, say, Dymola, they immediately notice one thing: with Dymola, you press "Simulate" and in most cases you get the result in a few seconds. With OMEdit, you need to wait at least half a minute before you get something. This really gives a bad impression, and also has a lot more impact on people's work than benchmark results.
For lots of people, the first impression lasts a long time. Given that EDF is the biggest customer of DS for Dymola, it will be difficult to switch to OM, as we are trying to do in the future (for financial reasons).
So I think your proposal to have a clear high-level option in the GUI:
- optimize for compilation time first
- or optimize for simulation time first
with default value for compilation time optimization, is a good idea.
follow-up: 16 comment:15 by , 4 years ago
Owner: | changed from | to
---|---|
Priority: | high → critical |
@jean-philippe, good to hear you agree with me :)
I would then suggest to put a combo box in the Simulation Setup | General tab, right above "C/C++ Compiler flags", with two options
- optimize for compilation time (default)
- optimize for simulation time
Later on we may add more specific options.
As a first implementation, this option could simply add -O0 to the compilation flags for the first choice, and -Os for the second. I understand the last -O flag overrides previously applied ones, so there is no need to change the Makefile for the moment. I guess this can be implemented in no time.
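A sketch of the proposed combo box, assuming it simply appends one -O flag after the existing compiler flags and relies on the gcc/clang rule that the last -O option on the command line is the effective one. The enum and helper names are hypothetical, not OMEdit code.

```python
import re
from enum import Enum


class BuildGoal(Enum):
    """Hypothetical high-level choice for Simulation Setup | General."""
    COMPILATION_TIME = "-O0"  # proposed default: fastest C compilation
    SIMULATION_TIME = "-Os"   # currently hardcoded flag: faster simulation


# Matches -O, -O0..-O9, -Os, -Oz, -Og, -Ofast, but not e.g. clang's -ObjC.
_OPT_FLAG = re.compile(r"-O(?:[0-9]|s|z|g|fast)?")


def compose_flags(base_flags: list[str], goal: BuildGoal) -> list[str]:
    """Append the goal's -O flag after the existing flags, so it overrides them."""
    return [*base_flags, goal.value]


def effective_opt_level(flags: list[str]) -> str:
    """Return the last -O option, as gcc/clang would use it.

    Returns "" if no -O option is present (the compiler default then applies).
    """
    level = ""
    for flag in flags:
        if _OPT_FLAG.fullmatch(flag):
            level = flag
    return level
```

With the flags from the compilation log above (-Os -falign-functions -mstackrealign -msse2 -mfpmath=sse), choosing COMPILATION_TIME appends -O0, which becomes the effective level without touching the Makefile.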
Maybe -O3 is even better than -Os, but we need more testing for that.
Technically speaking, this is a new feature, but I understand the implementation is straightforward, and it may actually be considered a bugfix, in the sense that it improves the performance in most cases. @adrpo, @adeas31, could we sneak it into 1.17.0?
comment:16 by , 4 years ago
Replying to casella:
Maybe -O3 is even better than -Os, but we need more testing for that.
There is not much difference in compilation speed between the two (or even between -O1 and -Os).
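The testing mentioned above could start by timing the compilation of one generated model file under each candidate flag. A minimal sketch, assuming a compiler such as `clang` on the PATH; the helper names are hypothetical, and a real model build also needs the -I and -D options shown in the compilation log, passed here through `extra`.

```python
import subprocess
import time
from typing import Sequence


def compile_command(cc: str, opt_flag: str, source: str, obj: str,
                    extra: Sequence[str] = ()) -> list[str]:
    """Compile-only command line; opt_flag comes after `extra` so it wins."""
    return [cc, *extra, opt_flag, "-c", "-o", obj, source]


def time_compile(cc: str, opt_flag: str, source: str, obj: str,
                 extra: Sequence[str] = ()) -> float:
    """Compile `source` once and return the elapsed wall-clock time in seconds."""
    cmd = compile_command(cc, opt_flag, source, obj, extra)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if compilation fails
    return time.perf_counter() - start
```

Calling `time_compile("clang", flag, "Modelica.Blocks.Examples.Filter.c", "model.o", extra=...)` in the model's build directory for each of -O0, -O1, -Os and -O3 gives a rough comparison; since a single run is noisy, repeating each measurement a few times makes the numbers more trustworthy.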
comment:17 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
OK, @sjoelund.se did some interesting work that probably made my comments obsolete, as the issue can be solved in a better way than just by changing the default flags.
As a consequence, it may not be really necessary to have GUI support for the choice of compilation vs. simulation optimization. Or maybe we'll have it in a different form, possibly coupled with "Evaluate parameters" but there's no need to hurry once @sjoelund.se's improvements are merged in.
I'm closing this ticket, see #6367 to track further improvements.
I just realized that -Os was set explicitly in the setup strings of the LibraryExperimental Hudson job, and is by no means omc's default setting. I'm not sure what the clang optimization default in omc is; I don't see any -O* strings in the console output, so I assume it is clang's default. However, I couldn't figure out from clang's online manual what that is. Is it -O0 (no optimization)? Can someone help?
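One way to check this empirically rather than from the manual is to ask the compiler which optimization macros it predefines: gcc and clang define __OPTIMIZE__ whenever optimization is enabled, and additionally __OPTIMIZE_SIZE__ for size optimization such as -Os, while `-dM -E` prints all predefined macros. A sketch, assuming a `clang` executable on the PATH; the function names are hypothetical.

```python
import subprocess


def predefined_macros(cc: str, flags: list[str]) -> str:
    """Return the compiler's predefined macros for `flags` (-dM -E on empty input)."""
    result = subprocess.run(
        [cc, *flags, "-dM", "-E", "-x", "c", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return result.stdout


def optimization_from_macros(macro_dump: str) -> str:
    """Classify a -dM macro dump as "none", "size" or "enabled".

    "size": __OPTIMIZE_SIZE__ is defined (e.g. -Os);
    "enabled": only __OPTIMIZE__ is defined (-O1 and above);
    "none": neither is defined (no optimization, as with -O0).
    """
    names = {line.split()[1] for line in macro_dump.splitlines()
             if line.startswith("#define ") and len(line.split()) > 1}
    if "__OPTIMIZE_SIZE__" in names:
        return "size"
    if "__OPTIMIZE__" in names:
        return "enabled"
    return "none"
```

For example, `optimization_from_macros(predefined_macros("clang", []))` reports what clang does with no -O option, and passing the flags from a compilation log (such as `["-Os", "-msse2", "-mfpmath=sse"]`) reports what that build actually uses.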