Opened 8 years ago
Closed 7 years ago
#4453 closed defect (fixed)
Compilation Time of a model with large size arrays
Reported by: | Christos Theodosiou | Owned by: | Martin Sjölund |
---|---|---|---|
Priority: | high | Milestone: | 1.12.0 |
Component: | Backend | Version: | |
Keywords: | Cc: | Francesco Casella, Lennart Ochel |
Description
We have tried the following items:
1) Making a function for large array equations is slow (Pac_OnlyFunction)
2) Making an External function(c) for large array equations is fast (Pac_External_C_function)
We seek a way to do everything in OpenModelica.
Attachments (6)
Change History (19)
by , 8 years ago
Attachment: | Pac_OnlyFunction.zip added |
---|
by , 8 years ago
Attachment: | Pac_External_C_function.zip added |
---|
by , 8 years ago
Attachment: | Execution_Compiling_time.xlsx added |
---|
by , 8 years ago
by , 8 years ago
Attachment: | callgrind.out.14520 added |
---|
by , 8 years ago
Attachment: | callgraph.dot.ps added |
---|
comment:1 by , 8 years ago
comment:2 by , 7 years ago
Almost 50% of translation time is generating the equations:
Notification: Performance of Templates: time 56.31/126.1, allocations: 7.939 GB / 37.29 GB, free: 369.3 MB / 2.196 GB
And the generated code is mostly:
put_real_matrix_element(data->localData[0]->realVars[1482] /* initialXCoordinateValue_21._y[6,2] variable */, 5, 1, &tmp35); put_real_matrix_element(data->localData[0]->realVars[1483] /* initialXCoordinateValue_21._y[6,3] variable */, 5, 2, &tmp35); put_real_matrix_element(data->localData[0]->realVars[1484] /* initialXCoordinateValue_21._y[6,4] variable */, 5, 3, &tmp35);
Not expanding these arrays should give a big improvement in performance of the compiler. There is the compiler flag --vectorizationLimit=n
, but setting it to 2 only improved performance by ~33% and didn't seem to change the end result much (the limit mostly affects the frontend and somewhere in the backend the variable seems to be expanded again).
comment:3 by , 7 years ago
In the backend, this expansion is done by removeSimpleEquations. The new experimental version of that module fails on this model...
- -v=2 --removeSimpleEquations=none: 5.2s
- -v=2 --removeSimpleEquations=default: 89.98s
- -v=2 --removeSimpleEquations=causal: 47.85s
- -v=2 --removeSimpleEquations=fastAcausal: 89.69s
- -v=2 --removeSimpleEquations=allAcausal: 91.09s
- -v=2 --removeSimpleEquations=new: fail
I guess it could be possible to detect the vectorization limit and stop the expansion from happening in removeSimpleEquations. But if you just want to run your models faster, use a small vectorization limit (like -v=2) and --removeSimpleEquations=none. This might cause some models to grow in size/execution time, but it will not expand this case and generates smaller code (~68MB to ~356 kB for the initialization problem).
comment:4 by , 7 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:5 by , 7 years ago
I also think the vectorization limit has a bug somewhere and it checks each dimension individually. If you set -v=36 on an 18x36 matrix, it vectorizes the entire thing. You have to set the limit lower or equal to the greatest dimension in order to stop the expansion; I would have assumed lower or equal than the product of all dimensions...
comment:6 by , 7 years ago
The issue with removeSimpleEquations is well-known; ticket #3695 was opened one and a half years ago, but there's been no progress since then, and I understand Willi will have other priorities for many months to come.
Is anybody else out there willing to take over the completion of the new removeSimpleEquation module, possibly under Willi's directions? I'm not sure how much work is left, but maybe he or Bernhard can comment about that.
comment:7 by , 7 years ago
I found one place where I can fix the expansion of functions greater than some size in removeSimpleEquations and then I found a replacement from the original variable to an alias that seems wrong: it has something like:ch
x=y;
With replacements for x[1]...x[n] -> y[1]...y[n]
which causes the entire thing to expand since there is no replacement rule for x -> y
...
follow-up: 9 comment:8 by , 7 years ago
The problem with my fix to not expand equations is that it breaks other models that require the equations to be scalarized...
My fix to collapse equations like {x[1],x[2],x[3]}
into x
seems to work a bit better though. And without the above fix still improves translation time to 1/3 of the original.
I think what's needed is just detecting:
x = f(...); // Array equation
And simply not expanding it into:
x[1] = f(...)[1]; x[...] = f(...)[...]; x[n] = f(...)[n];
(Which is the later on treated with common subexpression elimination)
comment:10 by , 7 years ago
It might have an effect on #4129 in some cases. I can't push all my fixes for this because the code generator cannot handle everything (and I don't have time to fix it). For example, x[7].y
has to be expanded to {x[7].y[1], ...
because otherwise incorrect offsets are generated. Also, in #4129 the equations have been expanded and many multibody models seem to depend on this or unsolvable strong connected components are generated. I thought #4129 would sort of be resolved by CSE (resolveLoops) though.
I will take a look at #4129 when I am done with the things I can fix in this ticket though.
comment:11 by , 7 years ago
I might have to revert the PR due to the regression testing. It broke C++ runtime (fixable), but also... there is a problem with dummy states. {x[1], ..., x[n]}
might be an array where x[1]
is a dummy state and not stored in the same memory location as the other variables.
comment:12 by , 7 years ago
The thing with dummy states is handled by expanding the array again if necessary during codegen
comment:13 by , 7 years ago
Component: | *unknown* → Backend |
---|---|
Milestone: | Future → 1.12.0 |
Resolution: | → fixed |
Status: | assigned → closed |
Fixed the reported performance issue and cleared the regressions:
https://github.com/OpenModelica/OMCompiler/pull/1749
https://github.com/OpenModelica/OMCompiler/pull/1745
https://github.com/OpenModelica/OMCompiler/pull/1736
https://github.com/OpenModelica/OMCompiler/pull/1735
https://github.com/OpenModelica/OMCompiler/pull/1731
Note that the performance will degrade in many of the non-trivial cases
Uploaded main.mos the test script.
Also uploaded valgrind profiling results.
It seems that:
valueEq
(called ~3e8 times) which is actually a thin wrapper ofvalueCompare
. ThevalueCompare
recursively calls its self about 4.6e9 times.gc
garbage collector seems to consume ~38% of the run time