#3491 closed defect (fixed)
Back-end time scales very badly for large system with very simple structure
Reported by: | Francesco Casella | Owned by: | Lennart Ochel |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Backend | Version: | |
Keywords: | | Cc: | Adrian Pop, Willi Braun, andrea.bartolini@…, Vitalij Ruge |
Description
Please consider the back-end times for the models ScalableTestSuite.Thermal.DistrictHeating.ScaledExperiments.HeatingSystem_N_XXX.
The structure of the system is extremely simple: index 1, no algebraic loops, very sparse Jacobian of the state derivative functions (most state derivatives depend on just 3 states). It shouldn't be too hard for the back-end structural analysis algorithms to find this out quickly.
The back-end time scales as O(N) up to size 40 (about 80 states). Then it scales as O(N^2) up to size 160. Then, for larger sizes, it starts growing as O(N^3.5) (!), so it already takes 20 minutes for a system with a mere 1200 states.
Given the extremely simple structure of the system, I really see no reason why this should happen. Some basic profiling should quickly reveal which part of the back-end goes wrong.
Attachments (3)
Change History (27)
by , 9 years ago
Attachment: | HeatingSystem.pdf added |
---|
a first quick analysis of the back end performance
comment:2 by , 9 years ago
follow-up: 7 comment:3 by , 9 years ago
Cc: | added |
---|
Some more comments and suggestions, after reading the detailed profiling report.
Regarding the report itself: is it easy to generate it? It might be very useful for our research work.
Regarding the time spent by analyzeInitialSystem (initialization): the system with N = 640 has 1281 variables appearing under derivative: Tu[640], x[640], Td. The very same variables have the attribute fixed = true, so it should be fairly straightforward to select them as states. The initial equations for them are trivially determined by the fixed = true attributes, and the remaining initial equations are the same ones that are found during the sorting of the simulation equations, which is carried out in 0.15 seconds. I would expect that the analysis doesn't take much longer than that. What does this function actually do, to scale up as O(N^4) between N = 320 and N = 640, and take 4 minutes of CPU time on such a straightforward model?
Regarding the time spent by:
- preOpt evaluateReplaceProtectedFinalEvaluateParameter
- preOpt comSubExp
- solveSimpleEquations (initialization)
- simplifyAllExpressions (simulation)
- SimCode generated analytical Jacobians

My naive understanding, given the structure of the system of equations, is that the complexity of all of them should be O(N). Am I mistaken? Why does the time spent increase by a factor of 6 to 8 when going from N = 320 to N = 640? Is this due to the inherent complexity of the functions, to the way the MetaModelica code is translated into C code, or to the overhead of the runtime system / garbage collector / cache misses?
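(A back-of-the-envelope way to read such numbers, using nothing beyond the factor quoted above: when the problem size doubles and the time grows by a factor f, the empirical complexity exponent is roughly p = log(f)/log(2). A factor f of 6 to 8 between N = 320 and N = 640 therefore gives p of about 2.6 to 3, i.e. roughly O(N^2.6) to O(N^3) behaviour.)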
comment:4 by , 9 years ago
Lennart, I suggest you run valgrind (Callgrind) on it overnight to get a clearer picture.
For that I suggest you configure with -g to get a better view.
This way we can understand which parts of the functions are behaving badly and so on.
comment:5 by , 9 years ago
Cc: | added |
---|
by , 9 years ago
Attachment: | HeatingSystem_v1.9.4-dev.168+g1c35b32.pdf added |
---|
second analysis of the back end performance
comment:7 by , 9 years ago
Replying to casella:
Regarding the report itself: is it easy to generate it? It might be very useful for our research work.
I generated the report using the flag +d=execstat. I am using a small tool to convert the output into a CSV file.
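For anyone wanting to reproduce this without that tool, a minimal sketch of such a converter follows. It assumes the +d=execstat lines contain something like "Performance of <phase>: time <seconds>/<cumulative seconds>"; the exact wording may differ between OMC versions, so the regular expression may need to be adapted to the actual output.

```python
# Minimal sketch of an execstat-to-CSV converter (the log line format below is
# an assumption; adjust PATTERN to the actual +d=execstat output of your OMC).
import csv
import re
import sys

PATTERN = re.compile(
    r"Performance of (?P<phase>.+?): time (?P<time>[0-9.eE+-]+)/(?P<total>[0-9.eE+-]+)"
)

def convert(log_path, csv_path):
    with open(log_path) as log, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["phase", "time_s", "cumulative_s"])
        for line in log:
            m = PATTERN.search(line)
            if m:
                writer.writerow([m.group("phase"), m.group("time"), m.group("total")])

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```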
comment:8 by , 9 years ago
Cc: | added |
---|
The performance of the analyzeInitialSystem function in the N = 640 case has been improved by three orders of magnitude in just three days (including a weekend) since I opened the ticket. This required changing less than 30 lines of source code.
May I say I am really impressed :)
May I also assume that there are similar issues with the remaining bottleneck functions, and that it might be quite easy to bring the backend time down from 59 to a few seconds?
follow-up: 12 comment:9 by , 9 years ago
To be more specific, the latest attachment posted by Lennart reveals that there are still a few functions in the back-end whose execution time does not scale as expected for the larger benchmarks:
- evaluateReplaceProtectedFinalEvaluateParameters: O(N^3)
- solveSimpleEquations (initialization): O(N^3)
- simplifyAllExpressions (initialization): O(N^3)
- simplifyAllExpressions (simulation): O(N^3)
- comSubExp: O(N^3) - this has been improved by Volker, see ticket:3478#comment:12, but I have no data on the improved algorithm
I would expect that, given the sparse structure of the system, their complexity is O(N) or, in the worst case, O(N^2). In fact, this seems to be how the execution time actually scales up for moderate system sizes. Above a certain critical system size, the exponent of the complexity goes into high gear for some reason.
Please identify the root cause of this problem and remove it, if possible.
comment:11 by , 9 years ago
N is the number of instances of a certain submodel in the benchmark. Each instance has the same equations, which contain the same operations. So, to my understanding, N (the number of submodels) and M (the number of operations) should be roughly proportional.
In the case of HeatingSystem there is also one big connection equation involving flow variables that sums up all the heat flows from the N units, but even in that case, the number of operations is proportional to N.
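(Just to spell that out with a purely illustrative count: if each of the N units contributes a fixed number c of operations and the common flow connection adds one sum of N terms, i.e. N-1 additions, the total operation count is c*N + (N-1), which is still O(N).)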
comment:12 by , 9 years ago
Replying to casella:
To be more specific, the latest attachment posted by Lennart reveals that there are still a few functions in the back-end whose execution time does not scale as expected for the larger benchmarks:
- evaluateReplaceProtectedFinalEvaluateParameters: O(N^3)
- solveSimpleEquations (initialization): O(N^3)
- simplifyAllExpressions (initialization): O(N^3)
- simplifyAllExpressions (simulation): O(N^3)
- comSubExp: O(N^3) - this has been improved by Volker, see ticket:3478#comment:12, but I have no data on the improved algorithm
comSubExp scales better now, but the other modules are still the same.
by , 9 years ago
Attachment: | HeatingSystem_v1.9.4-dev.539+gc45436a.pdf added |
---|
follow-up: 14 comment:13 by , 9 years ago
Is there any place (paper, document, report, book) where I can get a rough understanding of what solveSimpleEquations and simplifyAllExpressions actually do, and what their complexity is expected to be?
For instance, I understand that alias elimination should have complexity O(N^2), because for each variable (there are K1*N variables) you have to look it up in K2*N expressions. That is, unless one uses some smart data structure that avoids looking up each variable in all expressions every time, which could be useful for very large, very sparse models. I guess database algorithms do this all the time, though I am not really an expert in such techniques.
However, I cannot really understand how it could be O(N^3), which seems to be the case in all the cited examples. Do you have any concrete example with that complexity?
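For what it's worth, here is a minimal sketch of the kind of "smart data structure" mentioned above: an occurrence index from each variable to the equations that reference it, so that substituting an alias touches only those equations instead of scanning all K2*N expressions for each of the K1*N variables. This is only an illustration of the idea, not a description of what the OMC back-end actually does.

```python
# Illustrative sketch only (not OMC code): alias substitution using a
# precomputed occurrence index, so the per-variable work is proportional to
# the number of equations that actually mention the variable.
from collections import defaultdict

def build_occurrence_index(equations):
    """Map each variable name to the set of equation indices that reference it."""
    index = defaultdict(set)
    for i, (lhs, rhs_vars) in enumerate(equations):
        for v in rhs_vars | {lhs}:
            index[v].add(i)
    return index

def eliminate_alias(equations, index, alias, target):
    """Substitute `alias` by `target` only in the equations that actually use it."""
    for i in index[alias]:
        lhs, rhs_vars = equations[i]
        if alias in rhs_vars:
            rhs_vars.discard(alias)
            rhs_vars.add(target)
        if lhs == alias:
            lhs = target
        equations[i] = (lhs, rhs_vars)
        index[target].add(i)
    del index[alias]

# Toy usage: the alias equation "y = x" has already been detected and removed;
# we now substitute the remaining uses of y.  Each equation is modelled as
# (lhs_variable, set of rhs variables).
eqs = [("z", {"y", "u"}), ("w", {"u"}), ("q", {"y"})]
idx = build_occurrence_index(eqs)
eliminate_alias(eqs, idx, "y", "x")   # touches only equations 0 and 2
```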
comment:14 by , 9 years ago
Cc: | added |
---|
Replying to casella:
Is there any place (paper, document, report, book) where I can get a rough understanding of what solveSimpleEquations and simplifyAllExpressions actually do, and what their complexity is expected to be?
I think Vitalij has implemented/reimplemented these modules. I added him to this ticket and maybe he can give some explanations.
follow-up: 16 comment:15 by , 9 years ago
comment:16 by , 9 years ago
Replying to vitalij:
Lennart, is there a reason why you use simplify in simplifyAllExpressions and not simplify1 (see #2647)?
No, there is no good reason for this. I just used simplify since it seemed to be the entry point of the simplification module. You should know better how it is implemented and what the differences are between simplify and simplify1. So can you please explain the differences?
comment:17 by , 9 years ago
simplify:
// Advanced (global) simplifications
see ExpressionSimplify.mo
simplify1:
// basic simplification
see ExpressionSimplify.mo
comment:18 by , 9 years ago
Don't forget the notes from https://trac.openmodelica.org/OpenModelica/ticket/2647#comment:7
comment:19 by , 9 years ago
I also noticed that generate backend data structure scales up neatly as O(N^3), taking a substantial amount of time.
As the test model in question has O(N) equations with O(N) non-zero elements in the incidence matrix, I suggest having a look at this module as well.
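To make the expectation concrete, here is a small sketch (purely illustrative, not OMC code) of why building the incidence structure for such a system should be roughly linear: with an adjacency-list representation the work is proportional to the number of non-zero entries, which is O(N) here, whereas even a dense N x N matrix would only account for O(N^2).

```python
# Illustrative sketch (not OMC code): building a sparse incidence structure
# for a chain-like system where each equation references at most 3 variables.
# The cost is proportional to the number of non-zero entries, i.e. O(N).
def build_incidence(n):
    """Return, for each equation index, the list of variable indices it uses."""
    incidence = []
    for i in range(n):
        # Equation i couples variable i with its neighbours (clipped at the ends).
        row = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        incidence.append(row)
    return incidence

nnz = sum(len(row) for row in build_incidence(1200))
print(nnz)  # about 3 * 1200 non-zero entries, growing linearly with N
```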
comment:20 by , 9 years ago
Some notes for SimCode:
- solveSimpleEquations is not activated for simulation (because it was not possible before the simulation and initialization DAEs were separated). So in SimCode we have the solving of 1x1 equations for simulation, which should have the same issues as solveSimpleEquations.
Anyway, it may be part of the problem, or possibly not the problem at all.
comment:23 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |