Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#3491 closed defect (fixed)

Back-end time scales very badly for large system with very simple structure

Reported by: Francesco Casella Owned by: Lennart Ochel
Priority: high Milestone:
Component: Backend Version:
Keywords: Cc: Adrian Pop, Willi Braun, andrea.bartolini@…, Vitalij Ruge

Description

Please consider the back-end times for the models ScalableTestSuite.Thermal.DistrictHeating.ScaledExperiments.HeatingSystem_N_XXX.

The structure of the system is extremely simple: index 1, no algebraic loops, very sparse Jacobian of the state derivative functions (most state derivatives depend on just 3 states). It shouldn't be too hard for the back-end structural analysis algorithms to find this out quickly.

The back-end time scales as O(N) up to size 40 (about 80 states). Then it scales as O(N^2) up to size 160. Then, for larger sizes, it starts growing as O(N^3.5) (!), so it already takes 20 minutes for a system with a mere 1200 states.

Given the extremely simple structure of the system, I really see no reason why this should happen. Some basic profiling should quickly reveal which part of the back-end goes wrong.

Attachments (3)

HeatingSystem.pdf (369.3 KB ) - added by Lennart Ochel 9 years ago.
a first quick analysis of the back end performance
HeatingSystem_v1.9.4-dev.168+g1c35b32.pdf (369.3 KB ) - added by Lennart Ochel 9 years ago.
second analysis of the back end performance
HeatingSystem_v1.9.4-dev.539+gc45436a.pdf (398.8 KB ) - added by Lennart Ochel 9 years ago.


Change History (27)

by Lennart Ochel, 9 years ago

Attachment: HeatingSystem.pdf added

a first quick analysis of the back end performance

comment:1 by Lennart Ochel, 9 years ago

Status: new → accepted

This seems to be the same issue as #3490.

comment:2 by Francesco Casella, 9 years ago

The test case in #3490 did not have fully specified initial conditions, but this one does, at least to my understanding. I have now added initial equations to the example of #3490; let's see what happens.

comment:3 by Francesco Casella, 9 years ago

Cc: Adrian Pop added

Some more comments and suggestions, after reading the detailed profiling report.

Regarding the report itself: is it easy to generate it? It might be very useful for our research work.

Regarding the time spent by analyzeInitialSystem (initialization): the system with N = 640 has 1281 variables appearing under the derivative operator: Tu[640], x[640], Td. The very same variables have the attribute fixed = true, so it should be fairly straightforward to select them as states. The initial equations for them are trivially determined by the fixed = true attributes, and the remaining initial equations are the same ones found while sorting the simulation equations, which is carried out in 0.15 seconds. I would expect the analysis not to take much longer than that. What does this function actually do, to scale up as O(N^4) between N = 320 and N = 640 and take 4 minutes of CPU time on such a straightforward model?
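As a reference for what I mean, here is a minimal Modelica sketch of the structure described above (the variable names Tu, x and Td are the ones from this comment; the equations, coefficients and start values are invented, only the fixed = true pattern matters):

  // Hedged sketch, not the actual HeatingSystem code
  model FixedStatesSketch
    parameter Integer N = 640;
    Real Tu[N](each start = 300, each fixed = true);
    Real x[N](each start = 0, each fixed = true);
    Real Td(start = 300, fixed = true);
  equation
    der(Td) = -Td;
    for i in 1:N loop
      der(Tu[i]) = Td - Tu[i] - x[i]; // each derivative depends on only a few states
      der(x[i]) = Tu[i] - x[i];
    end for;
  end FixedStatesSketch;

Since every differentiated variable carries fixed = true, its initial equation is simply var = var.start, which is why I would expect state selection and initialization analysis to be cheap here.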

Regarding the time spent by preOpt evaluateReplaceProtectedFinalEvaluateParameter, preOpt comSubExp, solveSimpleEquations (initialization), simplifyAllExpressions (simulation), and SimCode generated analytical Jacobians: my naive understanding, given the structure of the system of equations, is that the complexity of all of them should be O(N). Am I mistaken? Why does the time spent increase by a factor of 6 to 8 when going from N = 320 to N = 640? Is this due to the inherent complexity of the functions, to the way the MetaModelica code is translated into C code, or to the overhead of the runtime system / garbage collector / cache misses?

comment:4 by Adrian Pop, 9 years ago

Lennart, I suggest you run valgrind (callgrind) on it overnight to get a clearer picture.
For that, I suggest you configure with -g to get a better view.
This way we can understand which parts of the functions are behaving badly, and so on.

comment:5 by Willi Braun, 9 years ago

Cc: Willi Braun added

by Lennart Ochel, 9 years ago

Attachment: HeatingSystem_v1.9.4-dev.168+g1c35b32.pdf added

second analysis of the back end performance

comment:6 by Lennart Ochel, 9 years ago

I fixed the bottleneck in 2602d0.

in reply to:  3 comment:7 by Lennart Ochel, 9 years ago

Replying to casella:

Regarding the report itself: is it easy to generate it? It might be very useful for our research work.

I generated the report using the flag +d=execstat. I am using a small tool to convert the output into a CSV file.
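For completeness, a minimal sketch of how such a report can be produced from an OpenModelica script (the model name is taken from the ticket description; setting the flag via setCommandLineOptions is just one way to do it, it can equally be passed on the omc command line):

  // report_execstat.mos - hedged sketch, not necessarily the exact workflow used here
  loadModel(ScalableTestSuite); getErrorString();
  setCommandLineOptions("+d=execstat"); getErrorString();
  // with execstat enabled, each front-end/back-end/SimCode phase prints its time and memory usage
  translateModel(ScalableTestSuite.Thermal.DistrictHeating.ScaledExperiments.HeatingSystem_N_640);
  getErrorString();

The timing lines printed by the compiler are what the small tool then converts into a CSV file.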

comment:8 by Francesco Casella, 9 years ago

Cc: andrea.bartolini@… added

The performance of the analyzeInitialSystem function in the N = 640 case has been improved by three orders of magnitude in just three days (including a weekend) since I opened the ticket. This required changing less than 30 lines of source code.

May I say I am really impressed :)

May I also assume that there are similar issues with the remaining bottleneck functions, and that it might be quite easy to bring the back-end time down from 59 seconds to just a few seconds?

comment:9 by Francesco Casella, 9 years ago

To be more specific, the latest attachment posted by Lennart reveals that there are still a few functions in the back-end whose execution time does not scale as expected for the larger benchmarks:

  • evaluateReplaceProtectedFinalEvaluateParameters O(N^3)
  • solveSimpleEquations (initialization) O(N^3)
  • simplifyAllExpressions (initialization) O(N^3)
  • simplifyAllExpressions (simulation) O(N^3)
  • comSubExp O(N^3) - this has been improved by Volker, see ticket:3478#comment:12, but I have no data on the improved algorithm

I would expect that, given the sparse structure of the system, their complexity is O(N) or, in the worst case, O(N^2). In fact, this seems to be how the execution time actually scales for moderate system sizes. Above a certain critical system size, the exponent of the complexity shifts into high gear for some reason.

Please identify the root cause of this problem and remove it, if possible.

comment:10 by anonymous, 9 years ago

The modules should scale as O(M), where M is the number of operations (+, ×, ÷, −).

comment:11 by Francesco Casella, 9 years ago

N is the number of instances of a certain submodel in the benchmark. Each instance has the same equations, which contain the same operations. So, to my understanding, N (the number of submodels) and M (the number of operations) should be roughly proportional.

In the case of HeatingSystem there is also one big connection equation involving flow variables that sums up all the heat flows from the N units, but even in that case, the number of operations is proportional to N.
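To illustrate this last point, here is a hedged sketch (component names and values invented, not the actual ScalableTestSuite code): connecting N heat ports to a single node makes the flattened flow equation a sum with N terms, so even this "big" equation only contributes O(N) operations.

  model FlowSumSketch
    parameter Integer N = 4;
    Modelica.Thermal.HeatTransfer.Components.HeatCapacitor units[N](each C = 1);
    Modelica.Thermal.HeatTransfer.Sources.FixedTemperature node(T = 300);
  equation
    for i in 1:N loop
      connect(units[i].port, node.port);
    end for;
    // flattening turns the connection set into (roughly)
    //   node.port.Q_flow + units[1].port.Q_flow + ... + units[N].port.Q_flow = 0
  end FlowSumSketch;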

in reply to:  9 comment:12 by Lennart Ochel, 9 years ago

Replying to casella:

To be more specific, the latest attachment posted by Lennart reveals that there are still a few functions in the back-end whose execution time does not scale as expected for the larger benchmarks:

  • evaluateReplaceProtectedFinalEvaluateParameters O(N^3)
  • solveSimpleEquations (initialization) O(N^3)
  • simplifyAllExpressions (initialization) O(N^3)
  • simplifyAllExpressions (simulation) O(N^3)
  • comSubExp O(N^3) - this has been improved by Volker, see ticket:3478#comment:12, but I have no data on the improved algorithm

comSubExp scales better now, but the other modules are still the same.

Last edited 9 years ago by Lennart Ochel

by Lennart Ochel, 9 years ago

Attachment: HeatingSystem_v1.9.4-dev.539+gc45436a.pdf added

comment:13 by Francesco Casella, 9 years ago

Is there any place (paper, document, report, book) where I can get a rough understanding of what solveSimpleEquations and simplifyAllExpressions actually do, and what their complexity is expected to be?

For instance, I understand that alias elimination should have complexity O(N^2), because each variable (there are K1*N of them) has to be looked up in K2*N expressions. That is, unless one uses some smart data structure that avoids looking up each variable in all expressions every time, which could be useful for very large, very sparse models. I guess database algorithms do this all the time, though I am not really an expert in such techniques.

However, I cannot really understand how it could be O(N^3), which seems to be the case in all the cited examples. Do you have any concrete example with that complexity?

in reply to:  13 comment:14 by Lennart Ochel, 9 years ago

Cc: Vitalij Ruge added

Replying to casella:

Is there any place (paper, document, report, book) where I can get a rough understanding of what solveSimpleEquations and simplifyAllExpressions actually do, and what their complexity is expected to be?

I think Vitalij has implemented/reimplemented these modules. I added him to this ticket and maybe he can give some explanations.

comment:15 by Vitalij Ruge, 9 years ago

Lennart, is there a reason why you use simplify in simplifyAllExpressions and not simplify1 (see #2647)?

The changes in 297 can help.

in reply to:  15 comment:16 by Lennart Ochel, 9 years ago

Replying to vitalij:

Lennart, is there a reason why you use simplify in simplifyAllExpressions and not simplify1 (see #2647)?

No, there is no good reason for this. I just used simplify since it seemed to be the entry point of the simplification module. You should know better than I do how it is implemented and what the differences are between simplify and simplify1. So can you please explain the differences?

comment:17 by Vitalij Ruge, 9 years ago

simplify:

  // Advanced (global) simplifications

see ExpressionSimplify.mo

simplify1:

  // basic simplification

see ExpressionSimplify.mo

comment:19 by Francesco Casella, 9 years ago

I also noticed that generate backend data structure scales up neatly as O(N^3), taking a substantial amount of time.

As the test model in question has O(N) equations with O(N) non-zero elements in the incidence matrix, I would suggest to have a look at this module as well.

comment:20 by Vitalij Ruge, 9 years ago

Some notes for SimCode:

  • solveSimpleEquations is not activated for simulation (because it was not possible before, when there was no separation of the simulation and initial DAE...). So in SimCode we solve 1x1 equations for simulation instead, which should have the same issues as solveSimpleEquations.

Anyway, it can be a part of the problem, but possibly not the whole problem.

comment:21 by Martin Sjölund, 9 years ago

Milestone: 1.9.4 → 1.9.5

Milestone pushed to 1.9.5

comment:22 by Martin Sjölund, 9 years ago

Milestone: 1.9.5 → 1.10.0

Milestone renamed

comment:23 by Francesco Casella, 9 years ago

Resolution: fixed
Status: accepted → closed

The situation since the last reports in November has improved dramatically: the back-end time reported here for HeatingSystem_N_640 was 61.16 seconds and is currently down to 4.85 seconds (see here).

The scaling has also improved dramatically and is now much closer to linear.

Therefore, I am closing the ticket.

comment:24 by Martin Sjölund, 7 years ago

Milestone: 1.10.0

Milestone deleted
