Opened 9 years ago

Closed 7 years ago

#3748 closed defect (fixed)

Tearing consuming too much memory

Reported by: sjoelund.se
Owned by: sjoelund.se
Priority: high
Milestone: 1.11.0
Component: Backend
Version: v1.9.4-dev-nightly
Keywords: (none)
Cc: casella, lochel, ptaeuber, bachmann

Description

ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelica_N_56_M_56 consumes an enormous amount of memory during postOpt tearingSystem (initialization):

Notification: Performance of postOpt simplifyComplexFunction (initialization): time 0.005682/281.8, memory: 0.8594 MB / 66.34 GB
Notification: Performance of postOpt tearingSystem (initialization): time 6857/7138, memory: 1.845 TB / 1.91 TB
Notification: Performance of postOpt calculateStrongComponentJacobians (initialization): time 157.6/7296, memory: 19.24 GB / 1.928 TB

This should probably be investigated. The larger model (N_160_M_160) is still in the front-end connection handling after running for 3 hours; but if it ever made it through to the backend, it would get stuck there as well.

Change History (12)

comment:1 Changed 9 years ago by lochel

  • Cc ptaeuber added

comment:2 follow-up: Changed 9 years ago by sjoelund.se

For ScalableTestSuite, it is mostly tearing, removeSimpleEquations, and evalFunc taking up lots and lots of memory.

comment:3 follow-up: Changed 9 years ago by casella

For these kinds of models, tearing is simply not an option: it will take forever, even if naive implementations are avoided, and the result will often still be very sparse anyway. One should skip tearing and use a sparse solver instead.

Regarding the testsuite, this could be managed easily if #3488 is implemented.

Another option is to implement #3487, so that for algebraic systems over a certain size, tearing is skipped. Please note it is also essential to use simflags = "-ls=klu" or simflags="-ls=umfpack" to solve the resulting very sparse, very large systems efficiently.

This last feature should in fact be enabled automatically for linear systems above a certain size and below a certain density ratio. The thresholds should be customizable, with reasonable defaults. It doesn't really make any sense to ignore the available sparse solvers for systems with thousands of unknowns and a density ratio below 1%, stubbornly sticking to the default dense LAPACK solvers.

In fact, having only one system-wide setting for -ls and --disableLinearTearing doesn't make much sense either. Small systems should be handled by tearing and dense solvers, larger ones without tearing and sparse solvers, see the discussion in #3487.
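The size/density rule proposed above can be sketched as a few lines of pseudo-logic. This is purely illustrative: the threshold values and the function name are made up for this comment, not actual OpenModelica settings.

```python
# Hypothetical solver-selection rule: skip tearing and use a sparse
# solver when a linear system is both large and sparse.  The default
# thresholds here are invented for illustration only.

def choose_linear_solver(size, nonzeros,
                         size_threshold=200,
                         density_threshold=0.01):
    """Return (use_tearing, solver_kind) for a size x size linear system."""
    density = nonzeros / (size * size)
    if size > size_threshold and density < density_threshold:
        # Large and sparse: no tearing, sparse LU (e.g. KLU/UMFPACK).
        return (False, "sparse")
    # Small or dense: tearing plus the default dense LAPACK solver.
    return (True, "dense")

# The system from this ticket (3137 + 9518 = 12655 variables, ~1% dense
# even after tearing) would clearly land in the sparse branch:
print(choose_linear_solver(12655, 40000))   # (False, 'sparse')
print(choose_linear_solver(50, 2000))       # (True, 'dense')
```

Per-system selection along these lines would also address the point below about a single system-wide setting being too coarse.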

comment:4 in reply to: ↑ 2 Changed 9 years ago by casella

Replying to sjoelund.se:

For ScalableTestSuite, it is mostly tearing, removeSimpleEquations, and evalFunc taking up lots and lots of memory.

removeSimpleEquations is known to be slow. removeSimpleEquations=new is known to be faster, but the implementation is still incomplete, so it can't be used by default. See #3695. Anyone willing to help the Bielefeld guys get it finished?

comment:5 Changed 9 years ago by sjoelund.se

My thought would be to use 2 different removeSimpleEquations passes: a really fast, imprecise one first (doing correct things, but not finding all simple equations), followed by the thorough one.
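A cheap first pass of that kind could, for instance, merge only trivial alias equations (x = y) with union-find and leave everything harder to the thorough pass. A minimal sketch of the idea; this is not the actual OpenModelica implementation, and the function names are invented:

```python
# Sketch of a fast-but-incomplete removeSimpleEquations pass: collapse
# only trivial alias equations (x = y) using union-find.  Everything
# else is left for a later, thorough pass.  Illustrative only.

def find(parent, v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]   # path halving
        v = parent[v]
    return v

def eliminate_aliases(variables, alias_eqs):
    """alias_eqs: list of (x, y) pairs, each meaning the equation x = y."""
    parent = {v: v for v in variables}
    for x, y in alias_eqs:
        parent[find(parent, x)] = find(parent, y)
    # Substitution map: every variable -> its representative.
    return {v: find(parent, v) for v in variables}

subst = eliminate_aliases(["a", "b", "c", "d"],
                          [("a", "b"), ("b", "c")])
# a, b, c collapse to one representative; d stays on its own.
assert subst["a"] == subst["b"] == subst["c"]
assert subst["d"] == "d"
```

Each alias equation is processed in near-constant time, so such a pass is linear in practice, which is the point of running it before the expensive complete analysis.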

comment:6 Changed 9 years ago by sjoelund.se

That's what... 9500 iteration variables followed by a 3000x3000 sparse linear system? :)

Linear torn systems: 1 {(3137,1.1%) 9518}

I guess we should not even try to tear this, even if the user explicitly set a tearing limit this high (or at least we should warn about such a high tearing limit).

comment:7 Changed 9 years ago by casella

Well, there might be some special cases where tearing could be very efficient at runtime. For instance, if I'm not mistaken, a tri-diagonal system of N equations can be reduced to N-2 assignments and a 2x2 system, no matter how large N is. If this system has to be solved many times, it might be worth spending some more time on tearing during code generation, to produce code that can solve the system efficiently at runtime.
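The tri-diagonal case can be sketched concretely. The version below uses a single tearing variable (since the system is linear, the last-row residual is affine in the guess, so two forward sweeps determine it exactly); the N-2 assignments plus 2x2 core mentioned above is the same idea with two tearing variables. Illustrative only, not generated code:

```python
# Tearing a tridiagonal linear system where row i reads
#   a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
# Guess the tearing variable x[0]; each row then yields x[i+1] by a
# plain assignment, and the last row becomes the residual.

def residual(guess, a, b, c, d):
    n = len(b)
    x = [guess]
    for i in range(n - 1):                  # N-1 explicit assignments
        prev = x[i - 1] if i > 0 else 0.0
        x.append((d[i] - a[i] * prev - b[i] * x[i]) / c[i])
    return a[-1] * x[-2] + b[-1] * x[-1] - d[-1], x

def solve_tridiagonal_by_tearing(a, b, c, d):
    r0, _ = residual(0.0, a, b, c, d)
    r1, _ = residual(1.0, a, b, c, d)
    guess = -r0 / (r1 - r0)                 # affine residual: one step
    return residual(guess, a, b, c, d)[1]

# 3x3 example with solution x = [1, 2, 3]:
a = [0.0, 1.0, 1.0]
b = [2.0, 2.0, 2.0]
c = [1.0, 1.0, 0.0]
d = [4.0, 8.0, 8.0]
x = solve_tridiagonal_by_tearing(a, b, c, d)
assert all(abs(x[i] - (i + 1)) < 1e-9 for i in range(3))
```

The runtime cost per solve is O(N) assignments plus a constant-size core, which is why such structured cases can beat a general sparse factorization.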

There are also cases where tearing could be advantageous w.r.t. sparse solvers. I understand there is a vast body of literature on these topics, some of it quite old (see, e.g., http://www.sciencedirect.com/science/article/pii/0022247X74901796).

I would not try to improvise, and leave this to the experts (i.e., mathematicians). Maybe Bernhard can give some more scientifically sound advice on how to proceed here.

comment:8 in reply to: ↑ 3 Changed 9 years ago by ptaeuber

Replying to sjoelund.se:

That's what... 9500 iteration variables followed by a 3000x3000 sparse linear system? :)

Linear torn systems: 1 {(3137,1.1%) 9518}

It is 3000 iteration variables and 9500 "other" variables solved explicitly. So the system size is reduced from 12500 to 3000.

Replying to casella:

Another option is to implement #3487, so that for algebraic systems over a certain size, tearing is skipped.

I think we should differentiate between tearing for initialization and tearing for simulation. In many cases such large systems only occur during initialization, while the same set can be split into several smaller sets for simulation (see the results from the ScalableTestSuite model: tearing for simulation "only" takes 42 GB instead of 1.8 TB). So for initialization it is indeed too much effort to tear the system, because it is only computed once per simulation run.
As mentioned in #3487, it could be worthwhile to spend some more effort on tearing the simulation sets, depending on the end-user's needs.

@sjoelund.se
Maybe you could run the same model(s) again with another tearing heuristic, e.g. +tearingHeuristic=MC1, because the default heuristic is really expensive. The difference compared to a simpler heuristic would be interesting for me, too.

comment:9 Changed 9 years ago by ptaeuber

  • Cc bachmann added

comment:10 Changed 9 years ago by casella

For large power generation and transmission system models, I reckon tearing is really out of the question. New information arrives every 15 minutes and requires re-generating the simulation code from scratch, so it is really important to keep code generation time low. In fact, this is currently the major bottleneck, much more than simulation time, which can easily be mastered by running the different simulation scenarios in parallel. Also, 42 GB is not the amount of RAM you typically find on computers meant for this kind of job.

Anyway, it is definitely worth investigating all the trade-offs between compile-time and run-time for the three alternatives:

  • no tearing, sparse solver(s)
  • tearing, sparse solver(s)
  • tearing, dense solver

on a number of test cases. The ones in ScalableTestSuite are somewhat artificial, because the network structures are very regular so that they can be generated automatically. I have real-life test cases at hand, though they are confidential and I cannot post them here. Let me know if you are interested.

comment:11 Changed 7 years ago by sjoelund.se

  • Milestone changed from Future to 1.11.0
  • Owner changed from lochel to sjoelund.se
  • Status changed from new to assigned

comment:12 Changed 7 years ago by sjoelund.se

  • Resolution set to fixed
  • Status changed from assigned to closed

This was fixed for 1.11; memory consumed is now 1.3 MB instead of 2 TB for the given phase.
