Opened 8 years ago

Last modified 3 years ago

#4362 new discussion

Always check how new methods scale with the problem size

Reported by: casella Owned by: lochel
Priority: critical Milestone:
Component: Backend Version:
Keywords: Cc:

Description

Dear developers,

allow me this reflection triggered by the discussion on PR 1561.

When we started to play around with models with more than 100000 equations, interesting results popped out. In particular, it turned out that many methods in the back-end and code-generation phases had execution times and/or memory requirements scaling as O(N²) or, even worse, O(N³). This means that above a certain threshold size, some methods took a ludicrous amount of time (tens of minutes) and memory (tens of gigabytes) to complete, which was totally unjustified.

With help from Martin and Lennart we got rid of some of the worst instances of this problem, see, e.g., #3491, #3681, #3685, #4005. However, there are still many that should be fixed, see #4146 for a non-exhaustive list of pending issues.

There's nothing wrong per se with algorithms of superlinear complexity. The problem arises when the complexity stems from a careless implementation: if every step of an O(N) algorithm actually takes O(N) time because the wrong data structure is used, we end up with O(N²) execution time.
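For illustration only (a minimal sketch in OCaml rather than MetaModelica, with hypothetical names, not actual OMC code): a pass that removes duplicate variable names becomes O(N²) when each step does a linear membership test on a list, and stays O(N) when a hash table is used instead.

{{{
(* Hypothetical pass: remove duplicate variable names from a list.
   List.mem scans the accumulator, so each step is O(N) and the
   whole fold is O(N^2). *)
let dedup_quadratic names =
  List.fold_left
    (fun acc name -> if List.mem name acc then acc else name :: acc)
    [] names

(* Same pass with a hash table: each lookup is O(1) amortized, so the
   whole fold is O(N). Both versions return the result in reverse
   order, which is irrelevant for the point being made. *)
let dedup_linear names =
  let seen = Hashtbl.create 1024 in
  List.fold_left
    (fun acc name ->
       if Hashtbl.mem seen name then acc
       else (Hashtbl.add seen name (); name :: acc))
    [] names
}}}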

In the past, most of the focus was on getting OMC to work at all, even on the simplest models. Now that we are approaching 100% coverage with the new front-end (which will also be way faster, see #4071), the bottleneck will increasingly be the time spent in the back-end and code-generation phases. IMHO it is very important to avoid introducing further methods that do not scale well, besides fixing the existing ones that already don't.

I would propose the following guidelines:

  • Experimental methods, which are turned off by default, are allowed to have poor scaling performance. The focus in this case is on the concept and on the correctness of the algorithm, not on the efficiency of the implementation.
  • Production-quality methods, which are turned on by default, should instead have strict performance requirements: they should scale as the theoretical complexity of the algorithm and no worse.
  • Scaling properties can easily be checked by running the LibraryExperimental task on Hudson on the ScalableTestSuite library, setting the appropriate flags; a rough sketch of what such a check quantifies is given after this list. Some time ago Lennart developed a script to generate nice reports, see e.g. this example. He could probably put it on GitHub somewhere so everybody can use it, or it could even be triggered automatically by Hudson.
  • If some algorithms still have superlinear complexity for intrinsic reasons, an upper limit should be set on the problem size, so that the algorithm is not applied to larger systems. A flag to change the value of this threshold should be added.
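As a rough way to quantify such a check (a hypothetical sketch, not Lennart's script): if the compile time is assumed to behave like t(N) ≈ c·N^p, the empirical exponent p can be estimated from two measurements at different problem sizes, and anything well above 1 for a nominally linear phase is a red flag.

{{{
(* Hypothetical helper: assuming t(N) ~ c * N^p, estimate p from two
   measured times t1, t2 (in seconds) at problem sizes n1, n2. *)
let scaling_exponent ~n1 ~t1 ~n2 ~t2 =
  log (t2 /. t1) /. log (float_of_int n2 /. float_of_int n1)

let () =
  (* e.g. 2 s at N = 1000 and 80 s at N = 10000 gives p ~ 1.6 *)
  Printf.printf "estimated exponent p = %.2f\n"
    (scaling_exponent ~n1:1000 ~t1:2.0 ~n2:10000 ~t2:80.0)
}}}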

I would also invite Martin and/or Lennart to prepare and link a short wiki page with hints and programming patterns to avoid this kind of problem (e.g. use tail recursion instead of plain recursion, or prefer certain data structures over others in certain contexts); a small sketch of the tail-recursion point is given below. Some developers are not really experts in these efficiency issues, and it would be good if we capitalized on what some of us have already learned by spreading the information, rather than repeating the same errors time and again.
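As an example of the kind of hint such a page could contain, here is a minimal sketch (again in OCaml rather than MetaModelica, not actual OMC code) contrasting plain recursion with an accumulator-based tail-recursive version; the latter runs in constant stack space, which matters once lists reach hundreds of thousands of elements.

{{{
(* Plain recursion: the addition happens after the recursive call
   returns, so each element consumes a stack frame and very long
   lists overflow the stack. *)
let rec sum_plain xs =
  match xs with
  | [] -> 0
  | x :: rest -> x + sum_plain rest

(* Tail recursion with an accumulator: the recursive call is the last
   operation, so it runs in constant stack space regardless of the
   length of the list. *)
let sum_tail xs =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs
}}}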

Comments?

Change History (6)

comment:1 Changed 7 years ago by casella

  • Milestone changed from 1.12.0 to 1.13.0

Milestone changed to 1.13.0 since 1.12.0 was released

comment:2 Changed 6 years ago by casella

  • Milestone changed from 1.13.0 to 1.14.0

Rescheduled to 1.14.0 after the 1.13.0 release

comment:3 Changed 5 years ago by casella

  • Milestone changed from 1.14.0 to 1.16.0

Releasing 1.14.0, which is stable and has many improvements w.r.t. 1.13.2. This issue is rescheduled to 1.16.0.

comment:4 Changed 4 years ago by casella

  • Milestone changed from 1.16.0 to 1.17.0

Retargeted to 1.17.0 after 1.16.0 release

comment:5 Changed 4 years ago by casella

  • Milestone changed from 1.17.0 to 1.18.0

Retargeted to 1.18.0 because of the timed 1.17.0 release.

comment:6 Changed 3 years ago by casella

  • Milestone 1.18.0 deleted

Ticket retargeted after the milestone was closed
