Opened 7 years ago

Last modified 3 years ago

#4845 assigned defect

Tearing of linear systems produces singular system out of moderately-sized, well-posed models

Reported by: casella
Owned by: AnHeuermann
Priority: high
Milestone:
Component: Backend
Version:
Keywords:
Cc: Karim.Abdelhak

Description

Please consider the ScalableTestSuite.Mechanical.HarmonicOscillator.ScaledExperiments.HarmonicOscillatorNetwork_N_XX test models. They require solving a linear system of N equations to compute the accelerations of the N point masses involved. For N > 40, the solver fails repeatedly with error messages like:

Failed to solve linear system of equations (no. 322) at time 0.000000. 
Residual norm is 14.0094302893603.
The default linear solver fails, the fallback solver with total pivoting
is started at time 0.000000. That might raise performance issues, for
more information use -lv LOG_LS

In fact, the outcome is particularly bad, because the solver does not abort but keeps retrying forever. Even if the Cancel simulation button is pressed, the .exe file keeps running in the background until it is killed with the Process Manager.

The debugger reveals that the linear system of size N is very effectively torn, ending up with just one tearing variable. Unfortunately, the resulting torn system is ill-conditioned even for moderate values of N.

From what I understand, the problem is that the (k+1)-th torn variable is given by 3 times the k-th one, plus other terms. As a consequence, the N-th torn variable ultimately depends on 3^N times the tearing variable, which is obviously not going to be numerically well-posed for N much larger than 20.

The system per se is well-posed and is solved without problems, even for much larger sizes, if tearing is switched off and, for large values of N, a sparse solver is possibly used.
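The contrast can be reproduced on a toy system. The sketch below is not the actual HarmonicOscillatorNetwork model: the tridiagonal matrix (3 on the diagonal, -1 next to it) is an assumption, chosen only because solving each row for the next unknown gives v[k+1] = 3*v[k] plus other terms. The names toy_matrix, forward_substitute and solve_torn are hypothetical. It compares a direct solve of the full system with a solve that uses v[0] as the only tearing variable.

```python
import numpy as np

def toy_matrix(n):
    """Tridiagonal toy matrix: 3 on the diagonal, -1 on both off-diagonals."""
    return 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def forward_substitute(x, b):
    """Use v[0] = x as the tearing variable and solve rows 0..n-2 in sequence.

    Row k reads -v[k-1] + 3*v[k] - v[k+1] = b[k], so each torn variable is
    v[k+1] = 3*v[k] - v[k-1] - b[k]. The last row is left as the residual.
    """
    n = len(b)
    v = np.empty(n)
    v[0] = x
    for k in range(n - 1):
        prev = v[k - 1] if k > 0 else 0.0
        v[k + 1] = 3.0 * v[k] - prev - b[k]
    residual = 3.0 * v[-1] - v[-2] - b[-1]
    return v, residual

def solve_torn(b):
    """Solve the scalar residual equation, which is affine in x."""
    _, r0 = forward_substitute(0.0, b)
    _, r1 = forward_substitute(1.0, b)
    x = -r0 / (r1 - r0)
    return forward_substitute(x, b)

n = 60
A = toy_matrix(n)
b = np.ones(n)

v_direct = np.linalg.solve(A, b)      # no tearing
v_torn, r_torn = solve_torn(b)        # one tearing variable

print("cond(A)              :", np.linalg.cond(A))
print("direct residual norm :", np.linalg.norm(A @ v_direct - b))
print("torn residual        :", abs(r_torn))
print("max |v_torn - v_direct|:", np.max(np.abs(v_torn - v_direct)))
```

In this toy case the full matrix has a condition number below 5 for any n, while the forward substitution amplifies rounding errors in the tearing variable by roughly a factor of 2.6 per step, so the torn solution drifts far from the direct one well before n = 60.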

Clearly, tearing is not a good idea for solving this class of systems. I think we need some mechanism to identify them and prevent (or limit) the use of tearing in such cases. The current behaviour, i.e., getting an ill-posed system and failing badly, is not acceptable.

BTW, note that I did not build these test cases specifically to cause this outcome.

Change History (12)

comment:1 Changed 7 years ago by lochel

  • Owner lochel deleted
  • Status changed from new to assigned

comment:2 Changed 7 years ago by casella

  • Cc lochel added

@lochel why did you remove yourself as owner?

comment:3 Changed 7 years ago by lochel

I don't have time to work on it myself and since Patrick left the dev team, I don't know to whom this should be assigned.
So I removed myself to indicate that I cannot take care of this issue.

comment:4 Changed 7 years ago by casella

OK, it looked strange that the ticket was assigned to no-one, but now I understand you can remove yourself while leaving the ticket unassigned.

comment:5 Changed 6 years ago by casella

  • Milestone changed from 1.13.0 to 1.14.0

Rescheduled to 1.14.0 after 1.13.0 release

comment:6 Changed 5 years ago by casella

  • Milestone changed from 1.14.0 to 1.16.0

Releasing 1.14.0 which is stable and has many improvements w.r.t. 1.13.2. This issue is rescheduled to 1.16.0

comment:7 follow-up: Changed 5 years ago by casella

  • Cc Karim.Abdelhak added; wbraun ptaeuber bachmann lochel removed
  • Owner set to AnHeuermann

Interesting borderline case for tearing, discovered with the ScalableTestSuite library.

Tearing is ok from a structural point of view, but numerically it becomes ill-posed for N > 20.

Definitely not urgent, but anyway well worth having a look. Similar situations may arise in real life large user models, and will fail spectacularly.

comment:8 in reply to: ↑ 7 Changed 5 years ago by Karim.Abdelhak

Replying to casella:

Interesting borderline case for tearing, discovered with the ScalableTestSuite library.

Tearing is ok from a structural point of view, but numerically it becomes ill-posed for N > 20.

Definitely not urgent, but anyway well worth having a look. Similar situations may arise in real life large user models, and will fail spectacularly.

What makes it ill-posed and why at N > 20? Is the jacobian close to singularity?

comment:9 in reply to: ↑ description Changed 5 years ago by casella

It's explained in the description of the ticket

From what I understand, the problem is that the (k+1)-th torn variable is given by 3 times the k-th one, plus other terms. As a consequence, the N-th torn variable ultimately depends on 3^N times the tearing variable, which is obviously not going to be numerically well-posed for N much larger than 20.

x: tearing variable
vj: torn variables

v1 = 3*x;
v2 = 3*v1;
v3 = 3*v2;
...
vN = 3*v{N-1};

The sensitivity of the last torn variable to small variations of the tearing variable is 3^N, which is really too much when N > 20, as a variation of 1e-10 of the tearing variable gives a variation of at least 0.35 of the torn variable. Hence, getting f(x) close enough to zero becomes very difficult for the solver, because the effects of machine precision are greatly amplified.
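The arithmetic can be checked with a few lines of Python. This is only a sketch of the simplified chain written above (the "other terms" are dropped), and the function names torn_chain and sensitivity are hypothetical. It prints the gain of the chain, the change in the last torn variable caused by a 1e-10 change in x, and the smallest step the last torn variable can take when x = 1 moves by one unit of machine precision.

```python
import sys

def torn_chain(x, n, factor=3.0):
    """Evaluate v_n for the chain v_1 = factor*x, v_k = factor*v_{k-1}."""
    v = x
    for _ in range(n):
        v = factor * v
    return v

def sensitivity(n, factor=3.0):
    """Derivative of v_n with respect to x for the chain above: factor**n."""
    return factor ** n

for n in (10, 20, 40):
    gain = sensitivity(n)
    # Change in v_n when the tearing variable moves from 1.0 to 1.0 + 1e-10.
    delta = torn_chain(1.0 + 1e-10, n) - torn_chain(1.0, n)
    # Granularity of v_n when x = 1.0 changes by one machine epsilon.
    ulp_step = gain * sys.float_info.epsilon
    print(f"N={n:3d}  gain={gain:.3e}  delta={delta:.3e}  ulp_step={ulp_step:.3e}")
```

For N = 20 the gain is 3486784401, so the 1e-10 perturbation shows up as roughly 0.35 in the last torn variable. For N = 40 a change of a single machine epsilon in x already moves the last torn variable by more than 1000, so the residual cannot be driven close to zero in double precision.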

Last edited 5 years ago by casella

comment:10 Changed 4 years ago by casella

  • Milestone changed from 1.16.0 to 1.17.0

Retargeted to 1.17.0 after 1.16.0 release

comment:11 Changed 4 years ago by casella

  • Milestone changed from 1.17.0 to 1.18.0

Retargeted to 1.18.0 because of 1.17.0 timed release.

comment:12 Changed 3 years ago by casella

  • Milestone 1.18.0 deleted

Ticket retargeted after milestone closed
