Opened 4 years ago

Last modified 3 years ago

#6436 new defect

Segmentation fault with non-linear Newton solver

Reported by: Andreas Heuermann
Owned by: Andreas Heuermann
Priority: high
Milestone:
Component: Run-time
Version: 1.18.0-dev
Keywords: non-linear loop, newton
Cc:

Description

When running the example from https://trac.openmodelica.org/OpenModelica/ticket/6409 with the Newton solver selected as the non-linear solver, we get a segmentation fault:

Limited backtrace at point of segmentation fault
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f0e485c23c0]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solveNewton+0x3e)[0x7f0e4918097e]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solveNLS+0xd2)[0x7f0e4915a802]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solve_nonlinear_system+0x40e)[0x7f0e4915ae7e]
[...]
Segmentation fault

Since the model isn't public, I'll try to either fix this myself or find a public example that reproduces the segmentation fault.

The problem is that solverData is NULL in solveNewton (nonlinearSolverNewton.c#L190), so accessing solverData->ftol crashes. I don't know yet why it is NULL.

Change History (8)

comment:2 by Andreas Heuermann, 4 years ago

Okay, the problem occurs because we are doing something like this:

    solverData = nonlinsys->solverData;
    nonlinsys->solverData = solverData->ordinaryData;
    success = solveNewton(data, threadData, sysNumber);
    nonlinsys->solverData = solverData;

and inside solveNewton() an assertion fails, which throws and jumps to the next catch block without restoring nonlinsys->solverData.
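To make the failure mode concrete, here is a minimal sketch in plain C. It is not the OpenModelica code: the struct names, solve_newton_stub() and pointer_restored_after_call() are hypothetical stand-ins, and setjmp/longjmp is assumed here as a model of the runtime's throw/catch mechanism. It only shows that the restore line is skipped when the callee jumps.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real runtime types. */
typedef struct { double ftol; } NewtonData;
typedef struct { NewtonData *ordinaryData; } HybridData;
typedef struct { void *solverData; } NonlinearSystem;

static jmp_buf catch_point;  /* plays the role of the enclosing catch block */

/* Stand-in for solveNewton(): expects solverData to point at NewtonData. */
static int solve_newton_stub(NonlinearSystem *sys, int fail)
{
    NewtonData *nd = (NewtonData *)sys->solverData;
    assert(nd != NULL);
    if (fail) {
        longjmp(catch_point, 1);  /* models the failed assert that throws */
    }
    return nd->ftol > 0.0;
}

/* Swap, call, restore. Returns 1 if solverData was restored, 0 otherwise. */
int pointer_restored_after_call(int fail)
{
    /* Static storage, so the values stay well defined after longjmp. */
    static NewtonData newton = { 1e-12 };
    static HybridData hybrid = { &newton };
    static NonlinearSystem sys;

    sys.solverData = &hybrid;
    if (setjmp(catch_point) == 0) {
        HybridData *saved = (HybridData *)sys.solverData;
        sys.solverData = saved->ordinaryData;
        solve_newton_stub(&sys, fail);
        sys.solverData = saved;   /* skipped when the callee jumps */
    }
    return sys.solverData == (void *)&hybrid;
}
```

With fail set to 0 the pointer is restored; with fail set to 1 it still points at the inner Newton data. A later swap would then presumably read ordinaryData out of the wrong struct, which is one plausible way to end up with a NULL or garbage pointer.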

Putting a try-catch block around solveNewton() solves this. But that is very error-prone, so I will change the code to work without swapping this pointer back and forth.
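The try-catch workaround could look roughly like the following sketch. It again assumes setjmp/longjmp as a model of the throw/catch mechanism; current_catch, throw_error(), solve_with_guard() and the struct names are hypothetical and do not come from the runtime. The local catch restores the pointer first and then rethrows to the outer handler.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real runtime types. */
typedef struct { double ftol; } NewtonData;
typedef struct { NewtonData *ordinaryData; } HybridData;
typedef struct { void *solverData; } NonlinearSystem;

static jmp_buf *current_catch;  /* innermost active catch point */

static void throw_error(void) { longjmp(*current_catch, 1); }

/* Stand-in for solveNewton(); throws when fail is non-zero. */
static int solve_newton_stub(NonlinearSystem *sys, int fail)
{
    NewtonData *nd = sys->solverData;
    if (fail) {
        throw_error();
    }
    return nd->ftol > 0.0;
}

/* Swap, call, restore. On a throw: restore first, then rethrow. */
static int solve_with_guard(NonlinearSystem *sys, int fail)
{
    HybridData *saved = sys->solverData;  /* set before setjmp, never changed */
    jmp_buf *outer = current_catch;
    jmp_buf local;

    current_catch = &local;
    if (setjmp(local) == 0) {
        sys->solverData = saved->ordinaryData;
        int ok = solve_newton_stub(sys, fail);
        sys->solverData = saved;
        current_catch = outer;
        return ok;
    }
    /* Reached via longjmp: undo the swap, then pass the error on. */
    sys->solverData = saved;
    current_catch = outer;
    throw_error();
    return 0;  /* not reached */
}

/* Returns 1 if solverData points at the outer data after the call. */
int guarded_call_restores_pointer(int fail)
{
    static NewtonData newton = { 1e-12 };
    static HybridData hybrid = { &newton };
    static NonlinearSystem sys;
    jmp_buf top;

    sys.solverData = &hybrid;
    current_catch = &top;
    if (setjmp(top) == 0) {
        solve_with_guard(&sys, fail);
    }
    return sys.solverData == (void *)&hybrid;
}
```

The sketch also shows why this is error-prone: every call site that swaps the pointer needs its own guard, and forgetting one reintroduces the crash.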

comment:3 by Andreas Heuermann, 4 years ago

This needs to be changed for all non-linear loops and involves changes to the structure we are using to save the data for the non-linear loops. Doing this in a good way needs some time.

This should be done in a cleanup session for all the non-linear solvers, to improve documentation, reduce duplicate code, and give all functions of the same type a common interface.
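One possible shape for such a cleanup, purely as an illustration and not the design that was eventually chosen: each solver receives its own data as an explicit argument through a shared function-pointer type, so nonlinsys->solverData never has to be swapped. The names nls_solve_fn, solve_newton_explicit() and solve_with_fallback() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real runtime types. */
typedef struct { double ftol; } NewtonData;
typedef struct { NewtonData *ordinaryData; } HybridData;
typedef struct { void *solverData; } NonlinearSystem;

/* Hypothetical common signature shared by every solver entry point. */
typedef int (*nls_solve_fn)(const NonlinearSystem *sys, void *solverSpecific);

/* The solver reads its data from the argument, not from sys->solverData. */
static int solve_newton_explicit(const NonlinearSystem *sys, void *solverSpecific)
{
    const NewtonData *nd = solverSpecific;
    (void)sys;  /* unused in this stub */
    return nd != NULL && nd->ftol > 0.0;
}

/* Caller hands the inner data to the solver; sys is never modified. */
static int solve_with_fallback(const NonlinearSystem *sys)
{
    const HybridData *hd = sys->solverData;
    nls_solve_fn solve = solve_newton_explicit;
    return solve(sys, hd->ordinaryData);
}

/* Returns 1 if the solve succeeded and sys.solverData is unchanged. */
int explicit_call_leaves_system_untouched(void)
{
    static NewtonData newton = { 1e-12 };
    static HybridData hybrid = { &newton };
    NonlinearSystem sys = { &hybrid };

    int ok = solve_with_fallback(&sys);
    return ok && sys.solverData == (void *)&hybrid;
}
```

Because the system is passed as const and never rewritten, a throw inside the solver cannot leave it in an inconsistent state.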

comment:4 by Francesco Casella, 4 years ago

@AnHeuermann, do you think this is doable for the 1.18.0 beta release (July 2021)?

comment:5 by Andreas Heuermann, 4 years ago

The easy solution would be to put a try-catch block around the code snippet above. That solves the issue and is done in a few minutes.

The proper solution needs maybe 1-3 days.
So yes, if we want to fix it we can have it in 1.18.0.

comment:6 by Francesco Casella, 4 years ago, in reply to comment:5

Replying to AnHeuermann:

The easy solution would be to put a try-catch block around the code snippet above. That solves the issue and is done in a few minutes.

If that is really quick, could you do that? Then we can lower the priority of the proper solution.

comment:7 by Tin Rabuzin <trabuzin@…>, 4 years ago

I'm not sure if this is related, but sometimes I get the following fault with KINSOL:

Limited backtrace at point of segmentation fault
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fa6ce1fd3c0]
/usr/bin/../lib/x86_64-linux-gnu/omc/libsundials_kinsol.so.5(KINFree+0x4)[0x7fa6cdc0bf19]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(nlsKinsolFree+0xd)[0x7fa6ced83ffe]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(freeNonlinearSystems+0x1d8)[0x7fa6ced6f769]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(_main_SimulationRuntime+0x96)[0x7fa6cedbf643]
/tmp/omc_xiej3di2/model(main+0x197)[0x40bcf4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa6ce01b0b3]
/tmp/omc_xiej3di2/model(_start+0x2e)[0x40952e]
Segmentation fault (core dumped)

I'll try to create a minimal example.

comment:8 by Francesco Casella, 3 years ago

Milestone: 1.18.0

Ticket retargeted after milestone closed
