Opened 4 years ago
Last modified 3 years ago
#6436 new defect
Segmentation fault with non-linear Newton solver
Reported by: | Andreas Heuermann | Owned by: | Andreas Heuermann |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Run-time | Version: | 1.18.0-dev |
Keywords: | non-linear loop, newton | Cc: |
Description
When running the example from https://trac.openmodelica.org/OpenModelica/ticket/6409 with the newton solver as non-linear solver we get a segmentation fault:
Limited backtrace at point of segmentation fault
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f0e485c23c0]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solveNewton+0x3e)[0x7f0e4918097e]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solveNLS+0xd2)[0x7f0e4915a802]
/home/andreas/workspace/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(solve_nonlinear_system+0x40e)[0x7f0e4915ae7e]
[...]
Segmentation fault
Since the model isn't public I'll try to solve this or find an example to show the segmentation fault.
The problem is that in solveNewton (nonlinearSolverNewton.c#L190) solverData is NULL, so solverData->ftol will crash. But I don't know why it is NULL.
Change History (8)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Okay, the problem is that we are doing something like this:
solverData = nonlinsys->solverData;
nonlinsys->solverData = solverData->ordinaryData;
success = solveNewton(data, threadData, sysNumber);
nonlinsys->solverData = solverData;
and inside solveNewton() we trigger an assert, throw, and jump to the next catch block without restoring nonlinsys->solverData.
Putting a try-catch block around solveNewton() solves this. But that is very error-prone, so I will change the code to work without moving this pointer back and forth.
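The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not the runtime code: it assumes the runtime's throw behaves like a longjmp() to the innermost catch block, and every type and function name in it (NewtonData, WrapperData, NonlinearSystem, solve_newton_stub, solve_unguarded, solve_guarded, pointer_restored_after_failure) is a hypothetical stand-in. It contrasts the unguarded pointer swap with a variant that catches locally, restores the pointer, and rethrows.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the runtime structures. */
typedef struct { double ftol; } NewtonData;
typedef struct { NewtonData *ordinaryData; } WrapperData;
typedef struct { void *solverData; } NonlinearSystem;

/* Innermost active "catch block" (assumption: throw == longjmp). */
static jmp_buf *current_catch;

/* Stand-in for solveNewton(): reads its data, then optionally "throws". */
int solve_newton_stub(NonlinearSystem *sys, int fail)
{
    NewtonData *nd = (NewtonData *)sys->solverData;
    assert(nd != NULL);
    if (fail || nd->ftol <= 0.0) {
        longjmp(*current_catch, 1);
    }
    return 1;
}

/* Unguarded swap: the restore on the last line is skipped on a throw. */
int solve_unguarded(NonlinearSystem *sys, int fail)
{
    WrapperData *wrapper = (WrapperData *)sys->solverData;
    int ok;

    sys->solverData = wrapper->ordinaryData;
    ok = solve_newton_stub(sys, fail);
    sys->solverData = wrapper;
    return ok;
}

/* Guarded swap: catch locally, restore the pointer, then rethrow. */
int solve_guarded(NonlinearSystem *sys, int fail)
{
    WrapperData *wrapper = (WrapperData *)sys->solverData;
    jmp_buf *outer = current_catch;
    jmp_buf local;

    sys->solverData = wrapper->ordinaryData;
    current_catch = &local;
    if (setjmp(local) == 0) {
        int ok = solve_newton_stub(sys, fail);
        current_catch = outer;
        sys->solverData = wrapper;
        return ok;
    }
    /* Reached via longjmp: undo the swap before passing the error on. */
    current_catch = outer;
    sys->solverData = wrapper;
    longjmp(*outer, 1);
    return 0; /* not reached */
}

/* Forces a failure under an outer catch block and returns 1 if
 * sys.solverData points at the wrapper again afterwards, else 0. */
int pointer_restored_after_failure(int guarded)
{
    /* Static storage: these objects change between setjmp and longjmp. */
    static NewtonData newton = { 1e-8 };
    static WrapperData wrapper = { &newton };
    static NonlinearSystem sys;
    jmp_buf outer;

    sys.solverData = &wrapper;
    current_catch = &outer;
    if (setjmp(outer) == 0) {
        if (guarded) {
            solve_guarded(&sys, 1);
        } else {
            solve_unguarded(&sys, 1);
        }
    }
    return sys.solverData == (void *)&wrapper;
}
```

Calling pointer_restored_after_failure(0) leaves solverData pointing at the inner Newton data, which is the state that makes a later solver call misinterpret the pointer. Calling it with 1 restores the wrapper. The guarded variant works, but every call site has to repeat the same bookkeeping, which is why it is error-prone.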
comment:3 by , 4 years ago
This needs to be changed for all non-linear loops and involves changes to the structure we are using to save the data for the non-linear loops. Doing this in a good way needs some time.
This should be done in a clean-up session for all the non-linear solvers that improves the documentation, reduces duplicate code, and gives all functions of the same type a common interface.
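One possible shape for such a common interface is to hand the solver its data as an explicit argument, so that the pointer stored in the system struct is never reassigned. This is only a sketch under assumptions: the names NlsSolveFn, solve_newton_explicit, solve_with_wrapper and demo_explicit_solver_data are hypothetical, the structs are simplified stand-ins, and the toy equation x*x - 2 = 0 merely gives the solver something to iterate on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the runtime structures. */
typedef struct { double ftol; int maxIterations; } NewtonData;
typedef struct { NewtonData *ordinaryData; } WrapperData;

typedef struct NonlinearSystem NonlinearSystem;

/* Assumed common signature: every solver receives its own data. */
typedef int (*NlsSolveFn)(NonlinearSystem *sys, void *solverData);

struct NonlinearSystem {
    void *solverData; /* set once at allocation, never swapped */
    double x;         /* toy unknown */
};

/* Toy Newton iteration for x*x - 2 = 0 using the data passed in. */
int solve_newton_explicit(NonlinearSystem *sys, void *solverData)
{
    const NewtonData *nd = (const NewtonData *)solverData;
    int i;

    assert(nd != NULL);
    for (i = 0; i < nd->maxIterations; ++i) {
        double f = sys->x * sys->x - 2.0;
        if ((f < 0.0 ? -f : f) < nd->ftol) {
            return 1; /* converged */
        }
        sys->x -= f / (2.0 * sys->x);
    }
    return 0; /* iteration limit reached */
}

/* The wrapper forwards its inner data instead of swapping pointers, so
 * an error raised inside the solver cannot leave sys in a bad state. */
int solve_with_wrapper(NonlinearSystem *sys, NlsSolveFn inner)
{
    WrapperData *wrapper = (WrapperData *)sys->solverData;
    return inner(sys, wrapper->ordinaryData);
}

/* Returns 1 if the solve converged and sys.solverData was untouched. */
int demo_explicit_solver_data(void)
{
    NewtonData newton = { 1e-10, 50 };
    WrapperData wrapper = { &newton };
    NonlinearSystem sys = { &wrapper, 1.0 };
    int ok = solve_with_wrapper(&sys, solve_newton_explicit);

    return ok && sys.solverData == (void *)&wrapper;
}
```

Because nothing is modified before the solver runs, no restore step exists that a throw could skip, and no try-catch block is needed at the call site.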
comment:4 by , 4 years ago
@AnHeuermann, do you think this is doable for the 1.18.0 beta release (July 2021)?
follow-up: 6 comment:5 by , 4 years ago
The easy solution would be to put a try-catch block in the code snippet above. That solves the issue and is done in a few minutes.
The proper solution needs maybe 1-3 days.
So yes, if we want to fix it we can have it in 1.18.0.
comment:6 by , 4 years ago
Replying to AnHeuermann:
> The easy solution would be to put a try-catch block in the code snippet above. That solves the issue and is done in a few minutes.
If that is real quick, could you do that, and then we can lower the priority for the proper solution?
comment:7 by , 4 years ago
I'm not sure if this is related, but sometimes I get the following fault with KINSOL:
Limited backtrace at point of segmentation fault
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fa6ce1fd3c0]
/usr/bin/../lib/x86_64-linux-gnu/omc/libsundials_kinsol.so.5(KINFree+0x4)[0x7fa6cdc0bf19]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(nlsKinsolFree+0xd)[0x7fa6ced83ffe]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(freeNonlinearSystems+0x1d8)[0x7fa6ced6f769]
/usr/bin/../lib/x86_64-linux-gnu/omc/libSimulationRuntimeC.so(_main_SimulationRuntime+0x96)[0x7fa6cedbf643]
/tmp/omc_xiej3di2/model(main+0x197)[0x40bcf4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa6ce01b0b3]
/tmp/omc_xiej3di2/model(_start+0x2e)[0x40952e]
Segmentation fault (core dumped)
I'll try to create a minimal example.
A temp fix: https://github.com/OpenModelica/OpenModelica/pull/7328