Opened 5 years ago

Last modified 3 years ago

#5775 new defect

memory leak in FMUs using CombiTimeTable

Reported by: gerrer@… Owned by: Lennart Ochel
Priority: blocker Milestone: 1.19.0
Component: FMI Version:
Keywords: CombiTimeTable, memory leak Cc:

Description

Take an extremely simple model with a CombiTimeTable loading a text file.
Export the model in FMU.
Run the FMU 10000 times: the RAM occupation increases until leading to crash (segmentation error or killed process).

The files simulate_fmpy.py and simulate_pyfmi.py in attachment simulate the FMU simple.fmu a few thousands times with the respective libraries FMPY and PyFMI.
Running each of them leads to crash.

Attachments (4)

bug_combiTimeTable.zip (1.8 MB ) - added by gerrer@… 5 years ago.
Contains the model simple.mo that loads a txt file, the corresponding FMU simple.mo, the scripts simulate_fmpy and simulate_pyfmi to simulate the FMU in Python using two different libraries, the file profil.txt that is loaded in the model, and the script generate_txt_file used to create profil.txt
debug.zip (162 bytes ) - added by Andreas Heuermann 4 years ago.
debug.2.zip (4.0 MB ) - added by Andreas Heuermann 4 years ago.
valgrind-out.txt (62.9 KB ) - added by Andreas Heuermann 4 years ago.
Output from Valgrind

Change History (16)

by gerrer@…, 5 years ago

Attachment: bug_combiTimeTable.zip added

Contains the model simple.mo that loads a txt file, the corresponding FMU simple.mo, the scripts simulate_fmpy and simulate_pyfmi to simulate the FMU in Python using two different libraries, the file profil.txt that is loaded in the model, and the script generate_txt_file used to create profil.txt

comment:1 by Francesco Casella, 5 years ago

Milestone: Future1.16.0
Priority: highblocker

Seems like a serious issue, tables are quite likely to be used in FMUs and they shoulnd't cause problems.

comment:2 by Andreas Heuermann, 4 years ago

I could reproduce the error on Ubuntu Focal (in WSL) and notice an segmentation fault after 1017 iterations with fmpy.

This matches the experience I have with a model I am working on which is much bigger and seg faults in the same way on the first simulation.

I changed connect(combiTimeTable.y[10], real_output); to connect(combiTimeTable.y[1], real_output); to get a valid model and build the FMU with:

loadModel(Modelica); getErrorString();
setCommandLineOptions("-d=newInst"); getErrorString();
loadFile("simple.mo"); getErrorString();
buildModelFMU(simple, version="2.0", fmuType="me", platforms={"static"}); getErrorString();

Let's see where it crashed. A super slow run with GDB later....

Run number 1016
stdout            | info    | ... loading "profiltest" from "profil.txt"                                                                                                    Run number 1017

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00000000005f1b85 in _PyFunction_Vectorcall ()
(gdb) bt
#0  0x00000000005f1b85 in _PyFunction_Vectorcall ()
#1  0x00000000005f3083 in PyObject_CallObject ()
#2  0x00007ffff76a183a in ?? () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#3  0x00007ffff7fbde06 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#4  0x00007ffff7fbe188 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#5  0x00007fffe9dadf8b in fmi2Instantiate (instanceName=0x7fffea1778c0 "simple", fmuType=fmi2ModelExchange,
    fmuGUID=0x7fffe9eea3c0 "{eef059e5-7118-4786-85da-965334c9bf7a}", fmuResourceLocation=0x7fffe9ec4287 "/tmp/tmpbvqqouaj/resources", functions=0x7fffe9f395d0, visible=0,
    loggingOn=0) at ./fmi-export/fmu2_model_interface.c.inc:453
#6  0x00007ffff7fbdff5 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#7  0x00007ffff7fbd40a in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#8  0x00007ffff76a768c in _ctypes_callproc () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#9  0x00007ffff76a7c8c in ?? () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#10 0x00000000005f118e in PyObject_Call ()
#11 0x0000000000568e1f in _PyEval_EvalFrameDefault ()
#12 0x0000000000565972 in _PyEval_EvalCodeWithName ()
#13 0x00000000005f1d85 in _PyFunction_Vectorcall ()
#14 0x000000000056c475 in _PyEval_EvalFrameDefault ()
#15 0x0000000000565972 in _PyEval_EvalCodeWithName ()
#16 0x000000000050729f in ?? ()
#17 0x0000000000568819 in _PyEval_EvalFrameDefault ()
#18 0x0000000000565972 in _PyEval_EvalCodeWithName ()
#19 0x00000000005f1d85 in _PyFunction_Vectorcall ()
#20 0x000000000056769f in _PyEval_EvalFrameDefault ()
#21 0x0000000000565972 in _PyEval_EvalCodeWithName ()
#22 0x00000000005f1d85 in _PyFunction_Vectorcall ()
#23 0x0000000000568819 in _PyEval_EvalFrameDefault ()
#24 0x0000000000565972 in _PyEval_EvalCodeWithName ()
#25 0x0000000000686053 in PyEval_EvalCode ()
#26 0x00000000006753d1 in ?? ()
#27 0x000000000067544f in ?? ()
#28 0x0000000000675507 in PyRun_FileExFlags ()
#29 0x000000000067758a in PyRun_SimpleFileExFlags ()
#30 0x00000000006ae99e in Py_RunMain ()
#31 0x00000000006aed29 in Py_BytesMain ()
#32 0x00007ffff7de70b3 in __libc_start_main (main=0x4ebd20 <main>, argc=2, argv=0x7fffffffdf58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffdf48) at ../csu/libc-start.c:308
#33 0x00000000005f62ee in _start ()

Well that wasn't really helpful. Let's try with OMSimulator and maybe Valgrind next.

comment:3 by Andreas Heuermann, 4 years ago

Doing the same with OMSimulator and a lua script:

for i=1,4000,1 do
    print("Run number ", i)

    oms_setCommandLineOption("--deleteTempFiles=true")
    oms_setTempDirectory("./temp/")
    oms_newModel("model")
    oms_addSystem("model.root", oms_system_sc)
    -- instantiate FMUs
    oms_addSubModel("model.root.system1", "simple.fmu")
    -- simulation settings
    oms_setResultFile("model", "results.mat")
    oms_setStopTime("model", 0.1)
    oms_setFixedStepSize("model.root", 1e-4)
    oms_instantiate("model")
    oms_initialize("model")
    oms_simulate("model")
    oms_terminate("model")
    oms_delete("model")
end

It doesn't crash with a segmentation fault, but is definitely eating all my precious ram.
So I'm fairly sure the reason is independent of Python or FMPy.

by Andreas Heuermann, 4 years ago

Attachment: debug.zip added

comment:4 by Andreas Heuermann, 4 years ago

To run Valgrind I used

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --log-file=valgrind-out.txt OMSimulator simulate_oms.lua
Version 0, edited 4 years ago by Andreas Heuermann (next)

by Andreas Heuermann, 4 years ago

Attachment: debug.2.zip added

by Andreas Heuermann, 4 years ago

Attachment: valgrind-out.txt added

Output from Valgrind

comment:5 by Andreas Heuermann, 4 years ago

I uploaded the results from Valgrind in valgrind-out.txt.

A lot of memory is lost by ModelicaIO.

==24280== 960,000 bytes in 1 blocks are possibly lost in loss record 21 of 21
==24280==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==24280==    by 0x800E49B: readTxtTable (ModelicaIO.c:816)
==24280==    by 0x800E49B: ModelicaIO_readRealTable (ModelicaIO.c:375)
==24280==    by 0x8000EED: readTable (ModelicaStandardTables.c:7245)
==24280==    by 0x8000611: ModelicaStandardTables_CombiTimeTable_init2 (ModelicaStandardTables.c:629)
==24280==    by 0x7FF2E1C: omc_Modelica_Blocks_Types_ExternalCombiTimeTable_constructor (simple_functions.c:149)
==24280==    by 0x7FF482D: simple_eqFunction_18 (OpenModelica/build/lib/omlibrary/Modelica 3.2.3/Blocks/Sources.mo:2184)
==24280==    by 0x7FF4513: simple_updateBoundParameters_0 (simple_08bnd.c:433)
==24280==    by 0x7FF5461: simple_updateBoundParameters (simple_08bnd.c:467)
==24280==    by 0x7FFAD12: fmi2ExitInitializationMode (fmu2_model_interface.c.inc:639)
==24280==    by 0x3E37EC: oms::ComponentFMUME::initialize() (ComponentFMUME.cpp:547)
==24280==    by 0x365006: oms::SystemSC::initialize() (SystemSC.cpp:253)
==24280==    by 0x410EF0: oms::Model::initialize() (Model.cpp:709)
==24280==    by 0x2E55A1: oms_initialize (OMSimulator.cpp:931)
==24280==    by 0x2D5644: OMSimulatorLua_oms_initialize (OMSimulatorLua.c:919)
==24280==    by 0x4B5676: luaD_precall (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4C0C54: luaV_execute (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4B58C7: luaD_call (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4B58F4: luaD_callnoyield (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4B4D16: luaD_rawrunprotected (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4B5C4E: luaD_pcall (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x4B307D: lua_pcallk (in /home/andreas/workspace/OpenModelica/OMSimulator/install/linux/bin/OMSimulator)
==24280==    by 0x2E7245: oms_RunFile (OMSimulator.cpp:1101)
==24280==    by 0x4022F4: oms::Flags::SetCommandLineOption(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Flags.cpp:169)
==24280==    by 0x2DB12F: oms_setCommandLineOption (OMSimulator.cpp:373)
==24280==    by 0x2D260C: main (main.cpp:45)

comment:6 by Andreas Heuermann, 4 years ago

The problem boils down to this:

/*
equation index: 18
type: SIMPLE_ASSIGN
combiTimeTable.tableID = Modelica.Blocks.Types.ExternalCombiTimeTable.constructor(combiTimeTable.tableName, if combiTimeTable.fileName <> "NoName" and not Modelica.Utilities.Strings.isEmpty(combiTimeTable.fileName) then combiTimeTable.fileName else "NoName", {}, combiTimeTable.startTime, {2}, Modelica.Blocks.Types.Smoothness.LinearSegments, Modelica.Blocks.Types.Extrapolation.LastTwoPoints, combiTimeTable.shiftTime, combiTimeTable.timeEvents, combiTimeTable.verboseRead)
*/
OMC_DISABLE_OPT
static void simple_eqFunction_18(DATA *data, threadData_t *threadData)
{
  TRACE_PUSH
  const int equationIndexes[2] = {1,18};
  data->simulationInfo->extObjs[0] = omc_Modelica_Blocks_Types_ExternalCombiTimeTable_constructor(threadData, data->simulationInfo->stringParameter[1] /* combiTimeTable.tableName PARAM */, (((!stringEqual(data->simulationInfo->stringParameter[0] /* combiTimeTable.fileName PARAM */, _OMC_LIT2)) && (!omc_Modelica_Utilities_Strings_isEmpty(threadData, data->simulationInfo->stringParameter[0] /* combiTimeTable.fileName PARAM */)))?data->simulationInfo->stringParameter[0] /* combiTimeTable.fileName PARAM */:_OMC_LIT2), _OMC_LIT3, data->simulationInfo->realParameter[3] /* combiTimeTable.startTime PARAM */, _OMC_LIT4, 1, 2, data->simulationInfo->realParameter[2] /* combiTimeTable.shiftTime PARAM */, data->simulationInfo->integerParameter[4] /* combiTimeTable.timeEvents PARAM */, data->simulationInfo->booleanParameter[2] /* combiTimeTable.verboseRead PARAM */);
  TRACE_POP
}

We call simple_eqFunction_18 inside FMUs multiple times without freeing extObjs[0] first.
It's called from fmi2ExitInitializationMode and this function calls simple_eqFunction_18 from

    comp->fmuData->callback->updateBoundParameters(comp->fmuData, comp->threadData);

and

    if (initialization(comp->fmuData, comp->threadData, "fmi", "", 0.0))

which will call comp->fmuData->callback->updateBoundParameters again.

In addition comp->fmuData->callback->updateBoundParameters will be called from fmi2GetXXX for Real, Int, Boolean and String. Also IDA and CVODE will call it during their residual function in normla C simulation.

I guess we call the updateBoundParameters to often. It only needs to reallocate the CombiTimeTable if the parameters changed.

comment:7 by Andreas Heuermann, 4 years ago

I created a PR https://github.com/OpenModelica/OpenModelica/pull/6824 with an ugly hack, but it should solve the memory leak.

comment:8 by Andreas Heuermann, 4 years ago

It could still be possible to leak memory, when you set some values with fmi2SetXXX during initialization so that bound parameters need to be updated and then call fmi2GetXXX, still during initialization.
But I did some tests and only calling fmiGetXXX during initialization should be no problem.

To solve it completely we would need to change larger portions of the code generation to call the destructor every time before a constructor is called. But there is no information, if a external function call is a constructor or at least I could not find it.

comment:9 by Francesco Casella, 4 years ago

Milestone: 1.16.01.17.0

Retargeted to 1.17.0 after 1.16.0 release

comment:10 by Francesco Casella, 4 years ago

Milestone: 1.17.01.18.0

Rescheduled to 1.18.0

comment:11 by Francesco Casella, 3 years ago

Milestone: 1.18.0

Ticket retargeted after milestone closed

comment:12 by Francesco Casella, 3 years ago

Milestone: 1.19.0

1.18.0 blocker tickets moved to 1.19.0

Note: See TracTickets for help on using tickets.