Opened 4 years ago

Last modified 3 years ago

#6351 reopened defect

Severe memory management issue in Ubuntu

Reported by: Francesco Casella
Owned by: Martin Sjölund
Priority: critical
Milestone:
Component: Backend
Version: 1.16.0
Keywords:
Cc: Adrian Pop, Per Östlund, Karim Abdelhak, Andreas Heuermann

Description

Please consider the attached test case, which handles a power grid model with 116,000 equations. The model is large, but definitely not huge, and the current OMC should be able to handle it.

When running the Test_N_6_M_4.mos script under Linux Ubuntu 18.04 LTS, the code generation process aborts with the following error message:

mmap(PROT_NONE) failed
Aborted (core dumped)

Note that the same model is run successfully under Windows 10.

By adding

setCommandLineOptions("-d=execstat,optdaedump");getErrorString();

to the script before the simulate() command, I got some more intel about what goes wrong.

The log file reports

pre-optimization done.
...
Index reduction done.

########################################
created initial system
########################################
...
BackendDAEType: initialization

Known variables only depending on parameters and constants - globalKnownVars (75826)
...
########################################
partitioned initial system
########################################
...
BackendDAEType: initialization


Known variables only depending on parameters and constants - globalKnownVars (75826)

The crash happens at the end of the list of known variables.

Unfortunately the buffer is not flushed when the crash takes place, so I can't see the last lines, but I guess it happens during the analyzeInitialSystem phase.

Currently this issue prevents me from testing models above 50,000 equations. This is too bad, considering that four years ago we tested systems with 750,000 equations and almost 1,000,000 initial equations.

Please investigate.

Attachments (2)

TestGrid.tar.gz (1.0 MB ) - added by Francesco Casella 4 years ago.
TestGrid_N_16.zip (1.1 MB ) - added by Francesco Casella 4 years ago.

Change History (39)

by Francesco Casella, 4 years ago

Attachment: TestGrid.tar.gz added

comment:1 by Francesco Casella, 4 years ago

@sjoelund.se pointed to https://github.com/ivmai/bdwgc/issues/334

It's not clear to me what kind of actions we should take.

comment:2 by Per Östlund, 4 years ago

For me it crashes on this line, about 20 lines after analyzeInitialSystem is called. It is completely consistent, always crashing on the same line, so it seems to be hitting some hard limit rather than running out of memory (fragmentation can cause issues with large allocations even when there appears to be enough free memory).

The line in question seems to be making a copy of the entire initialization system for use by some homotopy stuff here. The copy can happen in some cases where it's not needed, I made PR 7114 to fix that, but it doesn't apply to this case unfortunately.

I'm not sure if the copying itself is the actual problem, though; it might just be the last straw. Someone with more knowledge about the GC probably needs to take a closer look at the issue.

in reply to:  2 ; comment:3 by Francesco Casella, 4 years ago

Replying to perost:

For me it crashes on this line, about 20 lines after analyzeInitialSystem is called. It is completely consistent, always crashing on the same line, so it seems to be hitting some hard limit rather than running out of memory (fragmentation can cause issues with large allocations even when there appears to be enough free memory).

Yes, that sounds reasonable to me as well.

I ran blame on the source code in the neighbourhood; it's mostly 3 to 7 years old, so it doesn't seem like a recent regression.

However, whenever we tried out large power grid models, we always used daeMode, and until #5806 was fixed a few days ago, this part of the code was never called with such huge models. I guess that explains why this issue only popped up now.

It's strange that it doesn't show up on Windows; apparently the GC works better there.

The line in question seems to be making a copy of the entire initialization system for use by some homotopy stuff here.

If homotopy is used, it is convenient to exploit the structural simplifications that the equations often allow when lambda = 0. For this purpose, we make a copy of the initial equations and generate a separate set from it, labelled initial-lambda0, by substituting each homotopy() call with its simplified argument. This set is then passed through the usual optimization steps.
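For reference, a minimal C sketch of the homotopy() operator semantics that motivate this copy (illustrative only; homotopy(actual, simplified) follows the actual expression at lambda = 1 and the simplified one at lambda = 0, shown here with the common linear blend):

/* At lambda = 0 only the simplified expression remains, which is why a
 * structurally simpler initial-lambda0 system can be derived by textual
 * substitution of each homotopy() call with its simplified argument. */
double homotopy_blend(double lambda, double actual, double simplified)
{
    return lambda * actual + (1.0 - lambda) * simplified;
}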

The copy can happen in some cases where it's not needed, I made PR 7114 to fix that, but it doesn't apply to this case unfortunately.

Yep, for this model you really need the initial-lambda0 equations.

I'm not sure if the copying itself is the actual problem, though; it might just be the last straw. Someone with more knowledge about the GC probably needs to take a closer look at the issue.

I'm not sure either. I wonder if the implementation of copyBackendDAE is as efficient as possible in terms of memory usage patterns. Would it be possible to step in during debugging and check which of the called functions actually fail, or get a stack trace from the core dump? Also, could you have a look at it and report if you see something that is obviously going to be inefficient?

in reply to:  3 comment:4 by Per Östlund, 4 years ago

Replying to casella:

I wonder if the implementation of copyBackendDAE is as efficient as possible in terms of memory usage patterns. Would it be possible to step in during debugging and check which of the called functions actually fail, or get a stack trace from the core dump? Also, could you have a look at it and report if you see something that is obviously going to be inefficient?

I started to write an answer saying that it's just doing a lot of shallow array copies, but then I noticed something weird. The backend DAE consists of two parts: a list of equation systems and a structure with shared data common to all the equation systems.

copyBackendDAE uses the mapEqSystem function to do the copying, which maps over each equation system and passes along the shared data to each function call. copyEqSystemAndShared then did what its name suggests: it made a copy of an equation system and the shared data. This means that copyBackendDAE made a copy of the shared data for each equation system, overwriting the previous copy each time.

The compiler crashed on your model after about 6000 equation systems, which amounts to a massive amount of unnecessary copying. The fix is trivial: just copy the shared data once. I opened PR 7115 for that.
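For illustration, the shape of the bug and the fix in a minimal C sketch (hypothetical names and types; the real code is MetaModelica):

#include <stdlib.h>
#include <string.h>

typedef struct { double *vars; size_t n; } Shared;  /* stand-in for the shared data */

static Shared copyShared(const Shared *s)           /* expensive deep copy */
{
    Shared c = { malloc(s->n * sizeof(double)), s->n };
    memcpy(c.vars, s->vars, s->n * sizeof(double));
    return c;
}

void copyAll(size_t nSystems, Shared *shared)
{
    /* buggy version: this deep copy sat inside the per-system loop below,
     * so it ran ~6000 times here, overwriting the previous copy each time */
    Shared c = copyShared(shared);
    free(shared->vars);
    *shared = c;
    for (size_t i = 0; i < nSystems; i++) {
        /* copy equation system i (shallow array copies, cheap) */
    }
}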

Once that was fixed I instead got a stack overflow in BackendDAEOptimize.warnAboutVars, which constructed a string in a non-tail-recursive way; I also fixed that in the PR. Now I get a stack overflow in BackendDAEOptimize.listAllIterationVariables2 instead, also because it's not tail recursive. That one is not quite trivial to fix, so it'll have to wait until tomorrow.
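For illustration, the general shape of these stack overflows and the usual fix, as a C sketch (hypothetical function; the real code is MetaModelica, where tail calls are optimized):

#include <stddef.h>
#include <string.h>

/* Non-tail-recursive shape: the addition happens after the recursive call
 * returns, so every list element pins a stack frame; with ~10^5 variables
 * the default stack overflows. */
size_t totalLenRecursive(const char *const *names, size_t i, size_t n)
{
    if (i == n) return 0;
    return strlen(names[i]) + totalLenRecursive(names, i + 1, n);
}

/* Fixed shape: accumulate on the way down (a tail call) or, in C, iterate. */
size_t totalLenIterative(const char *const *names, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(names[i]);
    return total;
}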

comment:5 by Francesco Casella, 4 years ago

@perost, thanks for the late-night analysis and for the fixes, I hope that works out well!

Regarding stack overflows, in the past I had to set ulimit -s 65535 to get large grid models going. Of course avoiding ludicrous stack use would be better, but you can also consider this workaround for the time being.

Also consider that the future backend will have to avoid this kind of huge set of individual equations; we'll have to find ways to group them together and use arrays and loops wherever possible, and use smart structural analysis algorithms that avoid the need of expanding the equation set to scalars. So it's probably not worth spending too much time uber-optimizing this code.

comment:6 by Per Östlund, 4 years ago

I fixed the listAllIterationVariables functions in PR 7117. The model now compiles successfully, but when I ran the script it failed during simulation after a couple of seconds with:

ScalableTestGrids.Models.Type1_N_6_M_4: malloc.c:2394: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

This seems to be coming from the OS malloc rather than our GC, and the internet suggests it's probably due to corrupted memory. When I run the simulation executable manually it runs longer, but I don't know if it hangs or just takes a very long time. Running it in valgrind just kills the process after a while with no explanation why.

Running it with gdb also either hangs or takes a very long time, but every time I interrupted the process it was doing the same thing. So it might be that it's just slow and takes longer to run than I have patience for. The stack trace for where it seems to be spending the majority of the time is:

0x00007ffff7e984e3 in solveSystemWithTotalPivotSearch (n=13987, x=0x55559097f2c0, A=0x7ffaea2a1010, indRow=0x555590a07ca0, indCol=0x555590a15740, 
    pos=0x7fffffffa448, rank=0x7fffffffa444, casualTearingSet=0) at ./simulation/solver/nonlinearSolverHomotopy.c:1153
1153          A[indRow[k] + indCol[j]*n] = A[indRow[k] + indCol[j]*n] + hValue*A[indRow[i] + indCol[j]*n];
(gdb) bt
#0  0x00007ffff7e984e3 in solveSystemWithTotalPivotSearch (n=13987, x=0x55559097f2c0, A=0x7ffaea2a1010, indRow=0x555590a07ca0, indCol=0x555590a15740, 
    pos=0x7fffffffa448, rank=0x7fffffffa444, casualTearingSet=0) at ./simulation/solver/nonlinearSolverHomotopy.c:1153
#1  0x00007ffff7e99647 in solveHomotopy (data=0x7fffffffd920, threadData=0x7fffffffdb18, sysNumber=650)
    at ./simulation/solver/nonlinearSolverHomotopy.c:2226
#2  0x00007ffff7ea518e in solveNLS (data=0x7fffffffd920, threadData=0x7fffffffdb18, sysNumber=650) at ./simulation/solver/nonlinearSystem.c:854
#3  0x00007ffff7ea58f7 in solve_nonlinear_system (data=0x7fffffffd920, threadData=0x7fffffffdb18, sysNumber=650)
    at ./simulation/solver/nonlinearSystem.c:1029
#4  0x00005555572ce0af in ScalableTestGrids_Models_Type1_N_6_M_4_eqFunction_140818 ()
#5  0x00005555575c25ad in ScalableTestGrids_Models_Type1_N_6_M_4_functionInitialEquations_lambda0 ()

Maybe that gives some clues about what it's doing.

in reply to:  6 ; comment:7 by Francesco Casella, 4 years ago

Replying to perost:

I fixed the listAllIterationVariables functions in PR 7117. The model now compiles successfully, but when I ran the script it failed during simulation after a couple of seconds with:

ScalableTestGrids.Models.Type1_N_6_M_4: malloc.c:2394: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

This seems to be coming from the OS malloc rather than our GC, and the internet suggests it's probably due to corrupted memory.

Hmm. "Corrupted memory" doesn't look nice.

When I run the simulation executable manually it runs longer, but I don't know if it hangs or just takes a very long time.

It is taking a much longer time because by default it uses the dense solver, which takes forever on such a large model. You should run the simulation executable with flag -nls=kinsol, which is set in a __OpenModelica_simulationFlags() annotation. I guess this is lost if you run the simulation executable manually.

Running it in valgrind just kills the process after a while with no explanation why.

Running it with gdb also either hangs or takes a very long time, but every time I interrupted the process it was doing the same thing.

I.e., factoring the huge dense matrix. This is expected behaviour. Please try with -nls=kinsol.
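(For scale: the n = 13987 system in the stack trace needs 13987² · 8 bytes ≈ 1.6 GB for the dense matrix alone, and a dense factorization with pivoting costs on the order of n³/3 ≈ 9·10¹¹ floating-point operations, so "takes forever" is the expected outcome.)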

In fact, for models such as this one, the choice of solver should be automatic; that's the subject of ticket #6342. We'll try to sort this out for 1.18.0.

in reply to:  7 comment:8 by Per Östlund, 4 years ago

Replying to casella:

It is taking a much longer time because by default it uses the dense solver, which takes forever on such a large model. You should run the simulation executable with flag -nls=kinsol, which is set in a __OpenModelica_simulationFlags() annotation. I guess this is lost if you run the simulation executable manually.

Yes, I forgot that annotations can influence the simulation settings. Running it with -nls=kinsol gives the same behaviour as running the script, i.e. it crashes within a few seconds with the mentioned malloc assertion.

I ran it through valgrind and got lots of invalid writes and reads, the first being this one:

==1375111== Invalid write of size 8
==1375111==    at 0x6FEB1BF: klu_l_defaults (KLU/Source/klu_defaults.c:54)
==1375111==    by 0x67F156F: SUNLinSol_KLU (src/sunlinsol/klu/sunlinsol_klu.c:146)
==1375111==    by 0x56E888C: resetKinsolMemory (kinsolSolver.c:194)
==1375111==    by 0x56E8548: nlsKinsolAllocate (kinsolSolver.c:260)
==1375111==    by 0x56C8EA9: initializeNonlinearSystems (nonlinearSystem.c:458)
==1375111==    by 0x575E35F: initRuntimeAndSimulation (simulation_runtime.cpp:1077)
==1375111==    by 0x575EAD9: _main_SimulationRuntime (simulation_runtime.cpp:1152)
==1375111==    by 0x167B8CC: main
==1375111==  Address 0x7c1d9a10 is 0 bytes after a block of size 224 alloc'd
==1375111==    at 0x55D277F: malloc (vg_replace_malloc.c:307)
==1375111==    by 0x67F14C5: SUNLinSol_KLU (src/sunlinsol/klu/sunlinsol_klu.c:118)
==1375111==    by 0x56E888C: resetKinsolMemory (kinsolSolver.c:194)
==1375111==    by 0x56E8548: nlsKinsolAllocate (kinsolSolver.c:260)
==1375111==    by 0x56C8EA9: initializeNonlinearSystems (nonlinearSystem.c:458)
==1375111==    by 0x575E35F: initRuntimeAndSimulation (simulation_runtime.cpp:1077)
==1375111==    by 0x575EAD9: _main_SimulationRuntime (simulation_runtime.cpp:1152)
==1375111==    by 0x167B8CC: main

The line at klu_defaults.c:54 is just:

    Common->work = 0 ;    /* work done by btf_order */

Valgrind's "Address 0x7c1d9a10 is 0 bytes after a block of size 224 alloc'd" means that Common->work refers to an address just past the block that was actually allocated, so the issue might be that SUNLinSol_KLU did not allocate enough memory for Common. Both KLU and sunlinsol define their own data types, so it might be that they don't agree on the size of some data type.
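A minimal C sketch of how such a size disagreement produces exactly this kind of valgrind report (hypothetical struct layouts, not the real KLU/sunlinsol definitions):

#include <stdlib.h>

/* layout the allocating side was compiled against (older header) */
typedef struct { double tol; long scale; } CommonOld;                /* 16 bytes */

/* layout the writing side was compiled against (newer header, extra field) */
typedef struct { double tol; long scale; double work; } CommonNew;   /* 24 bytes */

int main(void)
{
    void *common = malloc(sizeof(CommonOld));  /* block of the smaller size */
    ((CommonNew *)common)->work = 0;           /* write lands 0 bytes after the block */
    free(common);
    return 0;
}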

I tried to add some debug printouts to the code, but couldn't get my changes to have any effect. Maybe someone more familiar with this part of the simulation runtime can have a look.

comment:9 by Andreas Heuermann, 4 years ago

That looks like something I changed last year.

There is some old comment in kinsolSolver.c, maybe referring to that:

      /* TODO: Is this still true? */
      /* It seems if we don't free KINSol on every iteration, it leaks memory.
       * But if we reset KINSol, it takes an enormous amount of time...
       */
      if (0) { /* Save time; leak memory */
        resetKinsolMemory(kinsolData, nlsData);
      } else {
        /* reset configuration settings */
        nlsKinsolConfigSetup(kinsolData);
      }

We could try to resetKinsolMemory on each step.
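Concretely, the experiment would just flip the guard in the snippet above (illustrative change, not a committed fix; as the in-code comment warns, it costs time):

      if (1) { /* experiment: reset KINSol memory on every iteration */
        resetKinsolMemory(kinsolData, nlsData);
      } else {
        /* reset configuration settings */
        nlsKinsolConfigSetup(kinsolData);
      }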

comment:10 by Andreas Heuermann, 4 years ago

Also we are using a very recent version of sundials kinsol (2020), but a very old klu (2007). Maybe we should consider updating SuiteSparse. But I get nightmares just thinking of it. My last update in 3rdParty took ages.

Last edited 4 years ago by Andreas Heuermann

in reply to:  9 comment:11 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

That looks like something I changed last year.

There is some old comment in kinsolSolver.c, maybe referring to that:

      /* TODO: Is this still true? */
      /* It seems if we don't free KINSol on every iteration, it leaks memory.
       * But if we reset KINSol, it takes an enormous amount of time...
       */
      if (0) { /* Save time; leak memory */
        resetKinsolMemory(kinsolData, nlsData);
      } else {
        /* reset configuration settings */
        nlsKinsolConfigSetup(kinsolData);
      }

I remember this, and to be honest I never fully understood the status. Obviously resetting KINSol each time didn't make sense then and doesn't make sense now; if there is a memory leak, we should investigate with the authors of Kinsol why that is the case and get it fixed upstream. In any case, a memory leak is one thing, and an invalid memory access is another.

We could try to resetKinsolMemory on each step.

We can do it temporarily to see if it resolves the issue, in order to gain some intel. I am strongly against this solution as a permanent feature, since it would kill the performance of simulations using the kinsol solver to solve algebraic loops and then some ODE solver for the differential part. That is not the case for power grids (which use IDA, which calls Kinsol internally), but it would be the case, e.g., for thermo-hydraulic grids with incompressible fluid.

Last edited 4 years ago by Francesco Casella

in reply to:  10 comment:12 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

Also we are using a very recent version of sundials kinsol (2020), but a very old klu (2007). Maybe we should consider updating SuiteSparse. But I get nightmares just thinking of it. My last update in 3rdParty took ages.

I'm not sure the problem is with KLU.

The debug report from @perost refers to KLU/Source/klu_defaults.c:54. I'm not sure where this file is in the OpenModelica sources, but I managed to find the corresponding .h file in my installation (C:\Program Files\OpenModelica1.17.0-dev-64bit\include\omc\c\sundials\sunlinsol), and it says copyright 2020, so it's not old stuff. Maybe there is a bug there, or maybe we are not calling this function correctly from resetKinsolMemory or from some other upstream C function that we wrote?

in reply to:  10 comment:13 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

Also we are using a very recent version of sundials kinsol (2020), but a very old klu (2007).

Wow. Do you mean we are still using SuiteSparse 3.0.0? The current documentation of Kinsol says that Sundials was tested with SuiteSparse 5.7.2 (!). I guess it's time to update it, particularly if we experience memory leaks and other memory management issues. BTW, newer versions may be significantly faster.

Up-to-date sparse solver support is essential for us when it comes to large-scale systems and power grid models.

@AnHeuermann, what do you think?

Last edited 4 years ago by Francesco Casella

comment:14 by Andreas Heuermann, 4 years ago

I can try to update SuiteSparse and hope it is easier than the previous SUNDIALS update; hopefully they didn't change the API as well. But it has to wait a bit until I can make time for it.

Also, it was only a guess, and I can't say whether it is related to this ticket. But updating klu should only be positive.

comment:15 by Francesco Casella, 4 years ago

I guess the scope is more limited. The API apparently hasn't changed, otherwise Sundials wouldn't work at all after the upgrade, so I guess it should be straightforward. Keeping 16+ year old software in OMC that is not in sync with Sundials is really not an option.

It's not urgent (though I'd like to have it sooner rather than later for RTE-related projects), but it should be done for 1.18.0.

Last edited 4 years ago by Francesco Casella

comment:16 by Andreas Heuermann, 4 years ago

I'll update SuiteSparse, see #6362.

comment:17 by Andreas Heuermann, 4 years ago

SuiteSparse is now updated. @perost can you check if this solved the issue?

in reply to:  17 comment:18 by Francesco Casella, 4 years ago

Replying to AnHeuermann:

SuiteSparse is now updated. @perost can you check if this solved the issue?

I'll take care of that.

comment:19 by Francesco Casella, 4 years ago

Resolution: fixed
Status: new → closed

I just ran Test_N_6_M_4.mos after updating to the latest nightly, and it ran fine. It looks like the issue has now been resolved. Thanks Andreas!

I'll run even larger tests and check what happens.

comment:20 by Francesco Casella, 4 years ago

Resolution: fixed
Status: closed → reopened

There are improvements, but the problem is still there.

I first tried to run a batch of simulations. It is a simple bash script

ulimit -s 65000
omc Test_N_2_M_4.mos > log_test.txt
omc Test_N_3_M_4.mos >> log_test.txt
omc Test_N_4_M_4.mos >> log_test.txt
omc Test_N_6_M_4.mos >> log_test.txt
omc Test_N_8_M_4.mos >> log_test.txt
omc Test_N_11_M_4.mos >> log_test.txt

that runs several .mos scripts in sequence and piles up the results in a log file. Each .mos file loads the models, sets some flags, and runs simulate:

loadModel(Modelica);getErrorString();
loadFile("../GIT/PowerGrids/PowerGrids/package.mo");getErrorString();
loadFile("../GIT/ScalableTestGrids/ScalableTestGrids/package.mo");getErrorString();
cd("Temp");getErrorString();
simulate(ScalableTestGrids.Models.Type1_N_2_M_4, cflags="-O0");getErrorString();

All tests up to N_8_M_4 went fine and scaled more or less linearly in all phases, as expected. The N_11_M_4 test failed with mmap(PROT_NONE) failed and dumped core, apparently somewhere during the first phases of compilation: no .c file was produced, and no output was sent to the log file (or, if it was, the buffer was not flushed before the core dump). I saw a Python script working for a few minutes, so the core dump should have been transmitted to the home base. I guess it was huge; maybe you can do some post-mortem analysis on it.

Shortly before the crash, htop reported about 16 GB used out of the 72 GB available, so there was still plenty of memory. Of course I have no idea about fragmentation.

I then re-ran Test_N_11_M_4.mos >> log.txt (400,000 equations) outside the bash script, and this time it went fine. I'm not sure why. I understand Linux does not release the memory until the process has ended, but I guess this doesn't hold for the entire duration of the bash script; I'm starting separate omc processes sequentially in that script, so I'm a bit puzzled.

I'm now running Test_N_16_M_4.mos (800,000 equations), let's see how it works. At some point, we should also see larger models in this report, though I'm afraid that memory and time limits will not allow going beyond Type1_N_6_M_4. Unless @sjoelund.se gives it a bit more slack, that is :)

I guess 400,000 equations is currently the limit for practical applications, even though there are some cases where you may be willing to go a bit further and wait one or two hours for compilation, provided that the simulation is fast. A notable case was reported in this paper, which we were able to write a few years ago. In that case the simulation was run a lot of times and was really fast compared to CFD, so waiting for the compilation was not a big deal.

I understand the way out of this problem in the long term is array-based code generation, but that will take some time, and it would be a shame to have a regression compared to what we could accomplish in 2016.

Last edited 4 years ago by Francesco Casella

comment:21 by Francesco Casella, 4 years ago

OK, the N_16_M_4 case dumped core after just a few minutes of compilation, most likely during frontend or backend processing.

I attached TestGrid_N_16.zip with everything that is needed to reproduce the issue. Just run the .mos script.

@perost, @sjoelund.se, could you run this with valgrind and try to figure out what goes wrong? Maybe there is some new function in the frontend or backend that is particularly bad with memory fragmentation.

by Francesco Casella, 4 years ago

Attachment: TestGrid_N_16.zip added

in reply to:  21 ; comment:22 by Per Östlund, 4 years ago

Replying to casella:

OK, the N_16_M_4 case dumped core after just a few minutes of compilation, most likely during frontend or backend processing.

I got a stack overflow in SynchronousFeatures.partitionIndependentBlocksEq. It's non-tail recursive, so I guess it recurses too deeply and runs out of stack. But rewriting it to be tail recursive seems non-trivial, so someone who knows what that code is doing should probably take a look at it.

comment:23 by Martin Sjölund, 4 years ago

I guess I could take a look.

in reply to:  22 comment:24 by Francesco Casella, 4 years ago

Replying to perost:

Replying to casella:

OK, the N_16_M_4 case dumped core after just a few minutes of compilation, most likely during frontend or backend processing.

I got a stack overflow in SynchronousFeatures.partitionIndependentBlocksEq. It's non-tail recursive, so I guess it recurses too deeply and runs out of stack. But rewriting it to be tail recursive seems non-trivial, so someone who knows what that code is doing should probably take a look at it.

I forgot to mention that I always run ulimit -s 65500 to avoid that.

comment:25 by Francesco Casella, 4 years ago

BTW, the issue also persists in Ubuntu 20.04, so I changed the subject.

comment:26 by Francesco Casella, 4 years ago

Summary: Severe memory management issue in Ubuntu 18.04 LTS → Severe memory management issue in Ubuntu

comment:27 by Martin Sjölund, 4 years ago

@casella: I can't run the model on the 32GB RAM I have at home (not even the new frontend). Do you have some spare sticks to send to Sweden? :D

comment:28 by Martin Sjölund, 4 years ago

I tried creating a flat model on ripper1 and sent myself that model, but it's so large it won't even parse :D

comment:29 by Francesco Casella, 4 years ago

I can give you access to our server, then you can play around as you wish. The preferred authentication is by certificate; could you please send me your public SSH key?

comment:30 by Martin Sjölund, 4 years ago

I'd prefer some sticks. Otherwise, asap, ripper1, ripper2 and lib have 128GB each. It takes a lot longer to modify the compiler when you need to send it to a server.

comment:31 by Martin Sjölund, 4 years ago

https://patchwork.kernel.org/project/linux-fsdevel/patch/23066.59196.909026.689706@gargle.gargle.HOWL/

I tried the model with sudo sysctl -w vm.max_map_count=262144, but it did not help. However, the following gets further:

marsj@omodripper:~/TestGrid$ (ulimit -s 2048 -v 90000000 ; GC_INITIAL_HEAP_SIZE=80000000000 /usr/bin/time -v ~/OpenModelica/build/bin/omc a.mos) 
true
true
"[/home/marsj/.openmodelica/libraries//Modelica 3.2.3+maint.om:0:0-0:0:readonly] Notification: Modelica 3.2.3+maint.om will not be installed since version 3.2.3+maint.om is installed.
Notification: Automatically loaded package Modelica 3.2.3 due to uses annotation.
Notification: Automatically loaded package Complex 3.2.3 due to uses annotation.
Notification: Automatically loaded package ModelicaServices 3.2.3 due to uses annotation.
"
true
""
false
"Notification: Performance of FrontEnd - loaded program: time 0.0004441/0.0004442, allocations: 12 kB / 0.8532 GB, free: 73.65 GB / 74.51 GB
Notification: Performance of FrontEnd - Absyn->SCode: time 0.2706/0.2711, allocations: 164.7 MB / 1.014 GB, free: 73.49 GB / 74.51 GB
Notification: Performance of NFInst.instantiate(ScalableTestGrids.Models.Type1_N_16_M_4): time 24.77/25.04, allocations: 19.4 GB / 20.42 GB, free: 54.04 GB / 74.51 GB
Notification: Performance of NFInst.instExpressions: time 18.98/44.02, allocations: 7.718 GB / 28.14 GB, free: 46.3 GB / 74.51 GB
Notification: Performance of NFInst.updateImplicitVariability: time 4.032/48.05, allocations: 79.94 MB / 28.21 GB, free: 46.22 GB / 74.51 GB
Notification: Performance of NFTyping.typeComponents: time 4.131/52.18, allocations: 0.8285 GB / 29.04 GB, free: 45.38 GB / 74.51 GB
Notification: Performance of NFTyping.typeBindings: time 13.04/65.22, allocations: 2.841 GB / 31.88 GB, free: 42.54 GB / 74.51 GB
Notification: Performance of NFTyping.typeClassSections: time 3.151/68.38, allocations: 0.5851 GB / 32.47 GB, free: 41.95 GB / 74.51 GB
Notification: Performance of NFFlatten.flatten: time 15.7/84.08, allocations: 5.751 GB / 38.22 GB, free: 36.19 GB / 74.51 GB
Notification: Performance of NFFlatten.resolveConnections: time 2.825/86.9, allocations: 0.6568 GB / 38.88 GB, free: 35.53 GB / 74.51 GB
Notification: Performance of NFEvalConstants.evaluate: time 5.907/92.81, allocations: 2.039 GB / 40.92 GB, free: 33.48 GB / 74.51 GB
Notification: Performance of NFSimplifyModel.simplify: time 5.739/98.55, allocations: 2.123 GB / 43.04 GB, free: 31.35 GB / 74.51 GB
Notification: Performance of NFPackage.collectConstants: time 1.167/99.72, allocations: 120.9 MB / 43.16 GB, free: 31.23 GB / 74.51 GB
Notification: Performance of NFFlatten.collectFunctions: time 3.042/102.8, allocations: 373.4 MB / 43.52 GB, free: 30.87 GB / 74.51 GB
Notification: Performance of NFScalarize.scalarize: time 1.498/104.3, allocations: 0.4936 GB / 44.02 GB, free: 30.37 GB / 74.51 GB
Notification: Performance of NFVerifyModel.verify: time 2.537/106.8, allocations: 0.7904 GB / 44.81 GB, free: 29.58 GB / 74.51 GB
Notification: Performance of NFConvertDAE.convert: time 10.08/116.9, allocations: 4.038 GB / 48.84 GB, free: 25.53 GB / 74.51 GB
Notification: Performance of FrontEnd - DAE generated: time 4.208e-06/116.9, allocations: 0 / 48.84 GB, free: 25.53 GB / 74.51 GB
Notification: Performance of FrontEnd: time 2.985e-06/116.9, allocations: 0 / 48.84 GB, free: 25.53 GB / 74.51 GB
Notification: Performance of Transformations before backend: time 0.1857/117.1, allocations: 0 / 48.84 GB, free: 25.53 GB / 74.51 GB
Notification: Performance of Generate backend data structure: time 29.62/146.7, allocations: 5.016 GB / 53.86 GB, free: 20.49 GB / 74.51 GB
Notification: Performance of prepare preOptimizeDAE: time 7.51e-05/146.7, allocations: 8.031 kB / 53.86 GB, free: 20.49 GB / 74.51 GB
Notification: Performance of preOpt normalInlineFunction (simulation): time 4.222/150.9, allocations: 0.6173 GB / 54.48 GB, free: 19.87 GB / 74.51 GB
Notification: Performance of preOpt evaluateParameters (simulation): time 10.7/161.6, allocations: 2.234 GB / 56.71 GB, free: 17.62 GB / 74.51 GB
Notification: Performance of preOpt simplifyIfEquations (simulation): time 0.5816/162.2, allocations: 246.9 MB / 56.95 GB, free: 17.38 GB / 74.51 GB
Notification: Performance of preOpt expandDerOperator (simulation): time 1.3/163.5, allocations: 325.7 MB / 57.27 GB, free: 17.07 GB / 74.51 GB
Notification: Performance of preOpt clockPartitioning (simulation): time 21.95/185.4, allocations: 8.06 GB / 65.33 GB, free: 6.454 GB / 74.51 GB
Notification: Performance of preOpt findStateOrder (simulation): time 0.1913/185.6, allocations: 6.791 MB / 65.34 GB, free: 6.447 GB / 74.51 GB
Notification: Performance of preOpt replaceEdgeChange (simulation): time 0.9491/186.6, allocations: 133.6 MB / 65.47 GB, free: 6.316 GB / 74.51 GB
Notification: Performance of preOpt inlineArrayEqn (simulation): time 0.193/186.8, allocations: 80.17 MB / 65.55 GB, free: 6.238 GB / 74.51 GB
Notification: Performance of preOpt removeEqualRHS (simulation): time 13.77/200.5, allocations: 3.182 GB / 68.73 GB, free: 3.054 GB / 74.51 GB
Warning: The model contains alias variables with redundant start and/or conflicting nominal values. It is recommended to resolve the conflicts, because otherwise the system could be hard to solve. To print the conflicting alias sets and the chosen candidates please use -d=aliasConflicts.
Notification: Performance of preOpt removeSimpleEquations (simulation): time 107.4/307.9, allocations: 20.87 GB / 89.6 GB, free: 16.78 GB / 74.51 GB
[/home/marsj/OpenModelica/OMCompiler/Compiler/BackEnd/DAEMode.mo:155:5-155:157:writable] Error: Internal error Stack overflow in DAEMode.getEqSystemDAEmode...
[bt] #1 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaRuntimeC.so(mmc_setStacktraceMessages_threadData)
[bt] #2 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaRuntimeC.so(mmc_do_stackoverflow)
[bt] #3 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.replaceCrefSubs2)
[bt] #4...6 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.replaceCrefSubs)
[bt] #7 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.replaceExp)
[bt] #8 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.makeTransitive2)
[bt] #9 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.makeTransitive)
[bt] #10 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(BackendVarTransform.addReplacement)
[bt] #11...1024 /home/marsj/OpenModelica/build/bin/../lib/x86_64-linux-gnu/omc/libOpenModelicaCompiler.so(CommonSubExpression.commonSubExpressionUpdate)
[bt] #1025 [...]
"
        Command being timed: "/home/marsj/OpenModelica/build/bin/omc a.mos"
        User time (seconds): 870.55
        System time (seconds): 22.09
        Percent of CPU this job got: 101%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 14:42.93

I'll retry with a larger stack size to see if it then works...

comment:32 by Martin Sjölund, 4 years ago

 Command being timed: "/home/marsj/OpenModelica/build/bin/omc a.mos"
        User time (seconds): 1803.51
        System time (seconds): 61.68
        Percent of CPU this job got: 117%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 26:24.97
        Exit status: 0

For translateModel. Then compiling the source code with -O0 -g takes another 17 minutes.

Simulation fails because int is used in some places, but a Jacobian is 1017072 in size, and the index arithmetic for a matrix that large will not fit in 32 bits.
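A quick C illustration of the limit (assuming linear indexing into the Jacobian; the numbers are from this model):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    int n = 1017072;
    /* n*n computed in int overflows (undefined behaviour): ~1.03e12 is far
     * beyond INT_MAX (~2.1e9). The index arithmetic has to be done in a
     * 64-bit type such as size_t. */
    size_t elems = (size_t)n * (size_t)n;
    printf("elements: %zu (~%.1f TB if stored as dense doubles)\n",
           elems, elems * 8.0 / 1e12);
    return 0;
}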

comment:33 by Martin Sjölund, 4 years ago

Anyway, if we fix the allocation of the matrix... it still takes forever to do anything with it, because the matrix is traversed in the wrong order for most operations.
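For illustration, the classic row-major access-pattern difference (a generic C sketch, not the actual runtime code):

#include <stddef.h>

void traverse_fast(double *A, size_t n)   /* inner loop moves along a row */
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            A[i*n + j] *= 2.0;            /* consecutive memory, cache friendly */
}

void traverse_slow(double *A, size_t n)   /* inner loop moves down a column */
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            A[i*n + j] *= 2.0;            /* stride of n doubles per access */
}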

comment:34 by Francesco Casella, 4 years ago

OK, I'll try to summarize relevant facts I understood so far:

  1. ulimit -s 65500 helps avoid stack overflows
  2. setting GC_INITIAL_HEAP_SIZE=80000000000 (or to any other large value that is less than the amount of available RAM) pre-allocates a large heap, thus reducing fragmentation (see the sketch after this list)
  3. somewhere (@sjoelund.se, could you be more specific?) dense Jacobians are generated, which take a huge amount of memory and also a huge amount of time to traverse, because they appear to be traversed in the wrong order (column-first instead of row-first, or the other way round).
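A small C sketch of item 2 (assuming bdwgc's documented GC_INITIAL_HEAP_SIZE environment variable and GC_expand_hp API):

#include <stdlib.h>
#include <gc.h>

int main(void)
{
    /* equivalent to: GC_INITIAL_HEAP_SIZE=80000000000 omc a.mos
     * the collector reads the variable at GC_INIT time, reserving the heap
     * up front instead of growing it in many mmap'd chunks */
    setenv("GC_INITIAL_HEAP_SIZE", "80000000000", 1);
    GC_INIT();
    /* alternatively, grow the heap explicitly after init:
     * GC_expand_hp((size_t)80000000000ULL); */
    return 0;
}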

It may just be that we are running into a lot of trouble simply because of humongous dense Jacobians that shouldn't be generated at all. In that case, we should fix the root cause, i.e. avoid generating those dense Jacobians in the first place, rather than forcing the GC to behave.

Also, the wrong traversal order may not be a big deal for smaller Jacobians, so fixing it may not be that important, though it would of course be nice.

comment:35 by Francesco Casella, 4 years ago

See also #6377.

PR 7164 should address the traversal order issue, at least partially.

comment:36 by Francesco Casella, 4 years ago

Milestone: 1.17.0 → 1.18.0

Retargeted to 1.18.0 because of 1.17.0 timed release.

comment:37 by Francesco Casella, 3 years ago

Milestone: 1.18.0 → (none)

Ticket retargeted after milestone closed
