Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#4906 closed defect (fixed)

Memory management is extremely inefficient when running multiple simulations in a script

Reported by: Francesco Casella
Owned by: Adrian Pop
Priority: critical
Milestone: 1.13.0
Component: Interactive Environment
Version:
Keywords:
Cc: Per Östlund, Adrian Pop, Martin Sjölund

Description

I ran a script containing these two lines:

simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_40_M_40, simflags=simflags, variableFilter=".*_[1-4]_[1-2]_.*|.*_10_10_.*");getErrorString();
simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_40_M_40, simflags=simflags, variableFilter=".*_[1-4]_[1-2]_.*|.*_10_10_.*");getErrorString();

on a Linux workstation with 20 CPUs and 72 GB of memory, using the latest nightly build.

The execution statistics for the first run were:

    timeFrontend = 16.558262787,
    timeBackend = 6.284129372,
    timeSimCode = 2.21432826,
    timeTemplates = 1.338698727,
    timeCompile = 4.979096652,
    timeSimulation = 1.897257977,
    timeTotal = 33.271994613

while the statistics for the second run were:

    timeFrontend = 1434.521649766,
    timeBackend = 6.789682942,
    timeSimCode = 1.347459057,
    timeTemplates = 1.554605467,
    timeCompile = 5.170174612,
    timeSimulation = 1.904497514,
    timeTotal = 1451.28830158

For some reason, the (old) front-end took nearly 90 (!!) times longer to complete its job (1434 s vs. 16.6 s). Running htop, I noticed that during the first run the average CPU load was almost 200%, thanks to the GC running in parallel on the other 19 CPUs, while during the second run it was stuck at 100%.

This could be a very serious problem for anyone writing scripts containing more than one simulation run. In fact, that is exactly what I wanted to do to carry out the task described in #4885.

I am also a bit worried that this may happen when using the various Java, Python, Julia, etc. interfaces to OMC, which may call simulate multiple times within the same omc process running in the background.

I guess this has to do with memory fragmentation and garbage collection. Isn't there any way to clean up the used memory once a call to simulate() has completed, without having to kill the omc process and restart it anew, so as to avoid this kind of problem?
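
For reference, this is a minimal sketch of the kind of per-simulation cleanup I have in mind, assuming the scripting API's GC_gcollect_and_unmap() is the right call to force a full collection (the simflags value is a placeholder defined earlier in the script):

    // first run
    simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_40_M_40, simflags=simflags, variableFilter=".*_[1-4]_[1-2]_.*|.*_10_10_.*"); getErrorString();
    // force a full collection and return unused pages to the OS before the next run
    GC_gcollect_and_unmap();
    // second run
    simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_40_M_40, simflags=simflags, variableFilter=".*_[1-4]_[1-2]_.*|.*_10_10_.*"); getErrorString();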

Change History (13)

comment:1 by Adrian Pop, 7 years ago

I noticed this too in OMEdit: run one simulation of EngineV6, leave OMEdit alone for some hours, then run it again. I will investigate why this happens; it shouldn't. I guess we could run the GC before running the simulate command.

comment:2 by Adrian Pop, 7 years ago

Owner: changed from somebody to Adrian Pop
Status: new → accepted

comment:3 by Francesco Casella, 7 years ago

@adrpo, should I hold off on rewriting the testing script for #4885 as a bash or python script that starts omc anew for each simulation, until you've tried this out?

comment:4 by Martin Sjölund, 7 years ago

See also #4785. When running N=M=10 with clear() and GC_gcollect_and_unmap() between the calls (and a second loadModel), 40 MB more is used on the second run, which takes between 2 and 4 times longer than the first to finish. This could be due to some external "C" code allocating memory, but when I was checking #4785 before it seemed like some memory was simply never released. Running with -n=1 does not seem to help, and GC_DONT_GC=1 does not change the behaviour here either, so it might have to do with opening threads or files.
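
Roughly the script I used for this check, as a sketch (the exact N=M=10 model name is assumed by analogy with the one in the description, and the simulation flags are omitted):

    // first run
    loadModel(ScalableTestSuite); getErrorString();
    simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_10_M_10); getErrorString();
    // drop all loaded classes, collect, and reload the library before the second run
    clear(); getErrorString();
    GC_gcollect_and_unmap();
    loadModel(ScalableTestSuite); getErrorString();
    // second run: still ~40 MB more memory in use and 2-4x slower
    simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_10_M_10); getErrorString();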

comment:5 by Martin Sjölund, 7 years ago

I initially thought there was a difference between translateModel and instantiateModel, but this was just due to the correct execStat not being printed. The error lies somewhere within Inst.instantiateClass, which is one of the functions I hate to look into. The performance is the same for the third and subsequent runs, though, and is always the same for the new frontend, so it should simply be some global state in the old frontend messing with us; I don't plan to investigate further.

comment:6 by Francesco Casella, 7 years ago (in reply to comment:5)

Replying to sjoelund.se:

The performance is the same for the third and subsequent runs, though, and is always the same for the new frontend, so it should simply be some global state in the old frontend messing with us; I don't plan to investigate further.

OK, I'll perform some more tests with the new frontend, and if indeed this problem doesn't show up, we can close this ticket in view of 2.0.0. In the meantime, I'll write my script with a separate system call for each simulation.
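
For those tests, I plan to enable the new frontend from the script along these lines (a sketch; I'm assuming -d=newInst is still the flag that selects the new frontend in the current nightlies):

    // switch to the new front-end for subsequent translations
    setCommandLineOptions("-d=newInst"); getErrorString();
    simulate(ScalableTestSuite.Electrical.DistributionSystemDC.ScaledExperiments.DistributionSystemModelicaIndividual_N_40_M_40, simflags=simflags, variableFilter=".*_[1-4]_[1-2]_.*|.*_10_10_.*"); getErrorString();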

comment:7 by Francesco Casella, 7 years ago

@perost, maybe you could also run valgrind and see what you can get out of that, though if the problem is really solved with the NF, I wouldn't bother.

comment:8 by Per Östlund, 7 years ago (in reply to comment:7)

Replying to casella:

@perost, maybe you could also run valgrind and see what you can get out of that, though if the problem is really solved with the NF, I wouldn't bother.

I'm already doing that; it should be done in an hour or so (assuming the execution time of the second simulation scales under valgrind the same way as the first).

comment:9 by Per Östlund, 7 years ago

Resolution: fixed
Status: accepted → closed

Fixed in a0d5d3b.

The issue was Inst.releaseInstHashTable, which was supposed to release the global hash table used to cache instances. Since there is no way to unset a global root, it instead set the root to a new hash table of size 1, which was then reused by subsequent instantiations. So the performance issue was simply due to the first instantiation using a properly sized hash table, and the rest using a glorified linear list (which caused a ridiculous number of calls to Absyn.pathEqual).

I first attempted to fix it by setting the global root to 0, a value that can't be mistaken for a hash table, but that somehow broke the bootstrapping. So instead I just create a new hash table of the normal size, which is a bit wasteful but probably negligible. If we had a way to actually unset a global root, that would be a slightly better solution.

comment:10 by Adrian Pop, 7 years ago

We could support unsetting if we made the global root an Option of the thing. Then we could set it to NONE().

comment:11 by Per Östlund, 7 years ago (in reply to comment:10)

Replying to adrpo:

We could support unsetting if we made the global root an Option of the thing. Then we could set it to NONE().

It probably doesn't matter either way, performance-wise.

comment:12 by Adrian Pop, 7 years ago

Yeah, I guess so. We'll see how it works now; Francesco can do some testing with his script.

comment:13 by Francesco Casella, 6 years ago

Works like a charm. I've used this many times now to run the scripts that generate the reference data for the ScalableTestSuite.
