Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#4991 closed defect (fixed)

Too many simulations fail with too long a timeout in the TEST_LIBS_FMI_MASTER Hudson job

Reported by: Francesco Casella
Owned by: Lennart Ochel
Priority: critical
Milestone: 1.13.0
Component: FMI
Version:
Keywords:
Cc: Martin Sjölund

Description

The job triggered by TEST_LIBS_FMI_MASTER can take up to 14 hrs, see, e.g.

https://test.openmodelica.org/hudson/job/OpenModelica_TEST_LIBS_RIPPER/1581/

which is way too much, since it is run on a daily basis.

I have checked the reports. If you check the Buildings_latest results, you'll see that about 100 models fail after 480 seconds, which I guess is the timeout. With 16 threads, that gives about one hour of real time, which is then multiplied by 5 since we are testing five different versions of Buildings. There are probably similar issues with other libraries, which explains the 14 hrs issue.
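As a rough check of this estimate, here is a minimal sketch of the arithmetic. It assumes the timed-out models are spread evenly over the worker threads; the function name `wall_clock_hours` is hypothetical and only used for illustration.

```python
def wall_clock_hours(n_models, timeout_s, n_threads, n_versions):
    """Wall-clock hours spent on models that hit the timeout,
    assuming they are spread evenly over the worker threads."""
    cpu_seconds = n_models * timeout_s        # total time spent waiting for timeouts
    per_version_s = cpu_seconds / n_threads   # ideal parallel speed-up
    return per_version_s * n_versions / 3600.0

# Figures quoted above: ~100 timeouts of 480 s, 16 threads, 5 Buildings versions
hours = wall_clock_hours(n_models=100, timeout_s=480, n_threads=16, n_versions=5)
print(f"{hours:.1f} h")  # prints 4.2 h
```

Under these assumptions the Buildings timeouts alone would account for roughly four of the 14 hours.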

If you check the regular version of the test, you'll see that most of those 480 s simulations take less than 0.1 s, so there are obviously bugs in the FMI generation/simulation that generate infinite loops.

As an immediate measure, I would suggest reducing the timeout for the Simulate phase of the master-fmi task to 50 s. With the exception of the ScalableTestSuite (which probably doesn't make sense to run with FMI), there are actually few models in the testsuite that require more than 10 seconds of simulation time, so 50 should be more than adequate.

I guess this measure should already reduce the running time of that task dramatically.
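For illustration, a per-simulation timeout can be enforced along the following lines. This is only a sketch using Python's standard `subprocess` module, not the actual test script used by the Hudson job; `run_with_timeout` is a hypothetical helper, and a sleeping child process stands in for a simulation executable.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, returncode); status is 'ok', 'failed' or 'timeout'."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return "timeout", None
    return ("ok" if proc.returncode == 0 else "failed"), proc.returncode

# Stand-in for a slow simulation: a child process that sleeps for 5 s
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout_s=0.5))
```

With such a wrapper, each model that hangs costs at most `timeout_s` seconds of thread time, so lowering the limit from 480 s to 50 s cuts the cost of every timed-out model by almost a factor of ten.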

I would then suggest that @lochel investigate why so many simulations end up in an infinite loop.

Change History (13)

comment:1 by Lennart Ochel, 6 years ago

I guess there are no infinite loops, but just bad performance. Most of the time is probably spent writing the result files. I expect much better performance with my latest changes.

Reducing the timeout is probably also a good idea, if none of the tests is expected to run for that long.

in reply to:  1 comment:2 by Francesco Casella, 6 years ago

Replying to lochel:

I guess there are no infinite loops, but just bad performance. Most of the time is probably spent writing the result files.

That would be weird. Take the very first test in Buildings_latest, Buildings.Air.Systems.SingleZone.VAV.BaseClasses.Validation.ControllerEconomizer. It has 387 variables, and with the standard runtime it simulates in 0.04 seconds. It's true that stopTime is 604800 s, so the result file won't be small, and I have no idea what the communication interval is, but on the other hand a difference of 4 orders of magnitude is really huge.

Can you comment on the type of algorithm that you are using to simulate the FMU? Is it implicit/explicit? CS or ME?
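To put numbers on this, here is a small sketch. The slowdown uses the figures quoted above. The result-size estimate is hypothetical: it assumes every variable is stored as an 8-byte double at every output point, and the 60 s communication interval is an arbitrary placeholder, since the real interval is not known here.

```python
import math

standard_runtime_s = 0.04   # reported for ControllerEconomizer with the standard runtime
fmi_timeout_s = 480.0       # lower bound: the FMI run did not finish within the timeout

ratio = fmi_timeout_s / standard_runtime_s
print(f"slowdown >= {ratio:.0f}x, about {math.log10(ratio):.1f} orders of magnitude")

def result_size_mb(n_vars, stop_time_s, interval_s, bytes_per_value=8):
    """Rough size of a result file storing n_vars doubles per output point (assumption)."""
    n_points = int(stop_time_s / interval_s) + 1
    return n_vars * n_points * bytes_per_value / 1e6

# Hypothetical 60 s communication interval; the real value is unknown
print(f"{result_size_mb(387, 604800, 60.0):.1f} MB")
```

The size estimate is only meant to help judge whether writing the result file could plausibly dominate the run time for a given communication interval.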

comment:3 by Martin Sjölund, 6 years ago

It's IDA Sundials on ME FMUs

comment:4 by Francesco Casella, 6 years ago

So it should basically be the same as DASSL on the standard runtime

comment:5 by Martin Sjölund, 6 years ago

Except the FMUs don't have the same information available. (Coloured Jacobians, etc? I'm not sure what OMSimulator does there)

comment:6 by Lennart Ochel, 6 years ago

It seems that the job now takes less than 2h, right? No, it takes a bit longer. But much less than 14h, I guess.

Last edited 6 years ago by Lennart Ochel

comment:7 by Francesco Casella, 6 years ago

The last run took 11 hrs. Better than 14 hrs, but not that much.

I still see a lot of 480 s timeouts for the Buildings library.

I still recommend reducing the simulation timeout to 50 s ASAP.

comment:8 by Martin Sjölund, 6 years ago

I have added the functionality to change global defaults for jobs now: a4f026

Last edited 6 years ago by Martin Sjölund

comment:9 by Francesco Casella, 6 years ago

Resolution: fixed
Status: new → closed

Running time of the master-fmi job with the new 50 s timeout is now around 5 hrs, which is more reasonable. It will become even less when #4992 is closed.

I guess it still needs to be investigated why so many tasks break the timeout. Even allowing for the length of the simulation run, this seems to me a serious problem with the current FMI simulation, but that's a separate issue.

comment:10 by Adrian Pop, 6 years ago

It is not that reasonable. I had a look during testing, and compiling the huge Model_init_fmu.c files with init data takes forever. Maybe it would be better to split that function into several smaller ones to make it faster.

comment:11 by Adrian Pop, 6 years ago

If I remember correctly ModelicaTest_trunk test compilation was really slow.

comment:12 by Francesco Casella, 6 years ago

@adrpo, the biggest problem here is simulation time, not compilation time, see e.g. the first line in Buildings_latest.

These models do not have a lot of data; on the other hand they have a StopTime of about 6e5 seconds (one week), which is quite unusual for Modelica test models.

Last edited 6 years ago by Francesco Casella

comment:13 by Francesco Casella, 6 years ago

In fact, I may add some tests to the ScalableTestSuite where I only scale StopTime to see what happens. Most test models in the MSL last less than 1000 seconds, so there may be some scalability issues there that we may not be testing properly.
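A StopTime scaling test could look roughly like the sketch below. It is not ScalableTestSuite code: `simulate_decay` is a toy fixed-step explicit Euler integrator standing in for a real simulation, and `stop_time_scaling` is a hypothetical harness that would accept any callable returning a step count.

```python
import time

def simulate_decay(stop_time, step=1.0):
    """Toy stand-in for a simulation: explicit Euler on dx/dt = -x/tau with a fixed step."""
    tau = 1000.0
    x, n_steps = 1.0, int(round(stop_time / step))
    for _ in range(n_steps):
        x += step * (-x / tau)
    return x, n_steps

def stop_time_scaling(stop_times, simulate=simulate_decay):
    """Return (stop_time, n_steps, wall_seconds) for each requested stop time."""
    rows = []
    for stop_time in stop_times:
        t0 = time.perf_counter()
        _, n_steps = simulate(stop_time)
        rows.append((stop_time, n_steps, time.perf_counter() - t0))
    return rows

# Scale only the stop time, up to the one-week span used by the Buildings models
for stop_time, n_steps, wall_s in stop_time_scaling([1e3, 1e4, 1e5, 604800]):
    print(f"stopTime={stop_time:>8.0f} s  steps={n_steps:>6d}  wall={wall_s:.4f} s")
```

If wall time grows much faster than linearly with StopTime for a real model, that would point to the kind of scalability issue mentioned above.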

After all, there's nothing wrong with simulating a Building model for a one-year span, or possibly simulating a planetary system model for a million years :)

In fact, a model of the Solar System would make a very nice test case, except that it will be chaotic, so it will probably generate a lot of regressions.

I'll see what I can do.
