Opened 5 years ago
Last modified 3 years ago
#5926 new defect
Save a proper result file when interrupting a simulation
Reported by: | Francesco Casella | Owned by: | Lennart Ochel |
---|---|---|---|
Priority: | blocker | Milestone: | 2.0.0 |
Component: | Run-time | Version: | |
Keywords: | Cc: | Karim Adbdelhak, Andreas Heuermann, Niklas Worschech, Mahder Alemseged Gebremedhin, Andrea Bartolini, Martin Sjölund |
Description
Everybody's happy when a simulation terminates successfully: you look at the results and then either they are good for you, or if they are not you can figure out what's wrong, change something, and re-simulate.
However, there are many cases in which simulations get stuck, because of chattering or any other reason. This happens all the time in real life. After some time, you just need to kill the simulation process. Unfortunately, when you do so the process does not die gracefully as it should, and in particular no valid simulation results are saved up to that point, which makes it really hard to figure out what's wrong and apply suitable remedies.
This is an MWE of such a situation:
model Chattering
  Real x(start = 1, fixed = true);
equation
  der(x) = noEvent(if x > 0 then -1 else 1);
  annotation(experiment(StopTime = 2));
end Chattering;
If you simulate it in OMEdit, the simulation gets stuck indefinitely at 50% of the simulation time span. When you finally hit the Cancel Simulation button you get
Process crashed
Simulation process failed. Exited with code 62097.
and if you try to check the results in the Variables Browser, you get an error window saying:
Corrupt file. nvar 0
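For reference, the 50% mark in the MWE is where x reaches zero: x starts at 1 with slope -1, so it hits 0 at time = 1, half of StopTime = 2. From there the derivative flips sign at every evaluation. The following is only a fixed-step forward Euler sketch in Python to show the flipping; it is not what the OpenModelica solvers do, and the names `deriv` and `euler_trace` are made up for illustration.

```python
def deriv(x):
    # Right-hand side of the MWE: der(x) = if x > 0 then -1 else 1
    return -1.0 if x > 0 else 1.0


def euler_trace(x0=1.0, dt=0.25, n_steps=8):
    """Return a list of (time, x, der(x)) tuples from fixed-step forward Euler."""
    trace = []
    x = x0
    for i in range(n_steps):
        dx = deriv(x)
        trace.append((i * dt, x, dx))
        x += dt * dx
    return trace


if __name__ == "__main__":
    for t, x, dx in euler_trace():
        print(f"t = {t:4.2f}  x = {x:4.2f}  der(x) = {dx:+.0f}")
```

With a fixed step the trajectory just zigzags around zero. A variable-step solver with error control would presumably keep shrinking the step near x = 0, which would be consistent with the simulation appearing stuck at 50%.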
I think this behaviour is really not professional. Particularly if it takes place after your precious simulation has been running for half an hour.
This is what we need to happen in case the Cancel Simulation button is pressed:
- A terminate signal is sent to the simulation executable
- The simulation executable stops doing whatever it is doing, saves the simulation results obtained so far, and terminates with an error message like: "Simulation stopped by the user at time = <current_time>".
- The simulation results from StartTime to current_time are loaded in the Variables Browser.
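The three steps above could be sketched roughly as follows. This is a Python illustration only (the actual runtime is C/C++), and `StopFlag`, `install`, `run` and the `finalize` callback are hypothetical names, not existing runtime functions. The idea is that the signal handler only sets a flag, and the integration loop checks it between accepted steps.

```python
import signal


class StopFlag:
    """Records a stop request; an instance can be used as a signal handler."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum=None, frame=None):
        self.requested = True


def install(stop, signum=signal.SIGTERM):
    # Must be called from the main thread; SIGTERM is an assumption here.
    signal.signal(signum, stop)


def run(deriv, x0, t_stop, dt, stop, rows, finalize, on_row=None):
    """Forward Euler loop that stores one row per accepted step.

    Checks the stop flag before each step; on a stop request it finalizes
    the results gathered so far and returns a message with the current time.
    """
    n_steps = int(round(t_stop / dt))
    x = x0
    rows.append((0.0, x))
    for i in range(n_steps):
        if stop.requested:
            finalize()
            return f"Simulation stopped by the user at time = {i * dt:g}"
        x = x + dt * deriv(x)
        rows.append(((i + 1) * dt, x))
        if on_row is not None:
            on_row(rows)
    finalize()
    return "Simulation finished"
```

In a real process one would call `install(stop)` once at startup. The flag is only checked between steps, so a simulation stuck inside a single step would still need some other way out.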
I guess this requires updates to OMEdit as well as to the runtime. I would assign this provisionally to @adeas31, please coordinate with @karim, @AnHeuermann, @mahge930 and @niklwors to come up with a working solution. @sjoelund.se may also be interested, since this has connections with the debugger.
I think this is really a must-have for 2.0.0. It would be nice to have it in 1.16.0 already.
Change History (4)
follow-up: 2 comment:1 by , 5 years ago
comment:2 by , 5 years ago
Replying to AnHeuermann:
> The quick (and dirty) solution: Use csv files as output and OMEdit will open them for you and (mostly) not complain at all.
The models we are currently trying to debug have about 50,000 variables. The .mat files easily exceed 1 GB. CSV may not be an option in this case :)
In general, please note that the cases where you really need help for debugging are not MWEs...
> Those are basically text files and will be valid as long as the process is not killed while writing a new entry (new line). Even in that case one could maybe clean the file by removing the last corrupt line(s), as long as the encoding is not damaged in some way.
> But I guess the binary mat files, on the other hand, are a different story. I'm not sure how they are encoded, but I assume you really need to stop the simulation by a user signal and then handle all the saving of variables and deallocation. But there are so many cases where the simulation could be stuck, so implementing that would be a good but time-consuming thing.
I'm not sure I am following you here. The .mat file is written one row at a time, when each step is accepted, in order not to allocate any RAM to the simulation results, which could be disastrous in heavyweight simulations. The runtime already contains the code to properly finalize the file; we simply need to call that upon receiving the terminate signal. Whatever the runtime was doing when the signal is received will simply be discarded. Am I missing something?
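To make the "write one row at a time, finalize at the end" idea concrete, here is a small Python sketch. The binary layout is invented for illustration (a header with two integers followed by rows of doubles) and is not the actual .mat layout used by the runtime; `RowWriter` and `read_rows` are hypothetical names. The header is a placeholder until `finalize` patches it, so a file from a killed process is rejected by the reader, much like the "nvar 0" symptom.

```python
import struct

# Hypothetical header: number of variables, number of rows (little-endian).
HEADER = struct.Struct("<II")


class RowWriter:
    """Appends rows to disk as they are accepted; no results are kept in RAM."""

    def __init__(self, path, n_vars):
        self.n_vars = n_vars
        self.n_rows = 0
        self.f = open(path, "wb")
        self.f.write(HEADER.pack(0, 0))  # placeholder until finalize()

    def write_row(self, values):
        if len(values) != self.n_vars:
            raise ValueError("wrong number of values in row")
        self.f.write(struct.pack(f"<{self.n_vars}d", *values))
        self.f.flush()
        self.n_rows += 1

    def finalize(self):
        # Patch the header with the real counts, then close the file.
        self.f.seek(0)
        self.f.write(HEADER.pack(self.n_vars, self.n_rows))
        self.f.close()


def read_rows(path):
    """Read all rows back; refuses files whose header was never finalized."""
    with open(path, "rb") as f:
        n_vars, n_rows = HEADER.unpack(f.read(HEADER.size))
        if n_vars == 0:
            raise ValueError("Corrupt file. nvar 0")
        row = struct.Struct(f"<{n_vars}d")
        return [row.unpack(f.read(row.size)) for _ in range(n_rows)]
```

In this sketch, calling `finalize()` from the stop path is all that is needed to make the rows written so far readable, which is the point being made above.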
We also eventually need to get information about the current time step, for advanced debugging, that is what I am trying to sort out with @sjoelund.se. But that is another story.
comment:3 by , 5 years ago
Component: | OMEdit → Run-time |
---|---|
Owner: | changed from | to
When the user cancels the simulation, OMEdit calls kill, which does the following:
Kills the current process, causing it to exit immediately. On Windows, kill() uses TerminateProcess, and on Unix and macOS, the SIGKILL signal is sent to the process.
Basically in the runtime we need to implement a handler that catches the SIGKILL (SetUnhandledExceptionFilter for Windows) and then finishes the process in a better way.
There is similar code that I did to catch OMEdit crashes. See main.cpp. The same code could be copied to the runtime as well.
comment:4 by , 3 years ago
Continuing the discussion on https://github.com/OpenModelica/OpenModelica/issues/5926
The quick (and dirty) solution: Use csv files as output and OMEdit will open them for you and (mostly) not complain at all.
Those are basically text files and will be valid as long as the process is not killed while writing a new entry (new line). Even in that case one could maybe clean the file by removing the last corrupt line(s), as long as the encoding is not damaged in some way.
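A minimal Python sketch of such a cleanup, under a few assumptions: the first line is an intact header, lines end with "\n", and a row counts as corrupt if it is unterminated or its field count differs from the header. The function name `drop_partial_rows` is made up for illustration.

```python
import csv


def field_count(line):
    # Let the csv module handle quoted names such as "time","x".
    return len(next(csv.reader([line])))


def drop_partial_rows(text):
    """Return the CSV text without trailing incomplete rows."""
    # Everything after the last "\n" is an unterminated line (or empty).
    complete = text.split("\n")[:-1]
    if not complete:
        return ""
    n_fields = field_count(complete[0])
    # Drop trailing rows whose field count does not match the header.
    while len(complete) > 1 and field_count(complete[-1]) != n_fields:
        complete.pop()
    return "".join(line + "\n" for line in complete)
```

For example, `drop_partial_rows("time,x\n0,1\n0.5,0.5\n1.0,0.")` drops the unterminated last row and keeps the header and the first two rows. A row that was cut off after its last comma but still has the right field count would not be detected by this check.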
But I guess the binary mat files, on the other hand, are a different story. I'm not sure how they are encoded, but I assume you really need to stop the simulation by a user signal and then handle all the saving of variables and deallocation. But there are so many cases where the simulation could be stuck, so implementing that would be a good but time-consuming thing.
In that case we should use some user input that is not a system signal, e.g. listen for a key combination during the simulation. If we caught a terminate signal but remained stuck, the user would have a hard time killing the process and would be annoyed. And it should behave the same for the C and C++ runtimes, so it is a lot of work.
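One possible way around the "caught the signal but still stuck" concern would be for the caller to escalate: ask politely first, wait a bounded time, then kill. The Python sketch below assumes POSIX semantics for `terminate()` and `kill()`; the names `stop_with_grace`, `spawn_ready` and the two child scripts are hypothetical and only stand in for OMEdit and the simulation executable.

```python
import subprocess
import sys

# Child that ignores SIGTERM, i.e. a simulation that stays stuck.
STUBBORN = r'''
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
while True:
    time.sleep(0.05)
'''

# Child with default SIGTERM behaviour, i.e. one that stops when asked.
COOPERATIVE = r'''
import time
print("ready", flush=True)
while True:
    time.sleep(0.05)
'''


def spawn_ready(child_source):
    """Start a child and wait until it has printed its "ready" line."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()
    return proc


def stop_with_grace(proc, grace_seconds):
    """Send a terminate request; fall back to a hard kill after the grace period."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return "killed"
```

With this pattern the user is never left with an unkillable process: in the worst case the result file is lost, which is no worse than the current behaviour.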