#3588 closed defect (fixed)
diffModelicaFileListings consumes huge time and memory
Reported by: | Adeel Asghar | Owned by: | Martin Sjölund |
---|---|---|---|
Priority: | blocker | Milestone: | 1.9.4 |
Component: | Interactive Environment | Version: | |
Keywords: | Cc: | Adrian Pop, massimo ceraolo, Francesco Casella |
Description
When diffModelicaFileListings
is used with huge strings (which is common case when you working for example in MSL) it uses alot of CPU and memory. Most of the time crashing out with Too many heap sections
error.
When I run the attached script on my machine then OMC is consuming 20% of CPU and almost more than 2GB of memory.
Attachments (1)
Change History (14)
by , 9 years ago
Attachment: | script.mos added |
---|
comment:1 by , 9 years ago
Cc: | added |
---|
comment:2 by , 9 years ago
comment:3 by , 9 years ago
The thing that consumes the memory is the myer's paths. So we need to do something about the diff algorithm before a release :)
comment:4 by , 9 years ago
As I understand it, the changes are rather localized to a class so we should only apply the diff algorithm to that part of the file or buffer. Shouldn't this be possible?
comment:5 by , 9 years ago
We try to do this, but I guess in this case the unparsing does not match the Dymola one very well, and so we fail and resort to a general diff. But yes, it would be possible if we add some grouping of tokens into elements or similar, and do a diff on those to find out which parts of the file changed.
comment:6 by , 9 years ago
Sorry for the naive question, is this related to the new commits by Adeel? Or was it already there before?
comment:8 by , 9 years ago
My new approach handles this a lot faster:
Notification: Performance of diffModelicaFileListings scan string 1: time 0.02594/0.02594 Notification: Performance of diffModelicaFileListings parse string 1: time 0.06621/0.09221 Notification: Performance of diffModelicaFileListings scan string 2: time 0.03426/0.1265 Notification: Performance of diffModelicaFileListings parse string 2: time 0.2994/0.426 Notification: Performance of treeDiff: time 0.001079/0.4271
But it does not yet consider whitespace changes and indentation in as nice a way as the previous approach. But at least the basic algorithm runs a lot faster. Note that it will run slower once the whitespace issue is fixed (because it will go deeper into the trees to find out where the actual diff resides; right now it just thinks the diffs are too many and outputs suboptimal results).
It is based on an LL(k) hand-writen recursive-descent parser (written in MetaModelica). I had intended to use ANTLR, but the unmaintained C version was not powerful enough.
The treeDiff algorithm can be improved as well (does a lot of substring calls to compare tokens and print the output to Print.mo, which currently dominates execution time; adding support to do a string compare with offset as well as Print.mo taking an offset+length). This is more apparent for larger models (then treeDiff takes longer than parsing even though the number of diffs is only equal to 2; this is simply a scaling issue).
comment:9 by , 9 years ago
The diff looks almost good now (updateComponent moves the component, which it should not do). And there is a whitespace issue for the within clause...
--- old.mo 2015-12-29 09:55:28.218333811 +0100 +++ merged.mo 2015-12-29 09:55:29.442376401 +0100 @@ -1,5 +1,4 @@ -within Modelica.Electrical; -package PowerConverters "Rectifiers, Inverters and DC/DC converters" +within Modelica.Electrical;package PowerConverters "Rectifiers, Inverters and DC/DC converters" extends Modelica.Icons.Package; package UsersGuide "User's Guide" extends Modelica.Icons.Information; @@ -3138,7 +3137,7 @@ package MultiPhaseTwoLevel "Multi phase two level inverter example" extends Modelica.Icons.ExamplesPackage; - model MultiPhaseTwoLevel_R "Multi phase DC to AC converter with R load" + model MultiPhaseTwoLevel_R "Multi phase DC to AC converter with R load" import Modelica_Electrical_PowerConverters = Modelica.Electrical.PowerConverters; extends Modelica.Icons.Example; @@ -3188,13 +3187,6 @@ offset=fill(0.5, m), freqHz=fill(f1, m)) annotation (Placement(transformation(extent={{-30, -64},{-50,-44}}))); - Modelica.Blocks.Math.Harmonic fundamentalWaveCurrent[m]( - each k=1, - each x0Cos=0, - each x0Sin=0, - each f=f1) annotation (Placement(transformation( - extent={{-10,-10},{10,10}}, - origin={90,-50}))); Modelica.Blocks.Math.Harmonic fundamentalWaveVoltage[m]( each k=1, each x0Cos=0, @@ -3213,6 +3205,11 @@ extent={{-10,-10},{10,10}}, rotation=270, origin={40,-90}))); + Modelica.Blocks.Math.Harmonic fundamentalWaveCurrent[m]( + each k=1, + each x0Cos=0, + each x0Sin=0, + each f=f1) annotation (Placement(visible = true, transformation(origin = {90, -48}, extent = {{-10, -10}, {10, 10}}, rotation = 0))); equation connect(constantVoltage_p.n, constantVoltage_n.p) annotation (Line( points={{-70,40},{-70,20}},
Total time to process:
Notification: Performance of diffModelicaFileListings scan string 1: time 0.2323/0.2323, GC stats: heapsize_full: 295669760 free_bytes_full: 77459456 unmapped_bytes: 0 bytes_allocd_since_gc: 8094992 allocd_bytes_before_gc: 390124848 non_gc_bytes: 0 gc_no: 14 markers_m1: 7 bytes_reclaimed_since_gc: 83400400 reclaimed_bytes_before_gc: 98056592 Notification: Performance of diffModelicaFileListings parse string 1: time 0.04121/0.2735, GC stats: heapsize_full: 295669760 free_bytes_full: 54882304 unmapped_bytes: 0 bytes_allocd_since_gc: 35319632 allocd_bytes_before_gc: 390124848 non_gc_bytes: 0 gc_no: 14 markers_m1: 7 bytes_reclaimed_since_gc: 88051184 reclaimed_bytes_before_gc: 98056592 Notification: Performance of diffModelicaFileListings scan string 2: time 0.03661/0.3102, GC stats: heapsize_full: 295669760 free_bytes_full: 41791488 unmapped_bytes: 0 bytes_allocd_since_gc: 48267808 allocd_bytes_before_gc: 390124848 non_gc_bytes: 0 gc_no: 14 markers_m1: 7 bytes_reclaimed_since_gc: 88060032 reclaimed_bytes_before_gc: 98056592 Notification: Performance of diffModelicaFileListings parse string 2: time 0.04889/0.3592, GC stats: heapsize_full: 295669760 free_bytes_full: 13393920 unmapped_bytes: 0 bytes_allocd_since_gc: 76694832 allocd_bytes_before_gc: 390124848 non_gc_bytes: 0 gc_no: 14 markers_m1: 7 bytes_reclaimed_since_gc: 88093776 reclaimed_bytes_before_gc: 98056592 Notification: Performance of treeDiff: time 0.7631/1.122, GC stats: heapsize_full: 295669760 free_bytes_full: 11444224 unmapped_bytes: 0 bytes_allocd_since_gc: 78706992 allocd_bytes_before_gc: 390124848 non_gc_bytes: 0 gc_no: 14 markers_m1: 7 bytes_reclaimed_since_gc: 88176848 reclaimed_bytes_before_gc: 98056592
So all garbage collection happens during parsing of MSL for the script. Still, requires 13 MB memory for lexing, 26 MB for parsing. The tree diff only required around 2 MB :)
comment:10 by , 9 years ago
I fixed updateComponent, so it no longer moves components when updated...
Latest stats:
Notification: Performance of diffModelicaFileListings scan string 1: time 0.02745/0.02745 Notification: Performance of diffModelicaFileListings parse string 1: time 0.07575/0.1033 Notification: Performance of diffModelicaFileListings scan string 2: time 0.281/0.3844 Notification: Performance of diffModelicaFileListings parse string 2: time 0.03475/0.4192 Notification: Performance of treeDiff: time 0.6675/1.087
Diff is now acceptable:
-
.mo
old new 3192 3192 each k=1, 3193 3193 each x0Cos=0, 3194 3194 each x0Sin=0, 3195 each f=f1) annotation (Placement(transformation( 3196 extent={{-10,-10},{10,10}}, 3197 origin={90,-50}))); 3195 each f=f1) annotation (Placement(visible = true, transformation(origin = {90, -48}, extent = {{-666, -10}, {10, 10}}, rotation = 0))); 3198 3196 Modelica.Blocks.Math.Harmonic fundamentalWaveVoltage[m]( 3199 3197 each k=1, 3200 3198 each x0Cos=0,
comment:11 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Summary: | diffModelicaFileListings consumes huge memory → diffModelicaFileListings consumes huge time and memory |
Well, diff algorithms scale O(ND). There is a huge number of diffs in there, and Myer's diff algorithm will take upwards of 2e10 iterations :)
Perhaps it would be better to implement a simple Modelica parser on top of the lexer (possibly just scanning for semicolons and ignoring comments to make a grouping), in order to only perform diffs on elements that were changed (similar to a line diff)?