Changes between Version 5 and Version 6 of WritingEfficientMetaModelica


Timestamp:
2014-10-15T12:18:37Z (10 years ago)
Author:
Martin Sjölund
Comment:

--

  • WritingEfficientMetaModelica

    v5 v6  
    1 == Writing efficient MetaModelica code ==
     1= Writing efficient MetaModelica code =
    22
    33In the bootstrapped compiler, alongside the extended MetaModelica/Modelica features you can use, there are some restrictions:
    4 1. avoid {{{matchcontinue}}} as much as possible and use {{{match}}} instead, combined with tail recursion if needed (functions where you expect to maybe iterate over more than 50 elements should be tail-recursive).
    5 1. If possible, rewrite {{{match}}}-expressions to use switch instead (+d=patternmAllInfo shows which expressions are optimised to switch). A switch needs to have one pattern that can be uniquely switched on a uniontype, Integer, or String type. If the last pattern is a default pattern, this pattern has to be the only one that is matched against (there may exist more, but they may only be patterns that cannot fail to match). If the last pattern is a default pattern, all uniontypes matched against in previous cases must have subpatterns that cannot fail to match. If all these restrictions are fulfilled, the match-expression avoids linear search of the patterns.
    6 1. Write tail recursive functions: if a function calls itself it should do that as last thing in the then part or in the last statement. You are required to bind all outputs in the same order as the function outputs or tail recursion does not work (so no wildcards ignoring one output). {{{matchcontinue}}} expressions can never be tail-recursive.
    7 1. Inlining functions could be used to great effect, but it currently interferes with separate compilation so in practice it will not work at the moment.
    8 1. Use builtin functions whenever possible: they have implementations that are better than you can achieve using MetaModelica code (for example: stringAppendList and stringDelimitList only use a single memory allocation, list reduce,map, and filter using the built-in operator avoid the extra listReverse)
    9 1. Avoid using the construct {{{case x equation true = fn(x); then (); case x equation false = fn(x); then ();}}}. The bootstrapped compiler will not merge the two cases into one and algorithms that ran in linear time using RML might run in quadratic time using the bootstrapped compiler if you do this. It also precludes optimisations such as tail recursion because you use {{{matchcontinue}}} instead of {{{match}}}. Use {{{case x guard fn(x) then ()}}} instead; it is possible to use this with {{{match}}}.
    10 1. Memory allocations are expensive. When optimizing the OMCC lexer generator it was possible to get speed similar to the ANTLR C version due to smarter algorithms avoiding memory allocations. That said, memory allocations are not incredibly expensive; use them when needed. But if you have the choice of sending arguments as a tuple or as 4 separate arguments, allocation of the tuple might be the lion's share of execution time depending on what the function does. Traversal routines can be rewritten for better performance if they do not create tuples all the time for example.
    11 1. Compiling efficiently depends a lot on the public interfaces of packages. If you do not intend for a function to be called from other packages, make it protected.
    12 1. Note that inlining calls across packages does not work due to the separate compilation scheme. Link-time optimizations in gcc/clang could remove function calls, but it probably will not.
    13 1. Note that tail recursion optimization is done by omc, not gcc. As such, -O0 and -O3 will generate the same function calls. But the stack usage is very different due to local optimizations within the function in C. If you use -O0 to debug something, you might trigger a stack overflow in a function that you thought was tail recursive but is actually not (-O3 just made the frames smaller so you could iterate over more elements). If this happens and you need -O0 to debug all variables, you need to first fix the function that triggers the stack overflow with -O0 or increase the stack size (simple on Linux; does not require re-compilation).
     4== Use builtin functions and operators whenever possible ==
     5Builtin functions have implementations that are better than you can achieve using MetaModelica code (for example: stringAppendList and stringDelimitList only use a single memory allocation).
     6
     7List reductions (like Modelica array reductions) can be used to avoid the extra listReverse at the end of functions written by the user. They are also easier to read. Example:
     8{{{#!mo
     9// List reduction, creates 1 list
     10list(2*x*y for x guard x>0 in lst);
     11// RML style, creates 6 lists including the listReverse operations
     12List.map1(List.map1(List.filter(lst, isPositive), realMul, 2.0), realMul, y);
     13}}}
     14
     15If you need to loop over a list, use a for- or while-loop rather than writing auxiliary tail-recursive functions (to save time writing and maintaining code):
     16{{{#!mo
     17for x in lst loop
     18  // ...
     19end for;
     20}}}
     21
     22Use if-expressions or if-statements to conditionally do things in code that is very similar:
     23{{{#!mo
     24// RML-style
     25case PAT1()
     26  algorithm
     27    f(x);
     28    true = g(y);
     29    h(z);
     30    i(x);
     31  then ();
     32case PAT1()
     33  algorithm
     34    f(x);
     35    false = g(y);
     36    i(x);
     37  then ();
     38}}}
     39{{{#!mo
     40// OMC style
     41case PAT1()
     42  algorithm
     43    f(x);
     44    if g(y) then
     45      h(z);
     46    end if;
     47    i(x);
     48  then ();
     49}}}
     50
     51== Avoid matchcontinue, use tail recursion ==
     52`matchcontinue` is slow. Whenever possible, use `match` instead. This avoids calling `setjmp`.
     53
     54Avoid using the construct {{{case x equation true = fn(x); then (); case x equation false = fn(x); then ();}}} since this requires matchcontinue.
     55The bootstrapped compiler will not merge the two cases into one and algorithms that ran in linear time using RML might run in quadratic time using the bootstrapped compiler if you do this.
     56It also precludes optimisations such as tail recursion because you use `matchcontinue` instead of `match`. Use {{{case x guard fn(x) then ()}}} instead; it is possible to use this with `match`.
     57
     58Also, write tail-recursive functions: if a function calls itself, it should do so as the last thing in the `then` part or in the last statement.
     59The call must bind all outputs, in the same order as the function's outputs, or tail recursion does not work (so no wildcards ignoring one output).
     60`matchcontinue` expressions can never be tail-recursive.
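
As a minimal sketch of the advice above, the hypothetical function `countPositive` (not taken from the compiler sources) uses `match` with a guard instead of `matchcontinue` with `true = fn(x)` equations. Each recursive call is the last thing in its `then` part and the result is carried in an accumulator, so the function should qualify for tail recursion as described:
{{{#!mo
// Hypothetical example: counts the positive elements of a list.
function countPositive
  input list<Integer> lst;
  input Integer acc;
  output Integer count;
algorithm
  count := match lst
    local
      Integer x;
      list<Integer> rest;
    // Empty list: the accumulator holds the final count.
    case {} then acc;
    // The guard replaces a "true = fn(x)" equation, so match is sufficient.
    case x :: rest guard x > 0 then countPositive(rest, acc + 1);
    // Remaining (non-positive) elements; the call is still in tail position.
    case _ :: rest then countPositive(rest, acc);
  end match;
end countPositive;
}}}
A caller would start the count with `countPositive(lst, 0)`.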
     61
     62== Switch optimization ==
     63If possible, rewrite `match`-expressions to use switch instead (+d=patternmAllInfo shows which expressions are optimised to switch). A switch needs to have one pattern that can be uniquely switched on a uniontype, Integer, or String type. If the last pattern is a default pattern, this pattern has to be the only one that is matched against (there may exist more, but they may only be patterns that cannot fail to match). If the last pattern is a default pattern, all uniontypes matched against in previous cases must have subpatterns that cannot fail to match. If all these restrictions are fulfilled, the match-expression avoids linear search of the patterns.
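
A minimal sketch of a `match` that is intended to meet these restrictions: the hypothetical function `weekdayNumber` matches a single String input against distinct literals, and its last case is a default pattern that cannot fail. Whether omc actually turns a given expression into a switch is best checked with +d=patternmAllInfo; this example only illustrates the shape described above:
{{{#!mo
// Hypothetical example: one String input, one unique literal per case.
function weekdayNumber
  input String name;
  output Integer n;
algorithm
  n := match name
    case "mon" then 1;
    case "tue" then 2;
    case "wed" then 3;
    // Default pattern: a wildcard cannot fail to match.
    case _ then 0;
  end match;
end weekdayNumber;
}}}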
     64
     65== Inlining functions ==
     66
     67Inlining functions could be used to great effect, but it currently interferes with separate compilation, so in practice it does not work at the moment.
     68Inlining calls across packages does not work due to the separate compilation scheme.
     69Link-time optimizations in gcc/clang could remove function calls, but they probably will not.
     70
     71== Avoid memory allocations ==
     72
     73Memory allocations are expensive.
     74When optimizing the OMCC lexer generator it was possible to get speed similar to the ANTLR C version due to smarter algorithms avoiding memory allocations.
     75That said, memory allocations are not incredibly expensive; use them when needed.
     76But if you have the choice of sending arguments as a tuple or as 4 separate arguments, allocation of the tuple might be the lion's share of execution time depending on what the function does.
     77
     78For example, traversal routines can be rewritten for better performance if they avoid creating tuples on every call.
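
The difference between the two calling styles can be sketched as follows. Both functions are hypothetical and do the same work; the first packs its argument and result into tuples, while the second uses separate inputs and outputs and should therefore not need to allocate a tuple per call:
{{{#!mo
// Hypothetical tuple version: the caller builds a tuple for the argument
// and the function builds another one for the result.
function addPairTuple
  input tuple<Integer, Integer> inTpl;
  output tuple<Integer, Integer> outTpl;
protected
  Integer a;
  Integer b;
algorithm
  (a, b) := inTpl;
  outTpl := (a + 1, b + 1);
end addPairTuple;

// Hypothetical version with separate inputs and outputs.
function addPair
  input Integer a;
  input Integer b;
  output Integer outA;
  output Integer outB;
algorithm
  outA := a + 1;
  outB := b + 1;
end addPair;
}}}
The second version would be called as `(x, y) := addPair(1, 2);`, binding both outputs directly.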
     79
     80== Think about build system performance ==
     81
     82Compiling efficiently depends a lot on the public interfaces of packages. If you do not intend for a function to be called from other packages, make it protected.
     83
     84== Tail recursion ==
     85
     86Note that tail recursion optimization is done by omc, not gcc. As such, -O0 and -O3 will generate the same function calls. But the stack usage is very different due to local optimizations within the function in C. If you use -O0 to debug something, you might trigger a stack overflow in a function that you thought was tail recursive but is actually not (-O3 just made the frames smaller so you could iterate over more elements). If this happens and you need -O0 to debug all variables, you need to first fix the function that triggers the stack overflow with -O0 or increase the stack size (simple on Linux; does not require re-compilation).