Opened 10 years ago

Closed 9 years ago

#3286 closed defect (fixed)

Array storage order

Reported by: Rüdiger Franke Owned by: Rüdiger Franke
Priority: high Milestone: 1.9.3
Component: Run-time Version: trunk
Keywords: Cc: Marcus Walther, Volker Waurich, Niklas Worschech, Nils Menager

Description

The storage order of multi-dimensional arrays (row major or column major) is generally arbitrary. The Cpp runtime currently follows the common C convention of row major order. On the other hand side most external functions require column major order (in particular LAPACK used in Modelica.Math.Matrices). This requires expensive O(n2) transposition operations. LAPACK sets the standard for linear algebra since decades -- C++ matrix libs don't appear to re-implement linear algebra, but provide flexible storage layouts instead.

StatArrayDim2 already supports column major as option. Supporting multiple storage orders is error prone though if all possible combinations of array operations shall work -- like A<rowMajor>.assign(B<colMajor>.getData()).

This is why the Cpp runtime shall be changed to use column major storage order internally.
How does the C runtime deal with this?

Change History (17)

comment:1 by Rüdiger Franke, 10 years ago

Owner: changed from somebody to Rüdiger Franke
Status: newassigned

comment:2 by Rüdiger Franke, 10 years ago

Cc: Nils Menager added

comment:3 by Adrian Pop, 10 years ago

See CodegenC.tpl

    // Arrays are converted to fortran format that are stored in _ext-variables.
    'data_of_<%expTypeShort(t)%>_f77_array(&(<%extVarName(c)%>))'

comment:4 by Rüdiger Franke, 10 years ago

Thanks. It appears that the code generation only considers row major storage layout so far, e.g. in the function SimCodeTV.flattenArrayExpToList.

Last edited 10 years ago by Rüdiger Franke (previous) (diff)

comment:5 by Rüdiger Franke, 10 years ago

r25613 changes the internal storage layout of the Cpp runtime to column major order. The main changes are:

  1. StatArray*: flip index calculation in operator() methods
  2. DynArray*: create boost::multi-array with boost::fortran_storage_order
  3. ArraySlice: adapt assign and getData methods as they access raw memory
  4. CodegenCpp.tpl, StatArrayDim2: remove template parameter fortran

Literal array data is currently put into linear memory with row major order, e.g. using the function SimCodeTV.flattenArrayExpToList, and then assigned to an array. These assignments have been changed to use the new function

  void assignRowMajorData(const S *data, BaseArray<T> &array)

that considers arrays with arbitrary storage layout. Alternatively the generating functions could be changed to create the literal data in column major order. Also could the faster memcpy used so far be re-activated for one-dimensional arrays. But this should only effect initialization -- afterwards the transfer of data between arrays works with memcpy as before, because all arrays use the same storage layout.

The external C interface has not been touched yet, because it appears to have some flaws and it is questionable if someone uses it with multi-dimensional arrays. But ArrayOperations.cpp provides the second new funtion

  void convertArrayLayout(const BaseArray<S> &s, BaseArray<T> &d)

that could be used to flip all dimensions into temporarily created _ext-arrays.

Last but not least the external F77 interface could be simplified to directly pass through multi-dimensional arrays. This avoids overhead when using Modelica.Math.Matrices, making the generated code competitive with other linear algebra packages.

comment:6 by Martin Sjölund, 10 years ago

Note that the cost to transpose a matrix is linear in its size (2d-matrix of course n2 where n is the size of one side). However, the cost of most of the lapack operations is n3 or log (n)*n2 or so, so the cost of n2 is not so important.

Also, for integer matrices, they might need to be converted even if they are stored in the correct row order (because the size of the integer is different in C/F77/Modelica).

comment:7 by Rüdiger Franke, 10 years ago

Computational complexity alone does indeed not provide for a sufficient motivation. Another argument is cache hits that can destroy performance -- it's a non issue if the storage layout matches. Moreover, looking at the PowerSystems library that naturally calls Modelica.Math.Matrices.{inv,solve,eigenValues} for relatively small matrices, one really does not want to have all the temporary matrices and transposition operations in between. The same should apply to mechanical models.

Is there a way to tell functions like flattenArrayExpToList about the wanted storage layout?

comment:8 by Rüdiger Franke, 10 years ago

Regarding your last comment: isn't size of int 4 bytes in C, F77 and Modelica, both on 32 and 64 bit systems?

in reply to:  8 comment:9 by Martin Sjölund, 10 years ago

Replying to rfranke:

Regarding your last comment: isn't size of int 4 bytes in C, F77 and Modelica, both on 32 and 64 bit systems?

No, it is not :) The Integer in OpenModelica 64-bit is 64-bit.

comment:10 by Rüdiger Franke, 10 years ago

Not in the Cpp runtime :)

It translates Modelica Integer to regular int. Meaning that the target system decides about the appropriate size of an Integer and other's like Fortran seem to follow this as well.

But wait a moment. If the C runtime uses a 64bit modelica_integer, then you should check the code generation for external F77. It passes a modelica_integer lda as (int *)&lda. This only appears to work on little endian platforms if sizeof(modelica_integer) > sizeof(int)!?

comment:11 by Martin Sjölund, 10 years ago

We convert from long to int in the external calls (both C and F77).

comment:12 by Rüdiger Franke, 10 years ago

But for F77 you only convert the pointer to the address of the long!

comment:13 by Martin Sjölund, 10 years ago

That might be due to using f2c.h for Fortran code in the past; f2c.h says a Fortran integer is a long.

comment:14 by Rüdiger Franke, 10 years ago

r25614 adds missing pieces after having seen the results of the nightly tests. In particular:

  • adapt StatArray*::append methods to the internal storage layout
  • fix DynArrayDim2<T>::append method -- the previous version was never assigning data from the call argument b to the local vector a -- assignment from DynArrayDim1<T> into DynArrayDim2<T> can be formulated without a local vector when using a friend declaration.
  • move assignRowMajorData from ArrayOperations.cpp to Array.h, because it must also work for arrays of model defined records, like ComplexType used in Modelica.Electrical.QuasiStationary

It is questionable why the append methods exist at all (array_alloc_<%scalarPrefix%><%arrayTypeStr%> in the C runtime). See Modelica.Electrical.Digital.Examples.BUF3S for an example. It uses constant arrays from the package Modelica.Electrical.Digital.Tables. These arrays are created dimension by dimension at runtime during the evaluation of model equations. Why aren't they pre-generated into one chunk of data like arrays of literals?

comment:15 by Rüdiger Franke, 10 years ago

The questionable creation of arrays at runtime does only seem to assign single-dimensional data and use the append methods for higher dimensions. This enables to put back the new assignRowMajorData to the precompiled ArrayOperations.cpp (see r25623).

Moreover r25620 optimizes the access to elements of DynArrayDim*. The operator[] of the underlying boost multi_array returns array slices for arbitrary storage layouts -- quite impressive, but introducing some overhead, in particular if successively called for multiple dimensions if just one element is needed at the end. The optimized versions pick the element directly out of the array data -- the non-optimized versions are kept in comments, e.g.:

  inline virtual T& operator()(size_t i, size_t j, size_t k)
  {
    //return _multi_array[i][j][k];
    const size_t *shape = _multi_array.shape();
    return _multi_array.data()[i-1 + shape[0]*(j-1 + shape[1]*(k-1))];
  }

comment:16 by Rüdiger Franke, 10 years ago

r25629 (re-)adds support for multi-dimensional arrays to the external "C" interface -- it's needed for external tables. See e.g. Modelica.Magnetic.FluxTubes.Examples.MovingCoilActuator.ForceCurrentBehaviour.

The change renames the template extCBoolCast, which was already called for any array arg, to extCArrayArg. The implementation was changed in two ways:

  1. consider any array arg of the external function, also if it is a local variable of the calling Modelica function (previously only inputs of the Modelica function were converted)
  2. don't cast pointers passed to the external function, because this is dangerous (see comment:10 above)

Hopefully the changes introduce no new problems, but if so, then it needs to be treated additionally.

comment:17 by Rüdiger Franke, 9 years ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.