Opened 4 years ago

Closed 3 years ago

#6099 closed defect (fixed)

Low level issues with arrays in Windows

Reported by: casella Owned by: mahge930
Priority: blocker Milestone: 1.17.0
Component: Code Generation Version:
Keywords: Cc: lucciud@…, adrpo, davide10.mariani@…

Description

While running a large model of a FEM multibody system, I get this runtime error

index[1] == 0, size == 1

Simulation process failed. Exited with code 3.

This is caused by this function call.

@adrpo found that the same issue takes place when running Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMEE_Generator_MultiPhase.sim with OMEdit on Windows. The same model runs fine on Linux on the testerver works fine.

More specifically, the error is triggered when invoking the function ToSpacePhasor, involving a parameter array which is declared but not used in the function body. @adrpo checked the execution of the corresponding function and reports:

All I can tell is that some very fishy is happening with the stack. Sometime between this call:

create_index_spec(&tmp18, 2, (0),
make_index_array(1, (int) tmp15++), 'S', (1), (int*)0, 'W');

and this call (which is just one after the other):

indexed_assign_real_array(
  __omcQ_24tmpVar42, &__omcQ_24tmpVar43, &tmp18);

tmp18 is overwritten with bad data which triggers the assert.

Somehow on Linux, valgrind doesn't find any issue which leads me to believe it might be a GCC 5.3 optimization bug. Other strange stuff is that the model runs fine in the terminal but gets you the error via gdb or OMEdit.

Maybe the solution could be to go ahead with #5307 and use a more recent version of gcc on Windows (or clang), since the currently used version of gcc is more than four years old, if I'm not mistaken.

Attachments (2)

TestVectors.mo (1.8 KB) - added by casella 4 years ago.
Test.mo (365 bytes) - added by Lucciud@… 3 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 4 years ago by casella

@Andrea.Bartolini managed to come up with a much smaller MWE, see attachment. Functions handling arrays are still involved.

On some Windows machines both M1 and M2 fail with the error reported in the original description of this issue. On my laptop, only M2 does.

I tried to compile the model with -O0 but that didn't solve the issue, so maybe it is really a fundamental gcc bug rather than an issue with gcc optimization.

Changed 4 years ago by casella

comment:2 Changed 3 years ago by adrpo

Both M1 and M2 works fine with clang using the latest OMDev.

comment:3 Changed 3 years ago by casella

Very good!

comment:4 Changed 3 years ago by Andrea.Bartolini

Hi,
tested with the last nightly build on win 10 enterprise:

  • M1 test works fine with Clang and fails with GCC,
  • M2 test works fine with both Clang and GCC

OpenModelica v1.17.0-dev-108-g0f229baaeb (64-bit)

Last edited 3 years ago by Andrea.Bartolini (previous) (diff)

comment:5 Changed 3 years ago by Andrea.Bartolini

  • Cc lucciud@… added

comment:6 Changed 3 years ago by casella

Unfortunately, Andrea reports that we still get failures of the same kind with larger models, even using clang. Are we really sure that this is a compiler issue and not some weird memory overflow issue due to bad pointer management?

comment:7 Changed 3 years ago by lucciud@…

Please consider also the attached package Test.mo.

The model M1 fails with both Clang and Gcc with the following error msg:

 index[1] == 0, size == 1
Assertion failed: index_spec_ok(dest), file ./util/index_spec.c, line 112

Simulation process failed. Exited with code 3.

It seems that the problem is caused by passing a matrix constructed using iterators to a function which uses iterators.

system: Windows 10 Enterprise
OMC: OpenModelica v1.17.0-dev-108-g0f229baaeb (64-bit)

Changed 3 years ago by Lucciud@…

comment:8 Changed 3 years ago by Andrea.Bartolini

The model M1 in Test.mo works fine on linux with both Clang and Gcc.

OMEdit - OpenModelica Connection Editor
Connected to OpenModelica 1.17.0~dev-108-g0f229ba
Linux Ubuntu 18.04

comment:9 Changed 3 years ago by casella

  • Cc adrpo added
  • Owner changed from adrpo to mahge930
  • Status changed from new to assigned

@mahge939, maybe you could also have a look at this? You should be quite familiar with low-level C code issues.

comment:10 Changed 3 years ago by casella

  • Cc davide10.mariani@… added

comment:11 Changed 3 years ago by mahge930

  • Status changed from assigned to accepted

comment:12 Changed 3 years ago by mahge930

I think PR #7009 has fixed it. It runs fine for me on old OMDev with GCC.

The issue was a mismatch between size of C integer literals (int) vs what our runtime functions expect, which was modelcia_integer (a.k.a. long long).

Please check and confirm if it is fixed. If it is not I will try with clang and new OMDev.
It is merged to master so you can have it on the next nightly build.

Last edited 3 years ago by mahge930 (previous) (diff)

comment:13 follow-up: Changed 3 years ago by casella

Thank @mahge390, I really appreciate this :) I just started a Windows nightly, so we can check ASAP.

I love the C language! BTW, how long is long long? :)

comment:14 in reply to: ↑ 13 Changed 3 years ago by mahge930

Replying to casella:

Thank @mahge390, I really appreciate this :) I just started a Windows nightly, so we can check ASAP.

I love the C language! BTW, how long is long long? :)

:)

As far as I can tell, technically, there is no long long in C. At least the C89 which we are supposed to use. It is in C99 and it is supposed to be 8 bytes. We do not claim that our C code is C99. However, we use C99 features thanks to the fact that GCC enables them by default.

On systems and compiler that do not know it it might be demoted to long (don't quote me on this).
The expectation on long is that it has to be at least as big as int, i.e. at least 2 bytes.
However, it can also be 4 bytes or 8 bytes depending on implementation and architecture. For example

long with x64 MSVC it is 4 bytes. With x64 GCC and Clang it is 8.
long long is 8 bytes with all of them.

It is a whole lot of mess as you can see. But at least long long is 8 bytes long :)

Last edited 3 years ago by mahge930 (previous) (diff)

comment:15 Changed 3 years ago by lucciud@…

@mahge390 and @casella

thank you a lot for working on this.

I tested testVectors.M1 & M2 and Test.M1 on my Windows system with the current nightly v1.17.0-dev-242-g691f9c5c4a (64-bit). They run.

During compilation they give some messages (duplicate section ... has different size), for which I assume I will have to wait the next nightly, as you said.

Moreover, I was also able to run the "real" cases that showed the problem. So for sure something has been fixed.

thank you a lot

comment:16 follow-up: Changed 3 years ago by adrpo

That's why we have our own types in MetaModelica and C runtime but it seems they were not used everywhere.

comment:17 Changed 3 years ago by adrpo

For the "duplicate section ... has different size" see #6179. Is just a notice because we link both with static and dynamic libstdc++. The plan is to fix it sometime next week.

comment:18 in reply to: ↑ 16 Changed 3 years ago by mahge930

Replying to adrpo:

That's why we have our own types in MetaModelica and C runtime but it seems they were not used everywhere.

Yup. Fortunately there were not too many such usages. They were mostly in code for reductions (e.g sum(), max() ...) and for-itered array expressions.

I should have checked the CPP code generators as well because, I think, they use the same code. I will check.

comment:19 Changed 3 years ago by casella

I confirm that also the large FEM multibody I mentioned in the original description of the ticket now runs without causing this error anymore.

I'm still getting some issues during simulation, but that's another story.

I'll keep this ticket open until @mahge930 has checked that the CPP code generators are also fine from this point of view.

@mahge930, I would also suggest to add the attached Test.mo to the core testsuite, so that it is tested with all runtimes by default.

comment:20 Changed 3 years ago by lucciud@…

I checked today (v1.17.0-dev-254-gf6d7f171aa). I confirm that compilation is clean and cases run.

comment:21 Changed 3 years ago by mahge930

@casella I checked the CPP runtimes and they do not use this. It was only the now-removed code generators (e.g. CodegenAdevs) that used it because they were not maintained.

I will add the testcase and close this ticket if there are not further issues.

comment:22 Changed 3 years ago by mahge930

I have added the tests in PR7022. One of the models in the TestVectors.mo attached to this ticket has a division by zero problem in function f2.

internal := sum(v2[i] / v1[i] for i in 1:size(v2, 1)) ....

where v[1] (v is input to the function) is 0. I have fixed it (set it 1) to avoid that. If this example is striped from a bigger model you should probably check.

This is important since we do not do division by zero checks in functions.

comment:23 Changed 3 years ago by mahge930

  • Resolution set to fixed
  • Status changed from accepted to closed
Note: See TracTickets for help on using tickets.