Opened 4 years ago
Closed 4 years ago
#6099 closed defect (fixed)
Low level issues with arrays in Windows
Reported by: | Francesco Casella | Owned by: | Mahder Alemseged Gebremedhin |
---|---|---|---|
Priority: | blocker | Milestone: | 1.17.0 |
Component: | Code Generation | Version: | |
Keywords: | Cc: | lucciud@…, Adrian Pop, davide10.mariani@… |
Description
While running a large model of a FEM multibody system, I get this runtime error
index[1] == 0, size == 1 Simulation process failed. Exited with code 3.
This is caused by this function call.
@adrpo found that the same issue takes place when running Modelica.Magnetic.FundamentalWave.Examples.BasicMachines.SMEE_Generator_MultiPhase.sim
with OMEdit on Windows. The same model runs fine on Linux on the testerver works fine.
More specifically, the error is triggered when invoking the function ToSpacePhasor, involving a parameter array which is declared but not used in the function body. @adrpo checked the execution of the corresponding function and reports:
All I can tell is that some very fishy is happening with the stack. Sometime between this call:
create_index_spec(&tmp18, 2, (0), make_index_array(1, (int) tmp15++), 'S', (1), (int*)0, 'W');and this call (which is just one after the other):
indexed_assign_real_array( __omcQ_24tmpVar42, &__omcQ_24tmpVar43, &tmp18);tmp18 is overwritten with bad data which triggers the assert.
Somehow on Linux, valgrind doesn't find any issue which leads me to believe it might be a GCC 5.3 optimization bug. Other strange stuff is that the model runs fine in the terminal but gets you the error via gdb or OMEdit.
Maybe the solution could be to go ahead with #5307 and use a more recent version of gcc on Windows (or clang), since the currently used version of gcc is more than four years old, if I'm not mistaken.
Attachments (2)
Change History (25)
comment:1 by , 4 years ago
by , 4 years ago
Attachment: | TestVectors.mo added |
---|
comment:4 by , 4 years ago
Hi,
tested with the last nightly build on win 10 enterprise:
M1
test works fine with Clang and fails with GCC,M2
test works fine with both Clang and GCC
OpenModelica v1.17.0-dev-108-g0f229baaeb (64-bit)
comment:5 by , 4 years ago
Cc: | added |
---|
comment:6 by , 4 years ago
Unfortunately, Andrea reports that we still get failures of the same kind with larger models, even using clang. Are we really sure that this is a compiler issue and not some weird memory overflow issue due to bad pointer management?
comment:7 by , 4 years ago
Please consider also the attached package Test.mo
.
The model M1
fails with both Clang and Gcc with the following error msg:
index[1] == 0, size == 1 Assertion failed: index_spec_ok(dest), file ./util/index_spec.c, line 112 Simulation process failed. Exited with code 3.
It seems that the problem is caused by passing a matrix constructed using iterators to a function which uses iterators.
system: Windows 10 Enterprise
OMC: OpenModelica v1.17.0-dev-108-g0f229baaeb (64-bit)
by , 4 years ago
comment:8 by , 4 years ago
The model M1
in Test.mo
works fine on linux with both Clang and Gcc.
OMEdit - OpenModelica Connection Editor
Connected to OpenModelica 1.17.0~dev-108-g0f229ba
Linux Ubuntu 18.04
comment:9 by , 4 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
@mahge939, maybe you could also have a look at this? You should be quite familiar with low-level C code issues.
comment:10 by , 4 years ago
Cc: | added |
---|
comment:11 by , 4 years ago
Status: | assigned → accepted |
---|
comment:12 by , 4 years ago
I think PR #7009 has fixed it. It runs fine for me on old OMDev with GCC.
The issue was a mismatch between size of C integer literals (int
) vs what our runtime functions expect, which was modelcia_integer
(a.k.a. long long
).
Please check and confirm if it is fixed. If it is not I will try with clang and new OMDev.
It is merged to master so you can have it on the next nightly build.
follow-up: 14 comment:13 by , 4 years ago
Thank @mahge390, I really appreciate this :) I just started a Windows nightly, so we can check ASAP.
I love the C language! BTW, how long is long long? :)
comment:14 by , 4 years ago
Replying to casella:
Thank @mahge390, I really appreciate this :) I just started a Windows nightly, so we can check ASAP.
I love the C language! BTW, how long is long long? :)
:)
As far as I can tell, technically, there is no long long
in C. At least the C89 which we are supposed to use. It is in C99 and it is supposed to be 8 bytes. We do not claim that our C code is C99. However, we use C99 features thanks to the fact that GCC enables them by default.
On systems and compiler that do not know it it might be demoted to long (don't quote me on this).
The expectation on long
is that it has to be at least as big as int, i.e. at least 2 bytes.
However, it can also be 4 bytes or 8 bytes depending on implementation and architecture. For example
long
with x64 MSVC it is 4 bytes. With x64 GCC and Clang it is 8.
long long
is 8 bytes with all of them.
It is a whole lot of mess as you can see. But at least long long is 8 bytes long :)
comment:15 by , 4 years ago
@mahge390 and @casella
thank you a lot for working on this.
I tested testVectors.M1 & M2 and Test.M1 on my Windows system with the current nightly v1.17.0-dev-242-g691f9c5c4a (64-bit). They run.
During compilation they give some messages (duplicate section ... has different size), for which I assume I will have to wait the next nightly, as you said.
Moreover, I was also able to run the "real" cases that showed the problem. So for sure something has been fixed.
thank you a lot
follow-up: 18 comment:16 by , 4 years ago
That's why we have our own types in MetaModelica and C runtime but it seems they were not used everywhere.
comment:17 by , 4 years ago
For the "duplicate section ... has different size" see #6179. Is just a notice because we link both with static and dynamic libstdc++. The plan is to fix it sometime next week.
comment:18 by , 4 years ago
Replying to adrpo:
That's why we have our own types in MetaModelica and C runtime but it seems they were not used everywhere.
Yup. Fortunately there were not too many such usages. They were mostly in code for reductions (e.g sum(), max() ...
) and for-iter
ed array expressions.
I should have checked the CPP code generators as well because, I think, they use the same code. I will check.
comment:19 by , 4 years ago
I confirm that also the large FEM multibody I mentioned in the original description of the ticket now runs without causing this error anymore.
I'm still getting some issues during simulation, but that's another story.
I'll keep this ticket open until @mahge930 has checked that the CPP code generators are also fine from this point of view.
@mahge930, I would also suggest to add the attached Test.mo to the core testsuite, so that it is tested with all runtimes by default.
comment:20 by , 4 years ago
I checked today (v1.17.0-dev-254-gf6d7f171aa). I confirm that compilation is clean and cases run.
comment:21 by , 4 years ago
@casella I checked the CPP runtimes and they do not use this. It was only the now-removed code generators (e.g. CodegenAdevs) that used it because they were not maintained.
I will add the testcase and close this ticket if there are not further issues.
comment:22 by , 4 years ago
I have added the tests in PR7022. One of the models in the TestVectors.mo
attached to this ticket has a division by zero problem in function f2
.
internal := sum(v2[i] / v1[i] for i in 1:size(v2, 1)) ....
where v[1]
(v
is input to the function) is 0. I have fixed it (set it 1) to avoid that. If this example is striped from a bigger model you should probably check.
This is important since we do not do division by zero checks in functions.
comment:23 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
@Andrea.Bartolini managed to come up with a much smaller MWE, see attachment. Functions handling arrays are still involved.
On some Windows machines both
M1
andM2
fail with the error reported in the original description of this issue. On my laptop, onlyM2
does.I tried to compile the model with
-O0
but that didn't solve the issue, so maybe it is really a fundamental gcc bug rather than an issue with gcc optimization.