Opened 12 years ago

Closed 11 years ago

#2068 closed defect (fixed)

Segfault (Stackoverflow?) when bootstrapping omc

Reported by: choeger
Owned by: sjoelund.se
Priority: high
Milestone: 1.9.0
Component: MetaModelica
Version: trunk
Keywords:
Cc: adrpo

Description

Hi all,

I am trying to compile omc using the bootstrap-from-tarball method.

In the second run (bootstrap from build/), omc segfaults after about 12 minutes.

It looks a lot like a stack overflow in

omc_SimCodeUtil_elaborateRecordDeclarationsForMetarecords

(gdb) frame
#0 0x0000000001219268 in omc_SimCodeUtil_elaborateRecordDeclarationsForMetarecords (_inExpl=<error reading variable: Cannot access memory at address 0x7fffff7fefe8>,
    _inAccRecordDecls=<error reading variable: Cannot access memory at address 0x7fffff7fefe0>, _inReturnTypes=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>) at Main_main2.c:1444347

1444347 {

The rest of the stack trace contains more than 14000 entries. I am leaving out the core dump for now since it is ~2.4 GB. Just let me know where to put it if it would help.

Attachments (1)

config.log (243.6 KB) - added by choeger 12 years ago.
Configure output


Change History (13)

Changed 12 years ago by choeger

Configure output

comment:1 Changed 12 years ago by sjoelund.se

I have considered adding some manual guards to check for stack overflow. This should not be too complicated if we assume it is a grave error to use more than some reasonable amount of stack. The problem, of course, is that "reasonable" is subjective on some OSes. Do we know of any nice way to query the total allocated size of the stack at runtime?
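One possible answer on POSIX systems, as a minimal sketch: `getrlimit(RLIMIT_STACK, ...)` reports the soft limit that `ulimit -s` sets, and a manual guard can estimate the stack used so far by comparing the address of a local variable against one recorded at startup. The function names below are hypothetical and not part of the omc runtime. The address comparison assumes one contiguous stack per thread, and the limit query only applies to the main thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Soft stack limit in bytes; 0 means unlimited, -1 means the query failed. */
long long soft_stack_limit_bytes(void)
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) {
    return -1;
  }
  if (rl.rlim_cur == RLIM_INFINITY) {
    return 0;
  }
  return (long long) rl.rlim_cur;
}

/* Approximate address of the stack at the time stack_guard_init() ran. */
static uintptr_t stack_base;

/* Call once, as early as possible (for example at the top of main). */
void stack_guard_init(void)
{
  char marker;
  stack_base = (uintptr_t) &marker;
}

/* Returns 1 while the estimated stack use is below the budget, else 0.
 * The distance is taken in either direction, so the direction in which
 * the stack grows does not matter. */
int stack_guard_ok(size_t budget)
{
  char marker;
  uintptr_t here = (uintptr_t) &marker;
  uintptr_t used = here > stack_base ? here - stack_base : stack_base - here;
  return used < budget;
}
```

A deeply recursive function could call `stack_guard_ok` with some fraction of `soft_stack_limit_bytes()` as the budget and report an error instead of segfaulting.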

comment:2 Changed 12 years ago by sjoelund.se

  • Cc adrpo added
  • Owner changed from somebody to sjoelund.se
  • Status changed from new to accepted

comment:3 Changed 12 years ago by sjoelund.se

This is hopefully fixed in r15150.

comment:4 Changed 12 years ago by choeger

This seems to help a little bit. Unfortunately, the next segfault follows closely:

#0  0x00007ffff7c98f76 in GC_clear_stack_inner (arg=<error reading variable: Cannot access memory at address 0x7fffff7fed48>, limit=<error reading variable: Cannot access memory at address 0x7fffff7fed40>) at misc.c:287
#1  0x00007ffff7c98fca in GC_clear_stack_inner (arg=0x0, limit=0x7fffff7fd600 <Address 0x7fffff7fd600 out of bounds>) at misc.c:292
#2  0x00007ffff7c98fca in GC_clear_stack_inner (arg=0x0, limit=0x7fffff7fd600 <Address 0x7fffff7fd600 out of bounds>) at misc.c:292
#3  0x00007ffff7c98fca in GC_clear_stack_inner (arg=0x0, limit=0x7fffff7fd600 <Address 0x7fffff7fd600 out of bounds>) at misc.c:292
#4  0x00007ffff7c98fca in GC_clear_stack_inner (arg=0x0, limit=0x7fffff7fd600 <Address 0x7fffff7fd600 out of bounds>) at misc.c:292
#5  0x00007ffff7c98fca in GC_clear_stack_inner (arg=0x0, limit=0x7fffff7fd600 <Address 0x7fffff7fd600 out of bounds>) at misc.c:292
#6  0x00007ffff7c99063 in GC_clear_stack (arg=0x0) at misc.c:338
#7  0x00007ffff7c94c39 in GC_generic_malloc_many (lb=32, k=1, result=0x7ffff7eed138 <first_thread+280>) at mallocx.c:429
#8  0x00007ffff7c9fb16 in GC_malloc (bytes=24) at thread_local_alloc.c:161
#9  0x00000000014184f6 in mmc_alloc_words (nwords=3) at meta/gc/marksweep.c:600
#10 0x0000000000445df4 in mmc_mk_box2 (ctor=0, x0=0x2168b183, x1=0x1535e83 <mmc_nil+3>)
#11 0x0000000000c7ee8a in omc_Expression_traverseExp (_inExp=0x2168b183, func=0x11a5da2 <omc_SimCodeUtil_matchMetarecordCalls>, _inTypeA=0x1535e83 <mmc_nil+3>)
#12 0x0000000000c92af8 in omc_Expression_traverseSubexpressionsHelper (_itpl=0x2be7d123)
#13 0x0000000000bdbbbe in omc_DAEUtil_traverseDAEEquationsStmts (_inStmts=0x8ce80dc3, func=0xc92a41 <omc_Expression_traverseSubexpressionsHelper>, _iextraArg=0x2be7d6e3)
#14 0x00000000010631c6 in omc_Patternm_traverseCases (_inCases=0x8ce80be3, func=0x11a5da2 <omc_SimCodeUtil_matchMetarecordCalls>, _inA=0x1535e83 <mmc_nil+3>)
#15 0x00000000010632f7 in omc_Patternm_traverseCases (_inCases=0x8ce80bc3, func=0x11a5da2 <omc_SimCodeUtil_matchMetarecordCalls>, _inA=0x1535e83 <mmc_nil+3>)
#16 0x0000000000c82cb2 in omc_Expression_traverseExp (_inExp=0x8f6e9e43, func=0x11a5da2 <omc_SimCodeUtil_matchMetarecordCalls>, _inTypeA=0x1535e83 <mmc_nil+3>)
#17 0x0000000000c92af8 in omc_Expression_traverseSubexpressionsHelper (_itpl=0x2be7dd43)
#18 0x0000000000bdbb4b in omc_DAEUtil_traverseDAEEquationsStmts (_inStmts=0x8ce809c3, func=0xc92a41 <omc_Expression_traverseSubexpressionsHelper>, _iextraArg=0x2be7df23)
#19 0x00000000005757bc in omc_BackendDAEUtil_traverseAlgorithmExps (_inAlgorithm=0x8ce809a3, func=0xc92a41 <omc_Expression_traverseSubexpressionsHelper>, _inTypeA=0x2be7df23)
#20 0x0000000001186b77 in omc_SimCodeUtil_elaborateRecordDeclarations (_inVars=0x8ce80963, _inAccRecordDecls=0x2c3e6ba3, _inReturnTypes=0x2c3e6b83)
#21 0x00000000011869d2 in omc_SimCodeUtil_elaborateRecordDeclarations (_inVars=0x8ce80943, _inAccRecordDecls=0x2c3e6ba3, _inReturnTypes=0x2c3e6b83)
#22 0x00000000011869d2 in omc_SimCodeUtil_elaborateRecordDeclarations (_inVars=0x8ce80923, _inAccRecordDecls=0x2c3e6ba3, _inReturnTypes=0x2c3e6b83)
#23 0x0000000001182898 in omc_SimCodeUtil_elaborateFunction (_program=0x38c93ea3, _inElement=0x90ef2fa3, _inRecordTypes=0x2c3e6b83, _inRecordDecls=0x2c3e6ba3, _inIncludes=0x1605c03 <mmc_nil+3>, _inIncludeDirs=0x1605c03 <mmc_nil+3>, 
    _inLibs=0x2d1edfe3)
#24 0x0000000001187d37 in omc_SimCodeUtil_elaborateFunctions2 (_program=0x38c93ea3, _daeElements=0x4a373d03, _inFunctions=0x2be6b1a3, _inRecordTypes=0x2c3e6b83, _inDecls=0x2c3e6ba3, _inIncludes=0x1605c03 <mmc_nil+3>, 
    _inIncludeDirs=0x1605c03 <mmc_nil+3>, _inLibs=0x2d1edfe3)

followed by ~ 6000 frames of omc_SimCodeUtil_elaborateFunctions2.

Again, I have a dump at hand if you point me to a place to upload it.

comment:5 Changed 12 years ago by adrpo

Do you have any ulimit on stack?

comment:6 Changed 12 years ago by choeger

Yes, actually, I do have a ulimit set (apparently chosen by Fedora):

[choeger@localhost openmodelica]$ ulimit -H -s
unlimited
[choeger@localhost openmodelica]$ ulimit -S -s
8192

Since limiting the stack size is IMO a good idea, couldn't you just set a similar limit for the build bot instances? That way you could figure out where a tail-recursion optimization might bring some performance improvements.
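To illustrate why tail recursion matters here, a small sketch in C (the list type and function names are made up for illustration and are not the generated omc code): a traversal that still has work to do after the recursive call needs one stack frame per element, whereas the accumulator form keeps the stack depth constant.

```c
#include <stddef.h>

/* Hypothetical singly linked list, standing in for a MetaModelica list. */
struct node {
  int value;
  struct node *next;
};

/* Not tail-recursive: the addition happens after the recursive call
 * returns, so each element costs one stack frame. */
long sum_recursive(const struct node *n)
{
  if (n == NULL) {
    return 0;
  }
  return n->value + sum_recursive(n->next);
}

/* Accumulator form written as a loop: constant stack depth regardless
 * of the list length. */
long sum_iterative(const struct node *n)
{
  long acc = 0;
  for (; n != NULL; n = n->next) {
    acc += n->value;
  }
  return acc;
}
```

With a small stack limit, the first form fails on a sufficiently long list while the second does not, which is the kind of hot spot a reduced limit on the build bot would expose.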

comment:7 Changed 12 years ago by adrpo

We could probably do that. However, what happens if you remove the ulimit on stack?

comment:8 Changed 12 years ago by choeger

Without running the whole testsuite on the end product, it seems to have worked.

comment:9 Changed 12 years ago by sjoelund.se

The build bot has those exact stack limits though :( Maybe we could reduce the size... Or compile with checks for smashing the stack.

comment:10 Changed 12 years ago by sjoelund.se

OK, we managed to break the bootstrap build by setting -fstack-protector-all, -fstack-check, and ulimit -s 1024.

comment:11 Changed 12 years ago by sjoelund.se

This should be working in r15197. The automatic builds are set to a 4 MB stack and use GCC stack checking (since 8 MB is the default, this should ensure we don't break bootstrapping). A 2 MB stack seems to have been too little anyway.
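How the build machines apply the limit is not stated here; presumably it is done with `ulimit -s`. For reference, the same effect can be sketched in C with `setrlimit`. The helper name is hypothetical, and it only ever lowers the soft limit.

```c
#include <sys/resource.h>

/* Lower the soft stack limit to at most `bytes`. Returns 0 on success
 * (including when the limit is already low enough) and -1 on failure.
 * The hard limit is left untouched, so the soft limit can be raised again. */
int lower_soft_stack_limit(rlim_t bytes)
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) {
    return -1;
  }
  if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= bytes) {
    return 0; /* already at or below the requested size */
  }
  rl.rlim_cur = bytes;
  return setrlimit(RLIMIT_STACK, &rl);
}
```

The limit is inherited by child processes, so a small wrapper could call `lower_soft_stack_limit(4 * 1024 * 1024)` before starting the bootstrap.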

comment:12 Changed 11 years ago by sjoelund.se

  • Resolution set to fixed
  • Status changed from accepted to closed