Crash by pure virtual method called when threads are enabled


Postby Boost113 » Thu Mar 15, 2018 5:46 am

Hi,

When I compile Newton on Linux with "-DTHREAD_EMULATION=OFF" (the default) and run my project's tests under valgrind, I get this crash (it also happens occasionally in normal runs, so it is fairly random):

Code:
terminate called without an active exception
pure virtual method called

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LeviathanTest is a Catch v2.0.1 host application.
terminate called recursively
Run with -? for options

==11332==
==11332== Process terminating with default action of signal 6 (SIGABRT): dumping core
==11332==    at 0x734966B: raise (in /usr/lib64/libc-2.26.so)
==11332==    by 0x734B380: abort (in /usr/lib64/libc-2.26.so)
==11332==    by 0x6AB0FEC: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib64/libstdc++.so.6.0.24)
==11332==    by 0x6AAEC15: ??? (in /usr/lib64/libstdc++.so.6.0.24)
==11332==    by 0x6AAEC60: std::terminate() (in /usr/lib64/libstdc++.so.6.0.24)
==11332==    by 0x6AAFA4E: __cxa_pure_virtual (in /usr/lib64/libstdc++.so.6.0.24)
==11332==    by 0x54D5C60: dgThread::dgThreadSystemCallback(void*) (dgThread.cpp:166)
==11332==    by 0x6ADB14E: ??? (in /usr/lib64/libstdc++.so.6.0.24)
==11332==    by 0x680761A: start_thread (in /usr/lib64/libpthread-2.26.so)
==11332==    by 0x742998E: clone (in /usr/lib64/libc-2.26.so)


But when I compile with "-DTHREAD_EMULATION=ON" I don't get a crash anymore. So I'm pretty sure this is related to the threading and is caused by a race condition. I tried to get more info about this using helgrind (valgrind thread error detector) but it didn't print anything related to my physics tests.

I'm also experiencing another bug (for some reason NewtonConvexCollisionCalculateInertialMatrix returns NaNs randomly) that I wanted to debug with valgrind but this bug is preventing me from doing that and I'd prefer to have the threading working to properly test that.

Edit: I forgot to mention that I reported this on github: https://github.com/MADEAPPS/newton-dynamics/issues/88
Boost113
 
Posts: 26
Joined: Thu Oct 23, 2014 3:24 am

Re: Crash by pure virtual method called when threads are ena

Postby Julio Jerez » Thu Mar 15, 2018 4:03 pm

What happens if you do not run it with valgrind?
I would be surprised if this were a race condition, because all the threading is done on jobs that are data parallel, so there cannot be a race condition. My guess is that somehow the standard thread support is somewhat different on Linux than it is on Windows.

I do not have a Linux system now, but in about a month I will get a Threadripper and turn my current system into a Linux box so that I can build the engine. Other than that, I do not have a way to run or debug a Linux build.
Julio Jerez
Moderator
 
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Crash by pure virtual method called when threads are ena

Postby Boost113 » Thu Mar 15, 2018 4:20 pm

Sometimes when just running it (without a debugger or valgrind) it simply segfaults. I haven't kept any of the core files, as I only just thought of doing that. I tried looking at earlier core files, but as I've recompiled since then, this is the best I've got:
(missing symbols)
Code:
Stack trace of thread 345:
 #0  0x00007f638fa1a66b raise (libc.so.6)
 #1  0x00007f638fa1c381 abort (libc.so.6)
 #2  0x00007f63903c4025 _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6)
 #3  0x00007f63903c1c16 _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6)
 #4  0x00007f63903c1c61 _ZSt9terminatev (libstdc++.so.6)
 #5  0x00007f6391970e70 n/a (/home/hhyyrylainen/Projects/leviathan/build/bin/lib/libNewton.so)



Also I've tried over 30 times to get the error to show up when running through gdb but I haven't gotten an error with it.

Without running in the debugger I just randomly get the message "terminate called without an active exception" and then "SIGABRT - Abort (abnormal termination) signal" and the test fails.

There are two more stack traces in the github link I posted.

Re: Crash by pure virtual method called when threads are ena

Postby Boost113 » Thu Mar 15, 2018 5:15 pm

After running the tests continuously in a bash loop until one segfaulted, I now have a core file of the crash.

This is the backtrace of the crashing thread:
Code:
#0  0x00007f57a16a266b in raise () from /lib64/libc.so.6
#1  0x00007f57a16a4381 in abort () from /lib64/libc.so.6
#2  0x00007f57a204bfed in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007f57a2049c16 in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4  0x00007f57a2049c61 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007f57a204aa4f in __cxa_pure_virtual () from /lib64/libstdc++.so.6
#6  0x00007f57a35f8c61 in dgThread::dgThreadSystemCallback (threadData=0x1b03e78)
    at /home/hhyyrylainen/Projects/leviathan/ThirdParty/newton-dynamics/sdk/dgCore/dgThread.cpp:166
#7  0x00007f57a207614f in execute_native_thread_routine () from /lib64/libstdc++.so.6
#8  0x00007f57a234861b in start_thread () from /lib64/libpthread.so.0
#9  0x00007f57a178298f in clone () from /lib64/libc.so.6


This is the line in Newton that triggers the error:
Code:
(gdb) frame 6
#6  0x00007f57a35f8c61 in dgThread::dgThreadSystemCallback (threadData=0x1b03e78)
    at /home/hhyyrylainen/Projects/leviathan/ThirdParty/newton-dynamics/sdk/dgCore/dgThread.cpp:166
166      me->Execute(me->m_id);
(gdb) list
161      dgSetPrecisionDouble precision;
162   
163      dgThread* const me = (dgThread*) threadData;
164   
165      dgInterlockedExchange(&me->m_threadRunning, 1);
166      me->Execute(me->m_id);
167      dgInterlockedExchange(&me->m_threadRunning, 0);
168      dgThreadYield();
169      return 0;
170   }


This is in the function "void* dgThread::dgThreadSystemCallback(void* threadData)".

Here's me printing some of the variables in that function
Code:
(gdb) p me
$1 = (dgThread * const) 0x1b03e78
(gdb) p *me
$2 = (dgThread) {
  _vptr.dgThread = 0x7f57a391e5c0 <vtable for dgThread+16>,
  m_handle = {
    _M_id = {
      _M_thread = 140014192289536
    }
  },
  m_id = 1,
  m_terminate = 0,
  m_threadRunning = 1,
  m_name =     "newtonAsyncThread", '\000' <repeats 14 times>, "n"
}
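
A note on the `_vptr.dgThread = ... <vtable for dgThread+16>` value above: in C++, while a base-class destructor runs, the object's dynamic type is the base class, so virtual calls no longer reach the derived override, and a pure virtual slot ends up in `__cxa_pure_virtual`. Seeing the plain dgThread vtable here would be consistent with the object already being torn down when the callback ran. The sketch below illustrates only that language rule. `Worker` and `AsyncWorker` are hypothetical stand-ins (not Newton code), and the virtual is deliberately non-pure so the example stays well-defined.

```cpp
#include <string>

// Hypothetical base class standing in for dgThread.
struct Worker {
    explicit Worker(std::string* log) : m_log(log) {}
    virtual ~Worker() {
        // By the time this body runs, the derived part has already been
        // destroyed, so the call resolves to Worker::Name, not the override.
        *m_log = Name();
    }
    virtual std::string Name() const { return "Worker"; }
    std::string* m_log;
};

// Hypothetical derived class standing in for dgAsyncThread.
struct AsyncWorker : Worker {
    using Worker::Worker;
    std::string Name() const override { return "AsyncWorker"; }
};

// Name reported through a base reference while the object is fully alive.
std::string NameSeenWhileAlive() {
    std::string unused;
    AsyncWorker w(&unused);
    const Worker& base = w;
    return base.Name();
}

// Name reported from inside the base destructor.
std::string NameSeenDuringBaseDestructor() {
    std::string seen;
    {
        AsyncWorker w(&seen);
    }  // ~AsyncWorker runs first, then ~Worker records Name()
    return seen;
}
```

If `Name` were pure virtual in `Worker`, a call arriving at that point from another thread would have no override left to dispatch to, which matches the abort in the backtrace.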


I couldn't think of anything else to check so please ask if there's something I specifically should try to get out of the core file to help debug.

Re: Crash by pure virtual method called when threads are ena

Postby Julio Jerez » Thu Mar 15, 2018 5:53 pm

The engine uses multiple inheritance:

Code:
dgWorld::dgWorld(dgMemoryAllocator* const allocator)
   :dgBodyMasterList(allocator)
...
   ,dgMutexThread("newtonSyncThread", DG_MUTEX_THREAD_ID)
   ,dgAsyncThread("newtonAsyncThread", DG_ASYNC_THREAD_ID)



It has two threads, which are initialized by this function:
void dgThread::Init ()
{
   m_handle = std::thread(dgThreadSystemCallback, this);
   dgThreadYield();
}


Here "this" is an offset into the dgWorld. It would seem that on Linux multiple inheritance does not work as it is supposed to, but that would be hard to believe.
Set a breakpoint in the Init function and check that "this" is in fact an offset into the dgWorld; if so, it should be the same in the thread callback.
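
To make the check above concrete, here is a minimal sketch of the pattern being described: a class with several bases passes the address of one base subobject through a `void*` and casts it back in the callback. All names (`BodyList`, `Thread`, `World`, `CallThroughHandle`, `ThreadOffset`) are hypothetical and only loosely mirror dgWorld and dgThread. The size of the offset depends on the compiler and ABI, so the sketch reports it rather than assuming a value.

```cpp
#include <cstddef>

// Hypothetical first base, standing in for dgBodyMasterList.
struct BodyList {
    virtual ~BodyList() = default;
    int m_bodies = 0;
};

// Hypothetical second base, standing in for dgThread.
struct Thread {
    virtual ~Thread() = default;
    virtual int Execute() = 0;
    // What Init would hand to the thread entry point: the address of
    // the Thread subobject, not of the whole World.
    void* Handle() { return this; }
};

struct World : BodyList, Thread {
    int Execute() override { return 42; }
};

// Mirrors the callback: void* back to Thread*, then a virtual call.
int CallThroughHandle(void* threadData) {
    Thread* const me = static_cast<Thread*>(threadData);
    return me->Execute();
}

// Byte distance from the start of World to its Thread subobject.
std::ptrdiff_t ThreadOffset(World& w) {
    return reinterpret_cast<char*>(static_cast<Thread*>(&w)) -
           reinterpret_cast<char*>(&w);
}
```

As long as the pointer is cast back to the same base type it was taken from, the virtual call reaches the `World` override regardless of the offset, which is why equal addresses in Init and in the callback would rule out an inheritance problem.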

Re: Crash by pure virtual method called when threads are ena

Postby Julio Jerez » Thu Mar 15, 2018 5:56 pm

Are you running the Newton demos, or your own app?

Re: Crash by pure virtual method called when threads are ena

Postby Boost113 » Thu Mar 15, 2018 6:07 pm

I'm running my own app.

Stepping through the initialization I get
Code:
(gdb) p this
$9 = (dgMutexThread * const) 0x9f47f8
and in dgThreadSystemCallback:
(gdb) p threadData
$10 = (void *) 0x9f47f8
(gdb) p me
$11 = (dgMutexThread * const) 0x9f47f8
(gdb) p *me
$12 = (dgMutexThread) {
  <dgThread> = {
    _vptr.dgThread = 0x7ffff77cf158 <vtable for dgMutexThread+16>,
    m_handle = {
      _M_id = {
        _M_thread = 140736976324352
      }
    },
    m_id = 0,
    m_terminate = 0,
    m_threadRunning = 0,
    m_name =       "newtonSyncThread", '\000' <repeats 15 times>
  },
  members of dgMutexThread:
  m_isBusy = 0,
  m_myMutex = {
    m_sem = {
      _M_cond = {
        __data = {
          {
            __wseq = 0,
            __wseq32 = {
              __low = 0,
              __high = 0
            }
          },
          {
            __g1_start = 0,
            __g1_start32 = {
              __low = 0,
              __high = 0
            }
          },
          __g_refs =             {0,
            0},
          __g_size =             {0,
            0},
          __g1_orig_size = 0,
          __wrefs = 0,
          __g_signals =             {0,
            0}
        },
        __size =           '\000' <repeats 47 times>,
        __align = 0
      }
    },
    m_mutex = {
      <std::__mutex_base> = {
        _M_mutex = {
          __data = {
            __lock = 0,
            __count = 0,
            __owner = 0,
            __nusers = 0,
            __kind = 0,
            __spins = 0,
            __elision = 0,
            __list = {
              __prev = 0x0,
              __next = 0x0
            }
          },
          __size =             '\000' <repeats 39 times>,
          __align = 0
        }
      }, <No data fields>},
    m_count = 0
  },
  m_callerMutex = {
    m_sem = {
      _M_cond = {
        __data = {
          {
            __wseq = 0,
            __wseq32 = {
              __low = 0,
              __high = 0
            }
          },
          {
            __g1_start = 0,
            __g1_start32 = {
              __low = 0,
              __high = 0
            }
          },
          __g_refs =             {0,
            0},
          __g_size =             {0,
            0},
          __g1_orig_size = 0,
          __wrefs = 0,
          __g_signals =             {0,
            0}
        },
        __size =           '\000' <repeats 47 times>,
        __align = 0
      }
    },
    m_mutex = {
      <std::__mutex_base> = {
        _M_mutex = {
          __data = {
            __lock = 0,
            __count = 0,
            __owner = 0,
            __nusers = 0,
            __kind = 0,
            __spins = 0,
            __elision = 0,
            __list = {
              __prev = 0x0,
              __next = 0x0
            }
          },
          __size =             '\000' <repeats 39 times>,
          __align = 0
        }
      }, <No data fields>},
    m_count = 0
  }
}


This looks right to me. So I think the object is already being destructed while the callback either hasn't started yet or hasn't stopped. I'll try tweaking the "m_threadRunning" setting a bit.

Edit: Just running it, I got a crash where one thread is trying to call "dgInterlockedExchange(&me->m_threadRunning, 1);" while the main thread is running "dgThread::~dgThread". To me this strongly implies that the thread shutdown method doesn't work correctly. The m_threadRunning flag should be set before the thread is actually started.
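
As an illustration of the idea in the edit above, here is a minimal sketch (not Newton code, and not necessarily the change that ended up in the pull request): the running flag is set by the thread that creates the worker, before the worker exists, and the derived destructor joins the worker before any part of the object is torn down. `Worker`, `CountingWorker`, `Start`, `Close`, and `RunOnce` are hypothetical names.

```cpp
#include <atomic>
#include <thread>

class Worker {
public:
    virtual ~Worker() = default;

    void Start() {
        // Set the flag here, on the creating thread, so a destructor that
        // polls it can never observe "not running" before the thread starts.
        m_running.store(true);
        m_handle = std::thread(&Worker::Entry, this);
    }

    // Must be called from the most derived destructor, while the derived
    // override of Execute is still reachable through the vtable.
    void Close() {
        if (m_handle.joinable()) {
            m_handle.join();
        }
    }

    bool IsRunning() const { return m_running.load(); }

protected:
    virtual void Execute() = 0;

private:
    static void Entry(Worker* self) {
        self->Execute();
        self->m_running.store(false);
    }

    std::atomic<bool> m_running{false};
    std::thread m_handle;
};

class CountingWorker final : public Worker {
public:
    ~CountingWorker() override { Close(); }
    int Count() const { return m_count.load(); }

private:
    void Execute() override { m_count.fetch_add(1); }
    std::atomic<int> m_count{0};
};

// Starts one worker, waits for it, and returns how often Execute ran.
int RunOnce() {
    CountingWorker w;
    w.Start();
    w.Close();
    return w.Count();
}
```

Joining in the derived destructor rather than in the base one is the design choice that matters here: once the base destructor body is running, the override is already gone.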

Re: Crash by pure virtual method called when threads are ena

Postby Boost113 » Thu Mar 15, 2018 6:26 pm

I think I got this fixed (I ran valgrind multiple times and the tests standalone hundreds of times and got 0 crashes).

Here's a pull request: https://github.com/MADEAPPS/newton-dynamics/pull/92

Re: Crash by pure virtual method called when threads are ena

Postby Julio Jerez » Thu Mar 15, 2018 7:07 pm

OK, I merged and committed. Can you tell us what the problem was, so that we know what it could be for future reference?

Re: Crash by pure virtual method called when threads are ena

Postby Julio Jerez » Thu Mar 15, 2018 7:11 pm

Boost113 wrote:Edit: Just running I got a crash where one thread is trying to call "dgInterlockedExchange(&me->m_threadRunning, 1);" and The main thread is running "dgThread::~dgThread" To me this strongly implies that the thread shutdown method doesn't work correctly. The m_threadRunning should be incremented before the thread is actually started.

Oh, I do remember I had a similar problem there, which I resolved by adding waits like yield, but I believe your solution is the right one. Nice!!