Segfault when rapidly creating/destroying worlds

Report any bugs here and we'll post fixes

Moderators: Sascha Willems, Thomas

Postby oliver » Mon Feb 20, 2017 5:59 am

Hi,

Newton often (not always) segfaults when I create and destroy worlds in rapid succession. Here is an extreme example that always triggers the bug for me:

#include "Newton.h"

int main (void) {
  for (int ii=0; ii < 10000; ii++) NewtonDestroy(NewtonCreate());
  return 0;
}


The problem usually disappears if I sleep for >20ms between creation and deletion.

My use case for such short-lived worlds is a series of unit tests that rapidly create and destroy worlds. The 20 ms workaround is not a big deal for me, but I thought I would mention it.
oliver
 
Posts: 36
Joined: Sat Sep 17, 2016 10:31 am

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Mon Feb 20, 2017 8:02 am

I just pasted that code into the demo sandbox file ..\sdkDemos\DemoEntityManager.cpp, at line 134:
/*
for (int ii = 0; ii < 10000; ii++) {
   NewtonDestroy(NewtonCreate());
}
*/


Other than taking a long time to run, the loop does not crash.
This is on Windows; I suppose you are testing on Linux. My guess is that this is a problem with thread creation and timing on Linux. Threads on Linux are far more heavyweight than they are on Windows, but I am using the standard thread interface of C++11.

In any case, make a world live for more than 20 ms, because Newton creates two threads: one to run synchronously and one to run asynchronously.

Do you really need that many worlds?
Why don't you make a pool of worlds and simply keep them around, clearing and reusing them?
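
As an illustration of the pooling idea, here is a minimal sketch. It does not use the Newton API: FakeWorld is a hypothetical stand-in for whatever NewtonCreate() returns, and Clear() stands in for whatever reset a real world would need. The names WorldPool, Acquire and Release are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a physics world. With Newton, this role would be
// played by the handle returned by NewtonCreate().
struct FakeWorld {
    int bodyCount = 0;
    void Clear() { bodyCount = 0; }  // reset the world to a pristine state
};

// Minimal pool: worlds are created lazily and recycled instead of destroyed.
class WorldPool {
public:
    // Hand out an idle world if one exists, otherwise create a new one.
    std::unique_ptr<FakeWorld> Acquire() {
        if (m_idle.empty()) {
            ++m_created;
            return std::unique_ptr<FakeWorld>(new FakeWorld());
        }
        std::unique_ptr<FakeWorld> world = std::move(m_idle.back());
        m_idle.pop_back();
        return world;
    }

    // Clear the world and keep it around for the next caller.
    void Release(std::unique_ptr<FakeWorld> world) {
        assert(world != nullptr);
        world->Clear();
        m_idle.push_back(std::move(world));
    }

    // Number of worlds that were actually constructed.
    std::size_t CreatedCount() const { return m_created; }

private:
    std::vector<std::unique_ptr<FakeWorld>> m_idle;
    std::size_t m_created = 0;
};
```

A test fixture could call Acquire() in its setup and Release() in its teardown, so that every test still starts from a cleared world while the expensive creation and destruction happen only once.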
Julio Jerez
Moderator
 
Posts: 11048
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Mon Feb 20, 2017 8:12 am

Yes, I am on Linux (Ubuntu 16.04).

I do not need that many worlds, but I have a few dozen tests and it is cleaner if they all start from scratch. Usually they run without a hiccup, but sometimes they segfault on teardown.

Pity the bug does not materialise on your machine. Anyway, thank you for looking into it.

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Mon Feb 20, 2017 8:29 am

Maybe it crashes for a reason other than a thread issue.
Try compiling with the option DG_USE_THREAD_EMULATION.

This will emulate the threads, and when it crashes maybe you can get the line where it did.
If it does not crash, then we know it is the thread implementation, which may need a wait function after a thread object is ready.
I suspect that is the problem, because on initialization Newton does not wait for any of the thread objects to be created, since they are not expected to be used right away.
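
The "wait until the thread object is ready" idea can be sketched with a condition variable. This is not Newton's dgThread; ReadyWorker and its members are hypothetical names used only to show the handshake, under the assumption that the goal is for Start() to return only once the new thread is actually running.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker (not Newton's dgThread). Start() returns only after the
// new thread has begun executing, so a later Close() cannot race with startup.
class ReadyWorker {
public:
    ReadyWorker() : m_ready(false) {}
    ~ReadyWorker() { Close(); }

    void Start() {
        assert(!m_handle.joinable());  // starting twice would be a bug
        m_handle = std::thread(&ReadyWorker::Run, this);
        // Block until Run() has signalled that it is executing.
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_ready; });
    }

    // Safe to call repeatedly: joins only if there is a thread to join.
    void Close() {
        if (m_handle.joinable()) {
            m_handle.join();
        }
    }

    bool IsReady() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_ready;
    }

private:
    void Run() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_ready = true;
        }
        m_cond.notify_one();
        // The real work of the thread would go here.
    }

    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_ready;
    std::thread m_handle;
};
```

Unlike a yield or a fixed sleep, the wait on the condition variable does not depend on scheduler timing: Start() cannot return before the thread function has run its first lines.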

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Mon Feb 20, 2017 6:45 pm

The crash disappears with DG_USE_THREAD_EMULATION. What are the practical disadvantages of this option?

Julio Jerez wrote:other than taking a long time to run, the loop does not crash.


It only takes ~1s to create/destroy 10k worlds. Is that what you mean by "long"?

I also linked the program against a debug build of libNewton and ran it under GDB. After creating/deleting ~2k worlds it crashed. This is the backtrace:

#0  0x00007ffff6f85428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6f8702a in __GI_abort () at abort.c:89
#2  0x00007ffff78c784d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff78c56b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff78c5701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00000000004379c7 in std::thread::~thread (this=0x79e058, __in_chrg=<optimised out>) at /usr/include/c++/5/thread:151
#6  0x0000000000437662 in dgThread::~dgThread (this=0x79e050, __in_chrg=<optimised out>) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/core/dgThread.cpp:130
#7  0x000000000052d4da in dgAsyncThread::~dgAsyncThread (this=0x79e050, __in_chrg=<optimised out>) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/core/dgAsyncThread.cpp:35
#8  0x000000000044a125 in dgWorld::~dgWorld (this=0x79de40, __in_chrg=<optimised out>) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/physics/dgWorld.cpp:315
#9  0x000000000050a551 in Newton::~Newton (this=0x79de40, __in_chrg=<optimised out>) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/newton/NewtonClass.cpp:44 
#10 0x000000000050a5e6 in Newton::~Newton (this=0x79de40, __in_chrg=<optimised out>) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/newton/NewtonClass.cpp:49
#11 0x0000000000403acb in NewtonDestroy (newtonWorld=0x79de40) at /home/oliver/projects/ds2/dependencies/newton-dynamics/coreLibrary_300/source/newton/Newton.cpp:205
#12 0x000000000040389b in main () at bug_dealloc.cpp:4


From what I can glean, the thread object does not exist when you trigger its dtor, but I am not familiar enough with Newton's threading model to debug this. I will defer to the master :)

Btw, I am on commit #b4590e3 (around 4 weeks old), in case that makes a difference.

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Mon Feb 20, 2017 7:11 pm

It is a timing issue. I added a thread yield after a thread is closed. That should fix it.
The disadvantage of running with DG_USE_THREAD_EMULATION is that the engine runs on the thread that created it, and there will be no multithreaded functionality.

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Mon Feb 20, 2017 7:38 pm

I have merged the dgThreadYield() calls from your last commit into my build, but the problem persists.

Just to be sure, I also called it 100x in a loop (instead of just twice) and added a usleep(10000) in between. It still breaks in the same way on the same line :(

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Mon Feb 20, 2017 8:38 pm

It looks like on Linux the function
inline void thread::join() does not set the thread handle to null, and then the destructor tries to destroy the thread twice.

#4  0x00007ffff78c5701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00000000004379c7 in std::thread::~thread (this=0x79e058, __in_chrg=<optimised out>) at /usr/include/c++/5/thread:151


I will see if I can change it so that the destruction happens in the destructor, but I do not know how that will work. It is not so easy to change, because the threads are created in one order and must be destroyed in the same order, but destructors are called in the reverse order in which the objects were created.

There has to be a way to set the thread handle to NULL, but I do not see how in C++11.
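
For reference, the C++11 standard specifies that std::thread's destructor calls std::terminate() whenever joinable() is still true, and that joinable() is false after a successful join() and for a default-constructed thread. The sketch below shows how that state can be checked and guarded. The helper names WouldTerminateOnDestroy and SafeClose are hypothetical and not part of Newton; whether the crash in this thread comes from a handle that was never joined is an assumption the sketch can help test, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <thread>

// Returns true if destroying `t` right now would call std::terminate().
// Per the C++11 standard, ~thread() terminates exactly when joinable() is true.
inline bool WouldTerminateOnDestroy(const std::thread& t) {
    return t.joinable();
}

// Join if needed, so that the destructor becomes safe to run.
// Calling this twice is harmless because the second call sees joinable() == false.
inline void SafeClose(std::thread& t) {
    if (t.joinable()) {
        t.join();
    }
    assert(!t.joinable());  // guaranteed after a successful join()
}
```

Adding a similar joinable() assertion just before the handle is destroyed would show whether join() was ever reached for the thread object that aborts in frame #5 of the backtrace.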

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Mon Feb 20, 2017 8:40 pm

Can you please change the function void dgThread::Close () to this?
It seems that an empty constructor does set the thread handle to NULL.

void dgThread::Close ()
{
   m_handle.join();
   m_handle = std::thread();
}
 

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Mon Feb 20, 2017 8:46 pm

Nope, still breaks.

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Tue Feb 21, 2017 1:21 am

I just tried with the latest code, but it still breaks.

By the way, the latest Newton does not compile on Linux. PR to fix it is on GH - please review and merge.

Furthermore, the sandbox compiles but does not link. It misses symbols related to a TIFF library, I think, no idea yet. This has been broken for a while now. I guess not many people use Newton on Linux :)

Re: Segfault when rapidly creating/destroying worlds

Postby Julio Jerez » Tue Feb 21, 2017 10:02 am

I believe it is a timing problem: your loop somehow calls destroy on the thread before it has had time to initialize. Maybe the problem is not in the destroy function but in the create. Try adding this:
void dgThread::Init (dgInt32 stacksize)
{
   m_handle = std::thread(dgThreadSystemCallback, this);
   dgThreadYield();
}


I cannot build on Linux at the moment, but on Linux all libraries are supposed to be native to the system.
Next week I will see if I can build on Linux on my laptop.

Re: Segfault when rapidly creating/destroying worlds

Postby d.l.i.w » Tue Feb 21, 2017 3:27 pm

oliver wrote:Furthermore, the sandbox compiles but does not link. It misses symbols related to a TIFF library, I think, no idea yet. This has been broken for a while now. I guess not many people use Newton on Linux :)


With your fix Newton does compile on Linux and at least for me the sandbox compiles, links and runs just fine.
Are you using CMake? If so it really should work for you, if not - the makefiles are very outdated indeed.

I use Newton on Linux only and try to provide patches for Linux and CMake, if things break and if my time allows it.
d.l.i.w
 
Posts: 81
Joined: Mon Sep 26, 2011 4:35 am

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Tue Feb 21, 2017 6:35 pm

d.l.i.w wrote:Are you using CMake? If so it really should work for you, if not - the makefiles are very outdated indeed.

Yes, I use CMake. Upon closer inspection, it is not Newton that misses symbols, but, strangely enough, one of my system libraries (libwx_gtk2u_core-3.0.so). I will have another look later. I assume you are not using Ubuntu 16.04?

Can you run the tiny Newton program from my first post on this thread and tell us if it segfaults on your system, please?

@julio: the new Init method also did not fix it :( I also tried with USE_PTHREADS, but no change.

Re: Segfault when rapidly creating/destroying worlds

Postby oliver » Tue Feb 21, 2017 7:01 pm

I have figured out the missing symbols problem. It was entirely due to the virtualenv I use for development. The sandbox compiles fine outside of it. My apologies for the confusion.
