crash in ndDynamicsUpdate::SortJoints


Postby Bird » Sun Dec 27, 2020 12:16 pm

Hi Julio,

I'm getting some instability in the latest GitHub version.

I'm trying to make stacks of geometry instances and getting almost immediate crashes. I'm using Convex Hull collision shapes. The previous version of Newton4 that I was using was working fine.

Here's a video of what I'm trying to do
https://youtu.be/RYBMb5bciOU

And here's a release mode stack dump

******* STACKDUMP *******
stack dump [0] newton4\source\dNewton\ndDynamicsUpdate.cpp L: 271 ndDynamicsUpdate::SortJoints
stack dump [1] newton4\source\dNewton\ndDynamicsUpdate.cpp L: 70 ndDynamicsUpdate::Update
stack dump [2] newton4\source\dNewton\ndWorld.cpp L: 719 ndWorld::SubStepUpdate
stack dump [3] newton4\source\dNewton\ndWorld.cpp L: 647 ndWorld::ThreadFunction
stack dump [4] newton4\source\dCore\dThread.cpp L: 110 dThread::ThreadFunctionCallback
stack dump [5] C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\include\thread L: 44 std::thread::_Invoke<std::tuple<void (__cdecl dThread::*)(void),dThread *>,0,1>
stack dump [6] minkernel\crts\ucrt\src\appcrt\startup\thread.cpp L: 97 thread_start<unsigned int (__cdecl*)(void *),1>
stack dump [7] BaseThreadInitThunk
stack dump [8] RtlUserThreadStart
Bird
 
Posts: 623
Joined: Tue Nov 22, 2011 1:27 am

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Sun Dec 27, 2020 7:18 pm

It is hard to match the stack trace to the current code since I have already made other changes.
These changes are for unifying the special solvers, like avx2.

Btw, these solvers are now, for the first time, synchronization free: no atomics or mutexes other than the one needed to dispatch the jobs to the thread pool. For the first time I see a very significant difference in performance between sse and avx2.
It turns out Intel's claims are true, but it is really, really hard to get that extra performance from avx2.
You really need long vectors of contiguous operations to make up for the overhead of transposing the data to make use of the 8-way vector registers. Applying the same transformation to SSE (only four-way) results in a net loss for short vector arrays or a marginal gain, not worth the effort for the extra complexity of the code.
For avx2 the gains are substantial when you get long enough vectors while the speed is still practical.
I wonder if it translates to avx512.
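
To illustrate the transposition cost mentioned above, here is a minimal sketch of repacking an array of structures into 8-wide structure-of-arrays blocks. It is plain scalar C++ with no intrinsics, and the names Vec3, Vec3x8 and TransposeToLanes are hypothetical, not taken from Newton. It only shows the data layout that an 8-way register wants, not how the solver actually does it.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumption: 8 lanes, matching eight 32-bit floats in a 256-bit register.
constexpr std::size_t kLanes = 8;

// Array-of-structures element, as data usually arrives.
struct Vec3 { float x, y, z; };

// Structure-of-arrays block: lane i of x, y and z belongs to the same element.
struct Vec3x8 {
    std::array<float, kLanes> x{}, y{}, z{};
};

// Repack the input into blocks of 8 lanes; unused lanes in the last block stay 0.
std::vector<Vec3x8> TransposeToLanes(const std::vector<Vec3>& input)
{
    std::vector<Vec3x8> blocks((input.size() + kLanes - 1) / kLanes);
    for (std::size_t i = 0; i < input.size(); ++i) {
        Vec3x8& block = blocks[i / kLanes];
        const std::size_t lane = i % kLanes;
        block.x[lane] = input[i].x;
        block.y[lane] = input[i].y;
        block.z[lane] = input[i].z;
    }
    return blocks;
}
```

With short inputs most of the work is this repacking plus padding, which is one way to see why the gain only shows up for long vectors.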

Believe it or not, I have the avx2 solver about three times faster than the normal one, maybe even more, but since it is only the avx2 joint solver, we see an overall speed gain of 1.5 to 2.0 using the avx2 solver and, best of all, it yields the exact same results.
These gains are possible because it pays the price of transposing the input data, and then there is a whole new set of operations that are very efficient in avx2 and are not available in sse. Stuff like fused multiply-add, gather, scatter and blend operations that save a lot in tight loops.

In fact we could say the avx2 solver can be more accurate, because solvers are all about matrix times vector operations, and when using fused multiply-add operations the result is more accurate than a mul followed by an add; somehow the intermediate result preserves one extra bit of accuracy in the mantissa.
Not that it is noticeable, but so far the solver in 4.0 seems to be the fastest and most accurate yet for newton.
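
The accuracy point can be checked with the standard std::fma, which rounds once, against a multiply followed by an add, which rounds twice. This is only a small sketch in double precision with hypothetical helper names; it is not the solver code. The volatile store is there so the compiler cannot quietly contract the two operations into a fused one.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded, then the sum is rounded.
double MulThenAdd(double a, double b, double c)
{
    volatile double product = a * b; // volatile keeps the compiler from fusing
    return product + c;
}

// a*b + c with a single rounding at the end (std::fma from <cmath>).
double FusedMulAdd(double a, double b, double c)
{
    return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-30 and c = -(1 + 2^-29), the exact value of a*a + c is 2^-60. The fused version returns that value, while the two-step version loses it when the product is rounded and returns 0.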

Anyway, on the bug: it seems to be reproducible, and it looks like it is writing out of bounds of an array.
Could you sync and try to reproduce it again? If possible, try in debug; maybe it happens there and we can get more info.
Julio Jerez
Moderator
 
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Sun Dec 27, 2020 8:30 pm

Good news about better performance from avx2 now!

I synced again but I'm still crashing. I experimented some more and I can reproduce the crash even when running Newton without any bodies in the scene in Debug mode. The crash happens when calling UpdateSkeletons() in SubStepUpdate.

Code: Select all
void ndWorld::SubStepUpdate(dFloat32 timestep)
{
   D_TRACKTIME();

   // do the pre-physics step
   m_scene->m_lru = m_scene->m_lru + 1;
   m_scene->SetTimestep(timestep);

   UpdateSkeletons(); // crash
}



I am initializing Newton like this
Code: Select all
void NewtonState::initialze()
{
    int m_solverPasses = 4;
    int m_solverSubSteps = 2;
    int m_broadPhaseType = 0;
    int m_workerThreads = 1;
    int m_solverMode = 1; // avx2 ??

    world = NewtonWorld::create();
    world->SetSubSteps (m_solverSubSteps);
    world->SetSolverIterations (m_solverPasses);
    world->SetThreadCount (m_workerThreads);
    world->SelectSolver (m_solverMode);
}


I am compiling Newton as a static library in vs 2019 with Enhanced Instruction Set = Advanced Vector Extensions 2 (/arch:AVX2)

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Sun Dec 27, 2020 10:20 pm

Okay, it looks like Newton update() doesn't handle the case when there are no bodies in the scene.

An easy test is to comment out the code in the ndBasicVehicle() demo so no bodies are added; it will then crash on line 793 of ndWorld.cpp

Code: Select all
ndIslandMember* const islands = (ndIslandMember*)&solverUpdate.m_leftHandSide[0];



And it also looks like there's a problem in ndDynamicsUpdateAvx2::SortJoints() when there are no joints

You can hit this crash by switching the solver to avx2 and commenting out all of the code in ndBasicVehicle() except BuildFloorBox(scene). Then Newton will crash on line 434 of ndDynamicsUpdateAvx2.cpp

Code: Select all
ndConstraint** const jointPtr = &jointArray[0];
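
For what it's worth, this is the general pattern I mean. It is a minimal sketch using std::vector as a stand-in, since I don't know the internals of Newton's own array class, and FirstOrNull and SumRows are hypothetical names. The idea is that the address of element 0 is only taken when the array is not empty.

```cpp
#include <cstddef>
#include <vector>

// Returns a pointer to the first element, or nullptr when the array is empty.
// Indexing element 0 of an empty std::vector is undefined behaviour.
template <class T>
T* FirstOrNull(std::vector<T>& items)
{
    return items.empty() ? nullptr : &items[0];
}

// Example consumer: the loop body never runs for an empty array,
// so the null pointer is never dereferenced.
int SumRows(std::vector<int>& rows)
{
    int* const ptr = FirstOrNull(rows);
    int total = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        total += ptr[i];
    }
    return total;
}
```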

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Sun Dec 27, 2020 10:47 pm

Stand by, I think I found the bug, or at least one bug.
When building with avx there is a 6x6 matrix class used by the joint solver that still has some bugs.
I set the same options you are using and I get the assert.

I am fixing it now.
Let us get that fix in and try again.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Sun Dec 27, 2020 11:37 pm

OK, I think I fixed them all now.
Please try again and tell me if there are still more of those zero array bugs.

I could easily make the array crash proof by making it resize automatically like stl,
but that's the price we pay for high performance code.
At that point I would be better off using the normal stl library, but then I get the memory reallocation and the double check each time I dereference an element from the container.
Having the array resize translates to a big difference in performance when iterating in a loop, since cpp does not know when the pointer to the head of the array changes, so it can't cache it in a register. Big, big difference.

This is the only reason left that I still use my template library over STL: it is many times faster and more economical in memory usage. I compared them recently to see if it was time to switch to stl, but STL still does not come close to the newton versions in performance and allocator flexibility.

Anyway, please try again.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Mon Dec 28, 2020 10:51 am

Zero array bugs are fixed but I'm still crashing in the simplest scene

I tried adding just 1 dynamic object and when the sim runs Newton crashes when exiting the ndWorld::UpdateSkeletons() function.

Here's a screen grab of VS 2019 at the crash. The top left where the exception is thrown is the end of UpdateSkeletons(). I'm using the Default solver instead of AVX2 solver
Attachment: NewtonCrash.jpg (246 KiB)

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Mon Dec 28, 2020 11:36 am

You said you added one dynamic object and crashed in void ndWorld::UpdateSkeletons().
That can't be possible; this is the beginning of the function:
Code: Select all
void ndWorld::UpdateSkeletons()
{
   D_TRACKTIME();

   if (m_skeletonList.m_skelListIsDirty)
   {
...
...


It only enters when a joint configuration was modified,
so you must have added a body with at least one joint, since the alloca call is inside the if block.
Can you recreate the crash in the sandbox demo?

Set a breakpoint at line 748 and see if it enters there after you add the body.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Mon Dec 28, 2020 11:48 am

As a precaution I added some extra space to the alloca: 256 extra entries instead of just 1.
I tried a few times with vs2019 and did not get any crash.
Please sync again.

In fact that breakpoint should be hit only when the world is created and each time joints are changed, added or removed. In the case of the sandbox demos, the world is created a few times, so you get a few hits before you get to the scene, but after that only joint changes can make it execute that routine.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Mon Dec 28, 2020 2:29 pm

I can't get the Newton SDK demos to crash. I modified the ndBasicTrigger.cpp demo to

Code: Select all
void ndBasicTrigger (ndDemoEntityManager* const scene)
{
   AddSphere(scene, dVector(0.0f, 0.0f, 0.0f, 1.0f));
   
   dQuaternion rot;
   dVector origin(-40.0f, 5.0f, 0.0f, 0.0f);
   scene->SetCameraMatrix(rot, origin);
}


And that reaches line 748 because it looks like m_skeletonList.m_skelListIsDirty is true by default. But, unlike in my project, the demo exits the UpdateSkeletons() function cleanly and the simulation works as expected.


My project still crashes when exiting the UpdateSkeletons() function. Here's my body creation function

Code: Select all
void NewtonBodyHandler::addBody (RenderableNode& node)
{
    ndShape* const shape = ops->createCollisionShape (node);

    ndShapeInstance shapeInst (shape);

    Scale s = node->getSpaceTime().scale;
    dVector scale = dVector (s.x(), s.y(), s.z(), 0.0f);
    shapeInst.SetScale (scale);

    ndBodyDynamic* const body = new ndBodyDynamic();
    body->SetCollisionShape (shapeInst);
    body->SetMassMatrix (node->description().mass, shapeInst);
    node->setUserData (body);
    body->SetGyroMode (false);

    // Newton takes ownership of the NewtonCallback object
    body->SetNotifyCallback (new NewtonCallbacks (node));

    dMatrix startPose;
    eigenToNewton (node->getSpaceTime().worldTransform, startPose);
    body->SetMatrix (startPose);

    state->world->AddBody (body);
}

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Mon Dec 28, 2020 3:04 pm

I pasted your test into the demo ../ndSandbox\demos\ndBasicStacks.cpp
but it does not crash.

I did this: the skeleton list manager does not need to be initialized to true on creation since there aren't any joints at that point. I set the Boolean to false, so it should never hit that line.
This will let it pass that point, or maybe generate different info about the crash.

Please try again and see what you get.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Mon Dec 28, 2020 3:51 pm

Definitely more stable. The simplest scene with 1 dynamic body does not crash anymore.

But if I run the original simulation with lots of geometric instances in a pile, it will eventually crash in Debug mode by hitting the assert on line 250 of ndDynamicsUpdate::SortJoints() in ndDynamicsUpdate.cpp. If I run in Release mode it crashes in the same function.

One thing I found is that I don't get a crash if I disable my ndBodyNotify::OnApplyExternalForce ()

Code: Select all
void NewtonCallbacks::OnApplyExternalForce (dInt32 threadIndex, dFloat32 timestep)
{
   // return here and no crashing
   // return

    ndBodyDynamic* const dynamicBody = GetBody()->GetAsBodyDynamic();
    if (dynamicBody)
    {
        dVector massMatrix (dynamicBody->GetMassMatrix());
        dVector force (dVector (0.0f, -10.0f, 0.0f, 0.0f).Scale (massMatrix.m_w));
        dynamicBody->SetForce (force);
    }
}

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Mon Dec 28, 2020 4:22 pm

Bird wrote:it will eventually crash in Debug mode by hitting the assert on line 250 of ndDynamicsUpdate::SortJoints() in ndDynamicsUpdate.cpp. If I run in Release mode it crashes in the same function.


That is really bad and will definitely cause a crash, since it will write outside the bounds of an array.
But that should not happen.
It is this code:
Code: Select all
      const dInt32 rowKey = (1 << D_RADIX_BITS) - joint->m_rowCount;
      const dInt32 restingKey = resting << (D_RADIX_BITS + 1);
      const dInt32 key = restingKey + rowKey;
      dAssert(key >= 0);
      dAssert(key < sizeof(jointCountSpans) / sizeof(jointCountSpans[0]));


When you run it again and it stops at the assert, see if you can inspect these variables:

joint->m_rowCount,
resting
restingKey
key

joint->m_rowCount should be between 0 and 32.
Oh, I think I now see what could be wrong: you are using convex hulls, and maybe they are generating a large number of contacts, creating an out of bounds key.

I will change the key to be a struct and a union so that it can be more robust.
But please do that test anyway; something else may be wrong.

Re: crash in ndDynamicsUpdate::SortJoints

Postby Bird » Mon Dec 28, 2020 4:44 pm

Okay, here are the variables when the assert is triggered

joint->m_rowCount = 33
resting = 0
restingKey = 0
key = -1

I tried using Spheres instead of ConvexHulls and could not get a crash

Re: crash in ndDynamicsUpdate::SortJoints

Postby Julio Jerez » Mon Dec 28, 2020 5:02 pm

Bingo, that's the bug.
I was stupid: when writing the code that calculates the sort key for the counting sort, I made a two-digit key. The upper digit is one bit and the lower digit is 6 bits.

Six bits can hold up to 64 values, which is enough for the max number of rows. But then in the code I used 5 bits to calculate the value, as (1<<5) - joint->row.
This can generate a negative key value if the number of rows is larger than 32.

The right calculation is (1<<6) - joint->row - 1, so that's 63 - row,
and that never wraps around.
I fixed it, so we should not get that bug anymore.

Please sync and try again; let us see if there are more bugs.
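
To make the arithmetic concrete, here is a small self-contained sketch of the two key calculations and a counting sort that uses the corrected one. The names (kRowBits, Joint, BuggyKey, FixedKey, SortJointsByKey) are hypothetical and this is not the code in ndDynamicsUpdate.cpp; it only assumes the 1-bit plus 6-bit key layout described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical constants for the layout described above.
constexpr int kRowBits = 6;                     // lower digit: row count
constexpr int kKeyCount = 1 << (kRowBits + 1);  // 128 buckets in total

struct Joint { int rowCount; int resting; };    // resting is 0 or 1

// Key as first written: the row digit was computed with 5 bits, not 6,
// so any rowCount above 32 gives a negative row digit.
int BuggyKey(const Joint& j)
{
    return (j.resting << kRowBits) + ((1 << (kRowBits - 1)) - j.rowCount);
}

// Corrected key: 63 - rowCount stays in [0, 63] for rowCount in [0, 63].
int FixedKey(const Joint& j)
{
    return (j.resting << kRowBits) + ((1 << kRowBits) - j.rowCount - 1);
}

// Stable counting sort by FixedKey. In this sketch, ascending key means
// non-resting joints come first and, within a group, larger row counts first.
std::vector<Joint> SortJointsByKey(const std::vector<Joint>& joints)
{
    int spans[kKeyCount + 1] = {};
    for (const Joint& j : joints) {
        const int key = FixedKey(j);
        assert(key >= 0 && key < kKeyCount);
        ++spans[key + 1];                       // histogram, shifted by one
    }
    for (int i = 0; i < kKeyCount; ++i) {
        spans[i + 1] += spans[i];               // prefix sum: bucket start offsets
    }
    std::vector<Joint> sorted(joints.size());
    for (const Joint& j : joints) {
        sorted[spans[FixedKey(j)]++] = j;       // scatter into its bucket
    }
    return sorted;
}
```

With the values reported above (33 rows, resting = 0), BuggyKey gives 32 - 33 = -1, which is the out of bounds index, while FixedKey gives 63 - 33 = 30.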
