Multiple Floating Point Traps

by **Esharc** » Mon Sep 09, 2019 8:58 am

Good morning/Afternoon,

I recently updated our version of Newton to the latest release. The release that we were using was very old (last updated in 2017, I believe).

But now, for some reason I cannot work out, I am getting a "Multiple floating point traps" assertion in dgMatrix.cpp at line 705: "dgFloat64 den = dgFloat64(1.0f) / tmp[i][i];". I am not sure what I am doing wrong or what has changed here. The vehicle that I am testing with is one of our old vehicles that worked fine before updating.

The vehicle is a multi-body vehicle whose bodies are attached using hinge joints, amongst others. It seems to happen when two bodies that are not connected via a joint collide with each other. But this does not make sense to me, because I have custom collision callback functions to ignore collisions within the same vehicle.

Would you be able to shed some light on this for me, please?

by **Julio Jerez** » Mon Sep 09, 2019 3:43 pm

Somehow a singular matrix is being generated.
Can you show the call stack?

A lot of work was added to test for singular matrices in the last few months.
If you have an older version, it probably did not have that check.
I am trying to bring back the robustness of Newton 1.5 and abandon the iterative solution,
but this is not so easy. However, these changes are all for the better.
The engine now solves all bilateral joints exactly.
Most of the work is focused on correctness, so blowing up is a possibility, and the client side has to be more careful not to create unsolvable loops that yield singular matrices.
This is why there is a check for the condition number, if you look at line 665.
Somehow it passes that test, not sure why.

It is possible there is a bug, but without a repro it is hard to say.
It is possible that I can check with some repro test.
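For readers who want to see why a singular matrix ends in a floating point trap at a line such as `den = 1.0 / tmp[i][i]`, here is a minimal sketch of a guarded elimination. It is not Newton's code: `EliminateWithPivotCheck` and its `maxRatio` tolerance are hypothetical names, and the pivot-ratio test is only a crude stand-in for a real condition number check.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Newton: Gaussian elimination without
// pivoting that refuses to divide by a pivot that is zero or tiny relative
// to the largest pivot seen so far (a crude conditioning estimate).
bool EliminateWithPivotCheck(std::vector<std::vector<double>>& a, double maxRatio = 1.0e12)
{
    const std::size_t n = a.size();
    double maxPivot = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);   // the matrix must be square
        const double pivot = std::fabs(a[i][i]);
        maxPivot = std::fmax(maxPivot, pivot);
        // A zero or relatively tiny pivot means the matrix is singular
        // or too ill conditioned to factor safely.
        if (pivot == 0.0 || maxPivot > pivot * maxRatio) {
            return false;
        }
        const double den = 1.0 / a[i][i];   // safe: the pivot was checked above
        for (std::size_t j = i + 1; j < n; ++j) {
            const double scale = a[j][i] * den;
            for (std::size_t k = i; k < n; ++k) {
                a[j][k] -= scale * a[i][k];
            }
        }
    }
    return true;
}
```

With the matrix {{2, 1}, {1, 2}} the function returns true. With {{1, 1}, {1, 1}} the second pivot becomes exactly zero after the first elimination step, so it returns false instead of dividing by zero.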

by **Esharc** » Tue Sep 10, 2019 7:09 am

Thank you for the quick reply. While I was trying to get a call stack so that I could paste it here, you seem to have directed me to where the issue was lying, and I was able to resolve it.

The problem was with the custom hinge and slider joints that we make. We were still using the old SubmitConstraintsFreeDof function to handle any motor that we want to use to manipulate the joints. But we also handled the joint limit functionality there.

What I did not notice is that, in the latest release, you added code to handle the joint limits as well, so too many rows were being added, which ended up causing the crash. When I removed our custom joint limit code and used just yours, all my problems went away.
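To illustrate why a duplicated limit row can make the system singular, here is a small sketch. It assumes unit masses, so the system matrix reduces to J * J^T, and it uses hypothetical names (`Row`, `BuildSystem`, `Det2`) that do not come from Newton.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One constraint row (a Jacobian direction). Hypothetical type for illustration.
using Row = std::array<double, 3>;

// Build the symmetric system A = J * J^T that a direct solver would factor
// (unit masses assumed, so no inverse mass matrix appears).
std::vector<std::vector<double>> BuildSystem(const std::vector<Row>& rows)
{
    const std::size_t n = rows.size();
    std::vector<std::vector<double>> a(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < 3; ++k) {
                a[i][j] += rows[i][k] * rows[j][k];
            }
        }
    }
    return a;
}

// Determinant of a 2x2 system; zero means the two rows are linearly dependent.
double Det2(const std::vector<std::vector<double>>& a)
{
    assert(a.size() == 2 && a[0].size() == 2 && a[1].size() == 2);
    return a[0][0] * a[1][1] - a[0][1] * a[1][0];
}
```

Two independent rows such as {1, 0, 0} and {0, 1, 0} give a determinant of 1. Submitting the same row twice, {1, 0, 0} and {1, 0, 0}, gives a determinant of 0, which is the kind of matrix an exact solver cannot invert.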

by **Julio Jerez** » Tue Sep 10, 2019 11:32 am

Oh good, one question.
Are you synced to the current head revision on GitHub?
Try that if you haven't; there are lots of fixes and improvements that deal with solver stability, accuracy and collision quality.

by **Esharc** » Wed Sep 11, 2019 12:18 am

I will sync and let you know

by **Esharc** » Wed Sep 11, 2019 2:40 am

I have updated to the latest head revision and everything looks really good.

There have been a lot of improvements since we last updated in 2017. Everything is just so much more stable.

Thanks for all the hard work

by **Lax** » Wed Nov 13, 2019 3:53 pm

Hi,

I'm using the newest newton version from gitlab and getting "Multiple Floating Point Traps".
In my scenario, I'm using several joint connections: ballandsocket -> hinge -> hinge

The error occurs in dgBroadPhase. Stack trace:

Code: Select all:
> newton.dll!dgBroadPhaseNode::SetAABB(const dgVector & minBox, const dgVector & maxBox) Line 96 C++
newton.dll!dgBroadPhase::UpdateBody(dgBody * const body, int) Line 740 C++
newton.dll!dgBody::UpdateCollisionMatrix(float timestep, int) Line 264 C++
newton.dll!dgWorldDynamicUpdate::IntegrateVelocity(const dgBodyCluster * const cluster, float timestep, float) Line 606 C++
newton.dll!dgWorldDynamicUpdate::ResolveClusterForces(dgBodyCluster * const cluster, int threadID, float timestep) Line 912 C++
newton.dll!dgWorldDynamicUpdate::CalculateClusterReactionForcesKernel(void * const context, void * const worldContext, int threadID) Line 435 C++
[Inline Frame] newton.dll!dgThreadHive::dgWorkerThread::RunNextJobInQueue(int) Line 224 C++
[Inline Frame] newton.dll!dgThreadHive::dgWorkerThread::ConcurrentWork(int) Line 242 C++
newton.dll!dgThreadHive::dgWorkerThread::Execute(int threadId) Line 259 C++
newton.dll!dgThread::dgThreadSystemCallback(void * threadData) Line 202 C++
[External Code]
[Inline Frame] newton.dll!invoke_thread_procedure(unsigned int(__stdcall*)(void *)) Line 91 C++
newton.dll!thread_start<unsigned int (__stdcall*)(void *)>(void * const parameter) Line 115 C++
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for "kernel32.dll"] Unknown

Line 96:
m_minBox = p0.Floor() * m_broadInvPhaseScale;
// m_broadInvPhaseScale = 0.125

but p0 has really weird values:
p0 = {m_f= {8.95627014e+13, -8.55390080e+12, -1.57280402e+13, 0.000000000} m_i= {1453517282, -721882233, ...} ...}

It always crashes in the same place.

Best Regards
Lax

by **Julio Jerez** » Wed Nov 13, 2019 4:08 pm

I made a recent improvement to the joint solver that can cause that.
Do not panic, I asked a few people to check it out.
It passes my tests, but this may need some more running.
The change is under an #ifdef in case of this, but I am not in front of the code to tell you how.

Basically the new change is in two parts.
1. All kinematic loop rows are now solved exactly.
2. The solver uses an LDLt factorization instead of Cholesky.

This makes the solver faster and more robust in most cases, but it is much less tolerant of ill-formed systems.

If you want, I can use you for calibrating the new system; I was looking for a repro case that can reproduce a failed test. In general the SDK demos are a weak set, so it is not easy to recreate what users do.
But this is a very, very good improvement, well worth pursuing.

by **Julio Jerez** » Wed Nov 13, 2019 4:13 pm

My suspicion is that the LDLt factorization is the problem, because as opposed to Cholesky, which is what I usually use, it does not require regularization. But it can factor into matrices that are very ill formed.

There are a few ways to go about fixing it, but I need test cases, which are hard to get.

by **Julio Jerez** » Wed Nov 13, 2019 6:22 pm

OK, the first thing you can try, to see if the problem goes away, is this. In the file
...\newton-dynamics\sdk\dgPhysics\dgSkeletonContainer.cpp

find this line and uncomment the define.
Let us see if that solves it; if not, there is another check that can be set, but we can start from the top first.
// Cholesky can not be used when the matrix loses the PSD status because of round off of a stiff system
//#define USE_CHOLESKY

by **Julio Jerez** » Wed Nov 13, 2019 7:17 pm

I have tried LDLt and LU factorization of medium to large matrix systems many times, and it has always kicked me in my ass. I am not sure why I thought it would work.
So I set it back to use Cholesky.

All you need to do is sync and try again.

by **Lax** » Thu Nov 14, 2019 12:14 pm

Hi,

So you mean that you gave up on the new approach and I can download the newest version from gitlab?

Or do you still want to test my scenario?

by **Julio Jerez** » Thu Nov 14, 2019 12:37 pm

No, I did not give up; the optimizations are still there.
What I am no longer doing is solving a linear system using LDLt factorization, because this method always gives an answer whether the system is a valid PSD or not.
Cholesky is far, far more stable, but it will fail on illegal systems.
When it fails, it is possible to make a pseudo system that finds an approximation to the problem.

In general an ill-formed system is the result of some bad joint arrangement, but I can't keep policing people, telling them how to do their stuff, when I have a way around it.

Anyway, the SDK that is committed now should work.
And it has the optimizations.

Please try to sync again, it should work.
What will happen is that if you have a bad joint configuration it will get worse, but viable configurations will be stronger and faster.

My guess is that you have a bad joint configuration that was masked by poor convergence of the solver.
That's what needs to be investigated.

The optimization is most beneficial when connecting joints that form kinematic loops, as I believe you are doing, by what you said.
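The difference between the two factorizations can be seen on a tiny indefinite matrix. The sketch below uses textbook Cholesky and textbook LDLt without pivoting; it is not Newton's implementation, and the names `Matrix`, `Cholesky` and `LDLt` are chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Textbook Cholesky, A = L * L^T. Returns false as soon as a pivot is not
// positive, which is how it rejects matrices that are not positive definite.
bool Cholesky(const Matrix& a, Matrix& l)
{
    const std::size_t n = a.size();
    l.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);   // the matrix must be square
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
            if (i == j) {
                if (sum <= 0.0) return false;
                l[i][i] = std::sqrt(sum);
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    return true;
}

// Textbook LDL^T without pivoting, A = L * D * L^T with a unit diagonal in L.
// It only fails on an exactly zero pivot, so negative pivots pass silently.
bool LDLt(const Matrix& a, Matrix& l, std::vector<double>& d)
{
    const std::size_t n = a.size();
    l.assign(n, std::vector<double>(n, 0.0));
    d.assign(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double dj = a[j][j];
        for (std::size_t k = 0; k < j; ++k) dj -= l[j][k] * l[j][k] * d[k];
        if (dj == 0.0) return false;
        d[j] = dj;
        l[j][j] = 1.0;
        for (std::size_t i = j + 1; i < n; ++i) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k] * d[k];
            l[i][j] = sum / dj;
        }
    }
    return true;
}
```

For the symmetric but indefinite matrix {{1, 2}, {2, 1}}, `Cholesky` returns false at the second pivot, while `LDLt` succeeds and produces a diagonal containing a negative entry. A solver that keeps going with such a factorization gets an answer, just not one that corresponds to a valid PSD system.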

by **Lax** » Wed Nov 20, 2019 4:40 pm

Hi Julio,

With the newest Newton version, my ragdoll now works better, at least in release mode. In debug mode dgAssert fires because of NaN values, so something is really strange. I can switch to ragdoll state as long as the player is on the ground. But if the player is set to ragdoll state while also falling down, I get NaN values from Newton in the end. See the attached video:

http://www.lukas-kalinowski.com/Homepag ... lIssue.mp4

What could be causing this?
Would it help if I created a debug-mode package of executables for you?

What also does not work is setting:

Code: Select all: NewtonJointSetCollisionState

to true, so that the joints will collide with each other in the ragdoll. As soon as I set the collision state to true and the joints are created, the player vanishes immediately because of NaN values.

Best Regards
Lax

by **Lax** » Wed Nov 20, 2019 5:12 pm

Oh OK... It was because in the move callback I calculated a clamped omega:

Code: Select all:
Ogre::Vector3 omega = body->getOmega();
Ogre::Real mag2 = omega.dotProduct(omega);
if (mag2 > (50.0f * 50.0f))
{
    omega = omega.normalise() * 50.0f;
    body->setOmega(omega);
}

This caused corrupt values when the player ragdoll was falling :oops:

Now it does work!
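One plausible reason the clamp above produced corrupt values, assuming the Ogre 1.x behaviour where `Vector3::normalise()` normalises in place and returns the previous length as a scalar: `omega.normalise() * 50.0f` is then a number, not a direction, so every component ends up with the same large value. In Ogre itself `normalisedCopy()` would be the usual way to get a unit vector. The sketch below shows the intended clamp in plain C++ with a hypothetical `Vec3` stand-in, so it does not depend on Ogre or Newton.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for Ogre::Vector3, used only for this illustration.
struct Vec3 { double x, y, z; };

// Scale v down so that its length is at most maxLength.
// Vectors that are already short enough are returned unchanged.
Vec3 ClampLength(const Vec3& v, double maxLength)
{
    assert(maxLength > 0.0);
    const double mag2 = v.x * v.x + v.y * v.y + v.z * v.z;
    if (mag2 <= maxLength * maxLength) {
        return v;
    }
    // Keep the direction and rescale the magnitude to maxLength.
    const double scale = maxLength / std::sqrt(mag2);
    return Vec3{v.x * scale, v.y * scale, v.z * scale};
}
```

For example, clamping (60, 80, 0) to a maximum length of 50 gives (30, 40, 0), while (1, 2, 3) is returned unchanged.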
