**Moderators:** Sascha Willems, Thomas

42 posts
• Page **1** of **3** • **1**, 2, 3

Hi Julio,

I am running some of my tests and there is one test that has 25 bodies all interconnected with dCustomFixDistance constraints. So 25^2 joints. It used to be the engine was handling it, but now it goes unstable.

when I lower the body count to say 10 it becomes stable.

I'm using the AVX plugin and I'm on commit 879dee60bff1d6190faad3aef9ba0ba57aa118ab.

Also I'm excited to try out the vulkan plugin you've been working on. This game I'm working on will really push things to the limit I think.

Thanks,

Trevor

I am running some of my tests and there is one test that has 25 bodies all interconnected with dCustomFixDistance constraints. So 25^2 joints. It used to be the engine was handling it, but now it goes unstable.

when I lower the body count to say 10 it becomes stable.

I'm using the AVX plugin and I'm on commit 879dee60bff1d6190faad3aef9ba0ba57aa118ab.

Also I'm excited to try out the vulkan plugin you've been working on. This game I'm working on will really push things to the limit I think.

Thanks,

Trevor

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

when you say 25^2, do you mean 25 joint each connected to ever other joint?

yes I am slowly trying to write a GPU solver, I'd decided to use Vulkan which seems the only GPU api that is Vendor agnostic. OpenGl is support to but OpenGl will not work on library if you not create a

GlContext and OpenCl is a mess.

On the joint solver, I am working out the math to make a parallel version that integrates well with the parallel solver. I have the math all worked out already.

Maybe this is a good test case to implemented it.

My Guess is that if 25^2 means 625 joints, the joint solver has an arbitrary limit which is the number of intermediate variables that can be placed on the stack on one core, which is 1 meg in VS, if it has has more, it does not uses that solver.

There are also some other issues with singular matrix with a connectivity like that, that a direct solver will not be capable of dealing with because too many close loops.

Please tell me if this is the case.

yes I am slowly trying to write a GPU solver, I'd decided to use Vulkan which seems the only GPU api that is Vendor agnostic. OpenGl is support to but OpenGl will not work on library if you not create a

GlContext and OpenCl is a mess.

On the joint solver, I am working out the math to make a parallel version that integrates well with the parallel solver. I have the math all worked out already.

Maybe this is a good test case to implemented it.

My Guess is that if 25^2 means 625 joints, the joint solver has an arbitrary limit which is the number of intermediate variables that can be placed on the stack on one core, which is 1 meg in VS, if it has has more, it does not uses that solver.

There are also some other issues with singular matrix with a connectivity like that, that a direct solver will not be capable of dealing with because too many close loops.

Please tell me if this is the case.

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

Sorry in such a rush to post I did not get my math right (embarrassing) it is a complete graph.

There are n bodies. so (n(n-1))/2 joints. so for n = 25, there are 300 joints. Sounds like a good test to me!

There are n bodies. so (n(n-1))/2 joints. so for n = 25, there are 300 joints. Sounds like a good test to me!

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

300 I think will still be on the range tat will over flow the stack, but will also have some problem with loops take for example the case of say for joints

a, b, c, d

that yield 4 * 3 /2 = 6 joints, ab, ac, ad, bc, bd, cd

if you arrange the as two sets: ab, ac, bc and bc, bd, cd

you will see that there are two triangles and each form a close loop.

the algorithm doe break the loop by breaking two joints.

of the top of my head if the connectivity is as dense as you say n * (n - 1) /2

the mean there n * (n - 1) / 2 - n loops.

basically 25 joint are acyclic and about 325 are closet loops.

The well definitely fail because is a bad design, In three only 25 joint are sufficient to mating the rigidity of the contractions.

the reason I say this is because if you do a Spawning tree, then that tree will only have 24 joints

every thong else is loop.

the can be fix quite this is eassy, I can probable give you a function that you can use to prune your structure.

do you understand what I am trying to say?

a, b, c, d

that yield 4 * 3 /2 = 6 joints, ab, ac, ad, bc, bd, cd

if you arrange the as two sets: ab, ac, bc and bc, bd, cd

you will see that there are two triangles and each form a close loop.

the algorithm doe break the loop by breaking two joints.

of the top of my head if the connectivity is as dense as you say n * (n - 1) /2

the mean there n * (n - 1) / 2 - n loops.

basically 25 joint are acyclic and about 325 are closet loops.

The well definitely fail because is a bad design, In three only 25 joint are sufficient to mating the rigidity of the contractions.

the reason I say this is because if you do a Spawning tree, then that tree will only have 24 joints

every thong else is loop.

the can be fix quite this is eassy, I can probable give you a function that you can use to prune your structure.

do you understand what I am trying to say?

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

MeltingPlastic wrote:Hi Julio,

I am running some of my tests and there is one test that has 25 bodies all interconnected with dCustomFixDistance constraints. So 25^2 joints. It used to be the engine was handling it, but now it goes unstable.

You this was working at some point, and it does not?

For you have a test?

Can you make a video to see what we are talking about?

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

Thanks Julio,

Yep what you are saying makes sense.

I agree its not an optimal structure. But it did work before just fyi. (I'll try to get a before and after commit)

I am trying to managing these bad designs in the game automatically. I am almost finished getting a system that automatically combines pieces into single compound shaped rigid bodies instead of running all the bodies individual through newton. ( I have to manage some things like approximately conserving momentum before and after the merge so It seems transparent that anything happens to the player)

This combining of bodies should help alot in this regard. But it will still be possible that players will make alot of hinge joints in loops or graphs.

Can newton break big joint graphs into smaller graphs and then use "Loose or Inaccurate" joints to fuse the groups of joints?

I'll get a video going when I have some time.

Also unrelated - the header documentation on NewtonBodyGetInertiaMatrix() is confusing to me, how is the 3x3 matrix formatted? will I actually get the full inertia matrix or just the simple matrix?

Yep what you are saying makes sense.

I agree its not an optimal structure. But it did work before just fyi. (I'll try to get a before and after commit)

I am trying to managing these bad designs in the game automatically. I am almost finished getting a system that automatically combines pieces into single compound shaped rigid bodies instead of running all the bodies individual through newton. ( I have to manage some things like approximately conserving momentum before and after the merge so It seems transparent that anything happens to the player)

This combining of bodies should help alot in this regard. But it will still be possible that players will make alot of hinge joints in loops or graphs.

Can newton break big joint graphs into smaller graphs and then use "Loose or Inaccurate" joints to fuse the groups of joints?

I'll get a video going when I have some time.

Also unrelated - the header documentation on NewtonBodyGetInertiaMatrix() is confusing to me, how is the 3x3 matrix formatted? will I actually get the full inertia matrix or just the simple matrix?

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

HI Julio - I use realized this is not as big an issue as I thought - the joints were all FixedDistance, so it was not supposed to be fully rigid in the first place. here is a video with 10 bodies:

https://youtu.be/3ElbGEzUkms

https://youtu.be/3ElbGEzUkms

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

ok in reverse order.

Inertia matrices are the first 3x3 element of a 4 x4 matrix, is does not matrix what major because inertia matrices are symmetric and positive definitive.

the videos is really cool, the interesting part is to see whet I did that broke the large test.

I am guessing I probably tweaked the LCP solver that solve the joint that break the close loop.

on eh factorization thing, the formulation of connotation system is a full cyclic graph.

any graphs has a corroding matrix formulations.

the trick to solve the linear system if to apply rows and column permutation try to make so the matrix form a known factorization algorithm.

for example a typical case, take for example a rope, if you organize the joint one after the other the resulting matrix is tridiagonal, such system can be solved in linear time with linear storage.

a three hierarchical case, each node can only have one parent, can be permutated into a

Hessenberg matrix. These systems can be factored it lower and upper triangular matrices than can be solve in linear time. with super linear storage using space matrix techniques.

These system do not have loops.

when a system has loops. what Newton does is that if factor the matrix in four sub matrices where the first quadrant is a Hessenberg matrix, and the other three are the full dense matrices that need to be solve by some direct method using matrix partition techniques.

https://en.wikipedia.org/wiki/Block_matrix

The trick is to factor permutated the big matrix into sub matrices that the off diagonal matrix has the most number of zeros and the diagonal block matrixes and the non zeros.

The method newton uses capitalize on the number of close loop in the graph will be small, therefore when the big matrix is permutated, the first block will be the largest, the off diagonal will have zeros, and the lower diagonal will be dense that will need to be solved by a direct or iterative method. is this part that is expensive and memory consuming and can be inaccurate if using and iterative method.

But the answer is yes any graph can be partitioned, the secrets is knowing how to partitioning to your convenience.

I use Spawning tree trick of the to extract the subgraph the will be equivalent to a Hessenberg matrix, when the number of loop is small (which is the case for most realist systems) this works very well.

One method I make try some day is the Band Diagonal Matrix organization, this method yield matrix the concentrate the non zeros element around the main diagonal and the subsystem can be solved in quadratic time. which when using GPU can be fixable. this system can cope with loops of the type that you are forming.

but in any case do you have a repro to see what was that I did that bloke the difficult test case.?

Inertia matrices are the first 3x3 element of a 4 x4 matrix, is does not matrix what major because inertia matrices are symmetric and positive definitive.

the videos is really cool, the interesting part is to see whet I did that broke the large test.

I am guessing I probably tweaked the LCP solver that solve the joint that break the close loop.

on eh factorization thing, the formulation of connotation system is a full cyclic graph.

any graphs has a corroding matrix formulations.

the trick to solve the linear system if to apply rows and column permutation try to make so the matrix form a known factorization algorithm.

for example a typical case, take for example a rope, if you organize the joint one after the other the resulting matrix is tridiagonal, such system can be solved in linear time with linear storage.

a three hierarchical case, each node can only have one parent, can be permutated into a

Hessenberg matrix. These systems can be factored it lower and upper triangular matrices than can be solve in linear time. with super linear storage using space matrix techniques.

These system do not have loops.

when a system has loops. what Newton does is that if factor the matrix in four sub matrices where the first quadrant is a Hessenberg matrix, and the other three are the full dense matrices that need to be solve by some direct method using matrix partition techniques.

https://en.wikipedia.org/wiki/Block_matrix

The trick is to factor permutated the big matrix into sub matrices that the off diagonal matrix has the most number of zeros and the diagonal block matrixes and the non zeros.

The method newton uses capitalize on the number of close loop in the graph will be small, therefore when the big matrix is permutated, the first block will be the largest, the off diagonal will have zeros, and the lower diagonal will be dense that will need to be solved by a direct or iterative method. is this part that is expensive and memory consuming and can be inaccurate if using and iterative method.

But the answer is yes any graph can be partitioned, the secrets is knowing how to partitioning to your convenience.

I use Spawning tree trick of the to extract the subgraph the will be equivalent to a Hessenberg matrix, when the number of loop is small (which is the case for most realist systems) this works very well.

One method I make try some day is the Band Diagonal Matrix organization, this method yield matrix the concentrate the non zeros element around the main diagonal and the subsystem can be solved in quadratic time. which when using GPU can be fixable. this system can cope with loops of the type that you are forming.

but in any case do you have a repro to see what was that I did that bloke the difficult test case.?

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

Interesting, Thanks for the explanation. Here is a demo with 25 bodies all connected with FixedDistanceConstraints: https://drive.google.com/open?id=1-BX7y ... 9XfyNREMw1

Run NewtonPhysicsTests.exe inside the folder structure.

Edit: Just Realized I accidentally uploaded a release build - Im on slow internet right now so It will take a bit to upload a full Debug Build.

Also if you sign up for gitlab I can add you to source access.

Run NewtonPhysicsTests.exe inside the folder structure.

Edit: Just Realized I accidentally uploaded a release build - Im on slow internet right now so It will take a bit to upload a full Debug Build.

Also if you sign up for gitlab I can add you to source access.

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

Here is a debug build: https://drive.google.com/open?id=144sbL ... slW8hhngm-

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

I got your pm, later I will clone it and check it out.

I like that idea of having an actual integration writen by someone other than me.

Maybe I can contribute some feactures in my spare time.

I like that idea of having an actual integration writen by someone other than me.

Maybe I can contribute some feactures in my spare time.

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

Yeah any contributions would be great! I know there is still some work to be done specifically in the NewtonRigidBody::rebuildBody() function. That's where I currently do 2 passes to handle making shapes with multiple sub collisions and mass densities.

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

I have not done it yet, I want to complete the refactoring of the new memory manager.

One that offer better cache locality, can allocate at least two chunk at the same time, and that keep the memory food print smaller or at least not larger that the current one.

To add those feactures to the current one will almost double the memory consumption per with the addition of each allocation lane.

I have almost working in single thread now I need to make it multi lane.

The reason for this is that the gpu since to hit memory allocation really hard.

And thats only adding one command buffer, but the benefic seems to come from multiple command buffer and I did not want to make or get some custom memory manager that seem uggly and and more complex that need to be.

Once I get that I will revisit the broadphase pair generator that bottleneck on the creation of pairs in multi core mode.

One of the thing that has hunted Newton for years is the lack of cache coherence because I use high level containers, instead of local arrays.

I am hoping that a better memory manager can serve as a paliative to aliviate that.

One that offer better cache locality, can allocate at least two chunk at the same time, and that keep the memory food print smaller or at least not larger that the current one.

To add those feactures to the current one will almost double the memory consumption per with the addition of each allocation lane.

I have almost working in single thread now I need to make it multi lane.

The reason for this is that the gpu since to hit memory allocation really hard.

And thats only adding one command buffer, but the benefic seems to come from multiple command buffer and I did not want to make or get some custom memory manager that seem uggly and and more complex that need to be.

Once I get that I will revisit the broadphase pair generator that bottleneck on the creation of pairs in multi core mode.

One of the thing that has hunted Newton for years is the lack of cache coherence because I use high level containers, instead of local arrays.

I am hoping that a better memory manager can serve as a paliative to aliviate that.

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

Sounds Great!

I am pretty happy - I finally think I got inertia's calculating correctly for large compound collisions. Before if there was an offset between the COM and body frame the inertia matrix was not computed correctly. But I worked around it by manually shifting my collision shape, calling NewtonBodySetMassProperties, saving the inertia, and then reverting the collision shape and applying the Inertia.

Also Just FYI - I am doing some tests with large body counts (700 bodies) and sometimes it looks like "Islands" become unstable and a whole region of bodies will fall through the floor: https://www.youtube.com/watch?v=q8CEkeJ ... e=youtu.be

I am pretty happy - I finally think I got inertia's calculating correctly for large compound collisions. Before if there was an offset between the COM and body frame the inertia matrix was not computed correctly. But I worked around it by manually shifting my collision shape, calling NewtonBodySetMassProperties, saving the inertia, and then reverting the collision shape and applying the Inertia.

Also Just FYI - I am doing some tests with large body counts (700 bodies) and sometimes it looks like "Islands" become unstable and a whole region of bodies will fall through the floor: https://www.youtube.com/watch?v=q8CEkeJ ... e=youtu.be

- MeltingPlastic
**Posts:**221**Joined:**Fri Feb 07, 2014 11:30 pm

Are bodies going over the ground on that video?

also is this new, of has always been there.

also is this new, of has always been there.

- Julio Jerez
- Moderator
**Posts:**11124**Joined:**Sun Sep 14, 2003 2:18 pm**Location:**Los Angeles

42 posts
• Page **1** of **3** • **1**, 2, 3

Users browsing this forum: No registered users and 1 guest