Parallel solver experiments

by **Julio Jerez** » Fri Jun 15, 2018 4:10 pm

Me again Joe, oh darn!! Just for fun, and not expecting it to be stable at all, I tried a 40 x 40 pyramid.

To my surprise it was not only stable, it also came to sleep faster than with the sequential solver, and it seems to be faster overall. I am not sure what is going on, and this seems unbelievable, but I'll take it.
:mrgreen:

I committed it with the 40 x 40 pyramid test. Please give it a try when you read this.
This will be a much faster optimization.

by **JoeJ** » Sat Jun 16, 2018 1:03 pm

Sounds interesting

Julio Jerez wrote:I am not sure what is going on but this seem unbelievable

Let me know when you find out...

Unfortunately, for the first time i feel my CPU is becoming dated (it has no AVX).
Can you make this optional?

Attachment: crash.JPG (142.37 KiB)

by **Julio Jerez** » Sat Jun 16, 2018 4:06 pm

Joe, but the code that is committed does not use any of those external plugins.
Did you try to select a plugin?
If so, that will not work. The option is on by default, and you can test it by checking / unchecking the option "solve large island in parallel". When it is on, the solver can use multiple cores for a single island.
I ran the optimized mesh collision demo, which drops 1000 bodies on a mesh to make a pile of debris.
When using 4 threads and asynchronous update I get about 600 fps, and under 10 ms physics update.
I expect this to be even better after the SOA optimization I am doing now.

by **JoeJ** » Sat Jun 16, 2018 4:29 pm

The error happens before i could do any user interaction, demo sandbox does not start up at all.
I assume it is the existence of the type dgFloatAvx alone that prevents start up, because my CPU lacks AVX. (i made no changes to the code either, i just downloaded and compiled the demos sandbox as usual.)

I have no idea how to handle this. Someone else?

Recently quite a number of games had similar issues on older AMD CPUs because they miss some SSE instructions. The games received a patch after release.
I do not know if it's possible to handle this at runtime, or if we require different executables and even need to use #ifdefs to disable the types. :?:
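
On the runtime option raised above: checking for AVX at runtime is generally possible on x86. The following is a minimal sketch, not code from the Newton SDK. `CpuSupportsAvx` and `ChooseSolverBackend` are hypothetical names introduced for illustration, and the backend names "avx" and "default" are assumptions. It relies on the GCC/Clang builtin `__builtin_cpu_supports`, or on `__cpuid` and `_xgetbv` with MSVC, and reports no AVX on any other platform.

```cpp
#include <string>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical helper: returns true when AVX appears usable on this machine.
inline bool CpuSupportsAvx()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang: the compiler runtime performs the CPUID query for us.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0; // OS uses XSAVE/XRSTOR
    const bool avx = (regs[2] & (1 << 28)) != 0;     // CPU implements AVX
    if (!osxsave || !avx) {
        return false;
    }
    // XCR0 bits 1 and 2 must both be set: the OS saves XMM and YMM state.
    return (_xgetbv(0) & 0x6) == 0x6;
#else
    // Unknown platform: assume AVX is not available.
    return false;
#endif
}

// Hypothetical helper: picks a solver backend name from the detected feature.
inline std::string ChooseSolverBackend(bool hasAvx)
{
    return hasAvx ? "avx" : "default";
}
```

A program would call `ChooseSolverBackend(CpuSupportsAvx())` once at start up, before touching any AVX type or instruction, and only then load or enable the matching code path.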

by **Julio Jerez** » Sat Jun 16, 2018 4:33 pm

Can you delete the dolls from the binary, if not I will disable later.
Are you getting a crash or anything?

by **JoeJ** » Sat Jun 16, 2018 5:25 pm

Julio Jerez wrote:Can you delete the dolls from the binary

What do you mean by dolls?

Julio Jerez wrote:Are you getting a crash or anything?

Unhandled exception at start up. (The message is in my screenshot, but you need to scroll around to see it)

by **Julio Jerez** » Sat Jun 16, 2018 5:47 pm

All right, there is a misunderstanding: what is committed is not an AVX build.
It is the solver that can solve a single island in parallel without having to break the mass matrix into separate islands of non-intersecting joints.

To avoid confusion, I committed the SDK with the DLL loader commented out for now.

the test is simply this:
-run the demo and you will see a single island 40 x 40 pyramid
-set it to run on multiple cores, 2 or 4 or more depending on your system
-you should see the physics time scale somewhat linearly with the number of threads.

-then turn the option "Solve Large island in parallel" off and repeat the test
you should see that the sequential solver is much slower than the parallel solver.

Then to my surprise I also see that, at least on the two systems I tested, the parallel solver seems to be faster than the sequential solver even though it has a substantial amount of overhead.
I thought that this was a sign of a bug, that is why I said I am not sure what is going on. So I revised the code and indeed there was a bug,
but after I fixed the bug, for some reason it got even better. :mrgreen:

And that is a really welcome unexpected result, because the parallel solver has a much poorer convergence rate than the sequential one. But if it is in fact faster, this means we can add some extra iterations to improve convergence until the single thread performance is at least equal in both solvers.
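
The convergence trade-off can be illustrated with a toy linear system. The thread does not say which iteration schemes the two solvers use, so this sketch assumes, purely for illustration, a Jacobi-style sweep (every unknown updated from the previous sweep, so the updates are independent and can run in parallel) against a Gauss-Seidel-style sweep (each unknown immediately sees the latest values, which forces sequential order). `CountSweeps` and `MakeChainMatrix` are hypothetical names, and the matrix is a made-up diagonally dominant chain, not a real mass matrix.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Iterates on a * x = b until the largest per-unknown change in one sweep
// drops below tol, and returns the number of sweeps used (capped at maxSweeps).
// useLatest == true  : Gauss-Seidel style, reads values updated in this sweep.
// useLatest == false : Jacobi style, reads only values from the previous sweep.
inline int CountSweeps(const Matrix& a, const std::vector<double>& b,
                       bool useLatest, double tol = 1.0e-8, int maxSweeps = 10000)
{
    const std::size_t n = b.size();
    std::vector<double> x(n, 0.0);
    for (int sweep = 1; sweep <= maxSweeps; ++sweep) {
        const std::vector<double> previous = x;
        double maxChange = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const std::vector<double>& source = useLatest ? x : previous;
            double sum = b[i];
            for (std::size_t j = 0; j < n; ++j) {
                if (j != i) {
                    sum -= a[i][j] * source[j];
                }
            }
            const double updated = sum / a[i][i];
            maxChange = std::max(maxChange, std::fabs(updated - previous[i]));
            x[i] = updated;
        }
        if (maxChange < tol) {
            return sweep;
        }
    }
    return maxSweeps;
}

// Builds a diagonally dominant tridiagonal matrix: 4 on the diagonal,
// -1 next to it. Each unknown is coupled only to its two neighbours.
inline Matrix MakeChainMatrix(std::size_t n)
{
    Matrix a(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        a[i][i] = 4.0;
        if (i > 0) {
            a[i][i - 1] = -1.0;
        }
        if (i + 1 < n) {
            a[i][i + 1] = -1.0;
        }
    }
    return a;
}
```

On a system like this the Jacobi-style sweep needs more sweeps than the Gauss-Seidel-style one to reach the same tolerance. That is the sense in which a solver that is cheaper per sweep can spend its saved time on extra iterations.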

I ran that test with the code that is committed now, and the image shows the result I am getting on my system with 1 core. Could you guys check it again?

Attachment: parallelSolver.png (77.12 KiB)

The other thing I was saying is that with the parallel solver we can now implement the structure-of-arrays version, which will use a single lane of a SIMD register per joint. This way the solver will resolve multiple joints per call, as opposed to what it is doing now, which is one joint per call.

I am now working on the SSE version, which will do 8 joints per call. The reason for 8 is to give the compiler the chance to schedule the code with multiple floats per instruction.
Anyway, that's more tweaking stuff, but the point is that the parallel solver can do multiple joints per call even on a single core.
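
The layout idea can be sketched in plain C++. This is not the Newton data layout: the field names (`jacobian`, `velocity`, `rhs`, `invDiagonal`), the per-joint formula, and the helpers `SolveOne`, `SolveBatch` and `PackBatch` are all assumptions made up for illustration. Only the batch width of 8 comes from the post. The point is that in the structure-of-arrays form each field is a contiguous array with one lane per joint, so a single call handles 8 joints and the inner loop is simple enough for a compiler to map onto SIMD registers.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8; // joints handled per call, as in the post

// Array-of-structures: one joint per object, solved one call at a time.
struct JointRowAoS {
    float jacobian;
    float velocity;
    float rhs;
    float invDiagonal;
};

// Structure-of-arrays: lane i of every array belongs to joint i of the batch.
struct JointBatchSoA {
    std::array<float, kLanes> jacobian;
    std::array<float, kLanes> velocity;
    std::array<float, kLanes> rhs;
    std::array<float, kLanes> invDiagonal;
};

// One joint per call (hypothetical toy formula, not a real constraint solve).
inline float SolveOne(const JointRowAoS& row)
{
    return (row.rhs - row.jacobian * row.velocity) * row.invDiagonal;
}

// Eight joints per call: the same arithmetic applied lane by lane.
inline std::array<float, kLanes> SolveBatch(const JointBatchSoA& batch)
{
    std::array<float, kLanes> impulse{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        impulse[lane] = (batch.rhs[lane] - batch.jacobian[lane] * batch.velocity[lane])
                        * batch.invDiagonal[lane];
    }
    return impulse;
}

// Transposes eight AoS rows into one SoA batch.
inline JointBatchSoA PackBatch(const std::array<JointRowAoS, kLanes>& rows)
{
    JointBatchSoA batch{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        batch.jacobian[lane] = rows[lane].jacobian;
        batch.velocity[lane] = rows[lane].velocity;
        batch.rhs[lane] = rows[lane].rhs;
        batch.invDiagonal[lane] = rows[lane].invDiagonal;
    }
    return batch;
}
```

`SolveBatch(PackBatch(rows))` returns the same values as calling `SolveOne` on each row. The gain is in how the work is laid out, not in the arithmetic.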

by **Julio Jerez** » Sat Jun 16, 2018 5:50 pm

Joe, I do not see any screenshot with a stack trace.
But anyway, can you sync and try again? I commented out the loading of any DLL now.

by **Julio Jerez** » Sat Jun 16, 2018 9:29 pm

Oh I see, you mean the program counter in the image points to one SSE instruction.
That's a bug that can be fixed easily.

Basically the reason I am trying to make the specialized solver libraries or DLLs is that this allows me to use per-project file settings.
By having settings per project we can get the compiler to do a lot of the housekeeping work, like zeroing the upper halves of the registers and stuff like that. The project, for example, is using the AVX instruction set.

The catch is that no special instruction should be executed before the library tests for support of the instruction set.

I will move that initialization to the end of the functions, and that should fix it, but in any case that is not what we are testing now.

by **Julio Jerez** » Sat Jun 16, 2018 9:31 pm

Oh I see, you mean the program counter in the image points to one SSE instruction.
That's a bug that can be fixed easily.

Basically the reason I am trying to make the specialized solver libraries or DLLs is that this allows me to use per-project file settings.
By having settings per project we can get the compiler to do a lot of the housekeeping work, like zeroing the upper halves of the registers and stuff like that. The project, for example, is using the AVX instruction set.

The catch is that no special instruction should be executed before the library tests for support of the instruction set, because if the option is a compiler setting, the compiler is free to emit those instructions anywhere, and the program will execute illegal instructions and crash.
Basically the DLL has to be divided into two DLLs, one built with SSE compiler options and the other with AVX. I will also move that initialization to the end of the functions, and that should fix it. In any case that is not what we are testing now.
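
The "test first, execute later" rule can be shown in a single file. Note that this sketch swaps in a different mechanism from the one described in the post: instead of separate DLLs with per-project compiler settings, it uses the GCC/Clang `target("avx")` function attribute so that only one function is compiled with AVX enabled. `AddScalar`, `AddAvx`, `SelectAdd` and the `SKETCH_HAS_AVX_PATH` macro are hypothetical names, and on other compilers or CPUs the sketch simply falls back to the scalar path.

```cpp
#include <cstddef>

#if defined(__GNUC__) && defined(__x86_64__)
#define SKETCH_HAS_AVX_PATH 1
#include <immintrin.h>
#else
#define SKETCH_HAS_AVX_PATH 0
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Baseline path: built with the default instruction set, safe on any CPU.
static void AddScalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

#if SKETCH_HAS_AVX_PATH
// Only this function is compiled with AVX enabled. It must never be called
// on a CPU without AVX, so it is reachable only through SelectAdd().
__attribute__((target("avx")))
static void AddAvx(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) { // leftover elements that do not fill a register
        out[i] = a[i] + b[i];
    }
}
#endif

// The feature test runs here, in code built without AVX, before any AVX
// instruction can execute.
static AddFn SelectAdd()
{
#if SKETCH_HAS_AVX_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        return &AddAvx;
    }
#endif
    return &AddScalar;
}
```

The caller stores the result of `SelectAdd()` once and calls through the pointer afterwards. The same shape applies to the two-DLL plan: the loader performs the check, and the AVX library is only loaded when the check passes.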

by **JoeJ** » Sun Jun 17, 2018 2:11 pm

Yep, now i can run it without issues

Julio Jerez wrote:the test is simply this:
-run the demo and you will see a single island 40 x 40 pyramid
-set it to run on multiple cores, 2 or 4 or more depending on your system
-you should see the physics time scale somewhat linearly with the number of threads.

With 1 core i get 55 ms
4 cores: 25 ms

-then turn the option "Solve Large island in parallel" off and repeat the test
you should see that the sequential solver is much slower than the parallel solver.

1 core: 55ms
4 cores: 55ms

So i have very similar ratios

by **Julio Jerez** » Sun Jun 17, 2018 3:28 pm

Oh excellent, we can now assume that this is moving in the right direction.
We can see that four cores yield about twice the performance of a single core. In your case from 55 to 25 ms.

I am about halfway through the SOA version, and I expect it to go from something like 55 to 45 ms or less in single core mode.
The reason is that the SOA version only deals with some functions and also adds some extra overhead. So let us see what we get when it is ready.

I am setting a goal of something like 16 ms with all optimizations on, on a system like yours.
But remember this is the baseline for you; once the same is done with AVX, the single core should be 30 ms or less for the same scene, at least that's my expectation.

There is also another benefit: this is the worst-case performance. If the scene is, for example, a bunch of convex shapes in a pile of debris, then it performs a lot better while keeping the solution stable.
This is the aspect I am most excited about, because we can then add non-rigid bodies to the solver, and that will be the next development. I am particularly thinking of the cloth solver passes.

by **JoeJ** » Sun Jun 17, 2018 4:45 pm

Julio Jerez wrote:This is the aspect I am most excited about, because we can then add non-rigid bodies to the solver, and that will be the next development. I am particularly thinking of the cloth solver passes.

Yes i see, and i'm looking forward to this the most.
Although characters is where my heart is, i have practical use for cloth and soft bodies before this... :roll:

by **Julio Jerez** » Tue Jun 19, 2018 1:28 am

Ok guys, I am now rolling out the first version of the SOA parallel solver.

I made a movie because a still picture does not do it justice.
This is the first time we can see the parallel solver being faster than the sequential one in single thread mode, by about 20% or maybe more.

There is only one optimization left to make, and that is the dynamics sleeping; right now it is brute force. I will do that later.

In the movie you can see that I get about 30 fps single threaded and just about 60 with four threads.
I suspect that with the last optimization, plus some other minor ones, it will be a solid 60.
But once we make the AVX version, when it is ready, it should approach around 90 or 100 fps, and now we are talking some serious performance.

I also wiggled the boxes around and played with the stack; you can see that there is no slowdown, and in the close-up there is no jitter.

Please guys, check it out and let me know what you get.

by **Dave Gravel** » Tue Jun 19, 2018 2:09 am

Hi Julio, here I get a different result but it is very nice too.
Here on my AMD I need to add more threads if I want to see a difference with the island solver.
With only one thread it stays the same with island on or off.
Yes, the speed is very amazing when you grab an object from the stack, and it stays constant.

https://youtu.be/sZst5E4T0U4
