Parallel solver experiments


Re: Parallel solver experiments

Postby Julio Jerez » Wed Jun 20, 2018 3:41 pm

Could you sync again, and if it crashes, can you show me the assembly code?

The next plugin, which I will do tonight, will be SSE4.2, which is the proper 128-bit set for these architectures. See the Microsoft documentation for __cpuid, __cpuidex (it generates the cpuid instruction, available on x86 and x64, which queries the processor for information about supported features and the CPU type) to verify these values.

Code: Select all
   static bool FMA(void) { return CPU_Rep.f_1_ECX_[12]; } 
   static bool SSE42(void) { return CPU_Rep.f_1_ECX_[20]; } 


By the way, it will be a separate DLL that will not have AVX options at the project level.
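For anyone who wants to verify those two bits on their own machine, here is a minimal sketch. It assumes only what the snippet above states (CPUID leaf 1, register ECX, bit 12 for FMA and bit 20 for SSE4.2). The names TestBit, QueryLeaf1Ecx, HasFma and HasSse42 are made up for illustration and are not part of the plugin code. A production check for FMA would normally also confirm that the OS saves the wider register state, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Feature bits of CPUID leaf 1, register ECX (same indices as the snippet above).
constexpr unsigned kFmaBit   = 12;  // FMA (three-operand form)
constexpr unsigned kSse42Bit = 20;  // SSE4.2

// Returns true if the given bit is set in a raw register value.
inline bool TestBit(std::uint32_t reg, unsigned bit)
{
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

// Hypothetical helper: reads ECX of CPUID leaf 1.
// Returns 0 when CPUID is not available for this compiler or target.
inline std::uint32_t QueryLeaf1Ecx()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);  // regs = {EAX, EBX, ECX, EDX}
    return static_cast<std::uint32_t>(regs[2]);
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    return ecx;
#else
    return 0;
#endif
}

// Decoders take the raw ECX value so they can be checked without real hardware.
inline bool HasFma(std::uint32_t ecx)   { return TestBit(ecx, kFmaBit); }
inline bool HasSse42(std::uint32_t ecx) { return TestBit(ecx, kSse42Bit); }
```

Calling HasFma(QueryLeaf1Ecx()) and HasSse42(QueryLeaf1Ecx()) at startup and printing both results is enough to compare against the flags the plugin reports.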
Julio Jerez
Moderator
 
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Parallel solver experiments

Postby Julio Jerez » Wed Jun 20, 2018 3:44 pm

Joe, what is your CPU?

I may have the initialization in the wrong place. I think I should place it in the function
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )

and have it return FALSE if the instruction set does not validate; this way it will not get to call the C++ startup.

Edit: I made that change, but it does not matter; the startup is called before DllMain.

Anyway, if you still get a crash, then tonight I will make sure that there are no global variables in the plugin, and that should take care of the problem.

Re: Parallel solver experiments

Postby JoeJ » Wed Jun 20, 2018 4:40 pm

Now it works for all variants of debug / release, 32 and 64bit builds :)

The CPU is an i7 930; I think this was the very first generation of Core processors.
JoeJ
 
Posts: 1453
Joined: Tue Dec 21, 2010 6:18 pm

Re: Parallel solver experiments

Postby Julio Jerez » Wed Jun 20, 2018 5:14 pm

When you open the option menu do you see two solvers?
Default
Acc

Or only Default?

Re: Parallel solver experiments

Postby JoeJ » Wed Jun 20, 2018 5:33 pm

only 'default solver'

Re: Parallel solver experiments

Postby Julio Jerez » Wed Jun 20, 2018 5:50 pm

So there are some Core i7 that do not support AVX, and even worse, even if those support SSE4.2, they still do not support the FMA4 or FMA3 instructions.
It turns out those instructions are a new three-address standard altogether. They are in fact more modern than AVX, so for those CPUs this is the end of the line as far as hardware support goes.

I am still going to make the SSE4.2 plugin because there are tons of AMD CPUs that do support fmadd. This again is another case where AMD dropped the ball: they made the specification for FMA4, then changed to FMA3, then changed back again to the first FMA4, and so on.
Just read this nightmare: https://en.wikipedia.org/wiki/FMA_instruction_set

I guess I will try the one for Bobcat, and if the results are better then I will make one for Bulldozer in the same DLL, and we load whichever one is supported first.

Edit: it turns out that almost all popular AMD CPUs (Piledriver, Bulldozer and Bobcat) support FMA3,
which by the way is the only one exposed through the intrinsics, so it is all good.
The solver check reduces to checking whether the fmadd flag is true, which implies SSE4 is also true.

Re: Parallel solver experiments

Postby JoeJ » Sun Jun 24, 2018 1:52 pm

Julio Jerez wrote: so for those CPUs this is the end of the line as far as hardware support goes.


Not necessarily if you utilize this: https://arxiv.org/pdf/1602.04716.pdf

( :mrgreen: Just joking. But interesting idea somebody mentioned to me.)

Re: Parallel solver experiments

Postby Julio Jerez » Sun Jun 24, 2018 2:35 pm

I just finished a slight modification.
The solver has three stages
1-calculate joint forces on each row.
2-apply forces to each body
3-integrate each body.

Until now phase one has always been the slowest, so all my effort has been focused on optimizing the joint solve.

The parallel solver groups joints by similarity in groups of 8. This could be 4 or any arbitrary number, but 8 seems to be the most efficient. It then solves each group as a single joint.

This requires some extra overhead, but it does not seem to be a problem.

So far this was working very well, and the timings seemed very similar, until I tried four or eight threads.

The solver reached a plateau, and in fact it seemed to slow down.
I took some profile captures and, to my surprise, the joint solve is no longer the slowest part when running multiple threads. It is phase two: applying the joint forces to each body.

The problem is that the code applies the forces by iterating over the joints, but since a joint links two bodies, if two threads are calculating a force and they share one body, the code must use critical sections to do the accumulation.

What makes it really bad with this solver is that joints are in groups, therefore the chance that one core will share a body with another core increases by a factor of (8 joints x 2 bodies each) ^ number of threads.

Basically the code serializes and all the effort is wasted.

To fix that, the solver now iterates over the bodies, but each body has to have a precomputed array of its joints. This is more code but does not require critical sections.
Now the solver scales linearly with the core count.

I will commit this later today. It may be a little slower on a single core, but the goal of this solver is high core counts.
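To make the body-centric idea concrete, here is a rough sketch and not the engine's code. Joint, BodyJointMap, BuildBodyJointMap and AccumulateForces are hypothetical names, the force is reduced to a single float, and every joint is assumed to link two different bodies. The per-body joint list is built once, and after that each body's total is written by exactly one loop iteration, which is why disjoint body ranges could be handed to separate threads without a critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal joint for illustration; not the engine's data structure.
struct Joint {
    int body0;    // index of the first body
    int body1;    // index of the second body (assumed different from body0)
    float force;  // result of phase one: +force on body0, -force on body1
};

// Compressed per-body joint lists: the joints touching body b are
// jointIndex[start[b]] .. jointIndex[start[b + 1] - 1].
struct BodyJointMap {
    std::vector<int> start;
    std::vector<int> jointIndex;
};

// Precompute which joints touch each body (a counting sort over body indices).
inline BodyJointMap BuildBodyJointMap(const std::vector<Joint>& joints, int bodyCount)
{
    BodyJointMap map;
    map.start.assign(static_cast<std::size_t>(bodyCount) + 1, 0);
    for (const Joint& j : joints) {
        ++map.start[j.body0 + 1];
        ++map.start[j.body1 + 1];
    }
    for (int b = 0; b < bodyCount; ++b) map.start[b + 1] += map.start[b];

    map.jointIndex.resize(joints.size() * 2);
    std::vector<int> cursor(map.start.begin(), map.start.end() - 1);
    for (int i = 0; i < static_cast<int>(joints.size()); ++i) {
        map.jointIndex[cursor[joints[i].body0]++] = i;
        map.jointIndex[cursor[joints[i].body1]++] = i;
    }
    return map;
}

// Phase two for bodies [first, last). Only bodyForce[b] with b in that range is
// written, so calls on disjoint ranges never write the same element.
inline void AccumulateForces(const std::vector<Joint>& joints, const BodyJointMap& map,
                             int first, int last, std::vector<float>& bodyForce)
{
    assert(0 <= first && first <= last && last + 1 <= static_cast<int>(map.start.size()));
    for (int b = first; b < last; ++b) {
        float sum = 0.0f;
        for (int k = map.start[b]; k < map.start[b + 1]; ++k) {
            const Joint& j = joints[map.jointIndex[k]];
            sum += (j.body0 == b) ? j.force : -j.force;
        }
        bodyForce[b] = sum;
    }
}
```

The cost is that every joint force is read twice, once per body, which fits the expectation that this may be a little slower on a single core.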

Re: Parallel solver experiments

Postby Julio Jerez » Sun Jun 24, 2018 7:51 pm

OK guys, I added the critical-section-free version of all the parallel solvers,
and I also added AVX2.

For those with AMD processors that recognize the plugins: according to my understanding of AMD,
it seems that executing AVX code on AMD processors is not as efficient as it is on modern Intel Core i7.

Therefore the most efficient solver would be SSE4.2, not AVX.
This is the one that uses the vfmadd instructions but in 128-bit mode.

I cannot tell the difference between AVX2 and SSE4.2;
AVX2 seems a little faster, but the difference is within the margin of error.

But both AVX2 and SSE4.2 are definitely faster than the normal SSE, by about 3 ms on the 40 x 40 pyramid.
I will make a video later showing all three solvers in action.


On the optimization side, there is one last thing that I can do, which is to remove all the atomic indexes. The last profile shows that in the slower functions the hit happens at the atomic add every time.
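The thread does not say how the atomic indexes will be replaced, so the following is only one common alternative, sketched under that assumption: instead of every thread pulling the next work item with an atomic add on a shared counter, each thread computes a fixed contiguous range from its own index. WorkRange and StaticRange are hypothetical names, not engine API.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of work items owned by one worker.
struct WorkRange {
    std::size_t begin;
    std::size_t end;
};

// Split itemCount items into workerCount contiguous ranges whose sizes differ
// by at most one. Each worker derives its range from its own index, so no
// shared counter is read or written.
inline WorkRange StaticRange(std::size_t itemCount, std::size_t workerCount,
                             std::size_t workerIndex)
{
    assert(workerCount > 0 && workerIndex < workerCount);
    const std::size_t base = itemCount / workerCount;
    const std::size_t extra = itemCount % workerCount;  // first `extra` workers get one more item
    const std::size_t begin = workerIndex * base + (workerIndex < extra ? workerIndex : extra);
    const std::size_t size = base + (workerIndex < extra ? 1 : 0);
    return WorkRange{begin, begin + size};
}
```

The trade-off is that a fixed split gives up dynamic load balancing, so it works best when the items cost roughly the same, which is more likely when joints are already grouped by similarity.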

Anyway, if you guys sync, it is all committed.

Re: Parallel solver experiments

Postby Julio Jerez » Sun Jun 24, 2018 9:10 pm

Here is a video showing the best it can do so far. Notice that this is a hard scene to simulate without jitter or the towers collapsing the moment they are touched; here you have to work hard to knock them down.

Re: Parallel solver experiments

Postby MeltingPlastic » Mon Jun 25, 2018 10:54 am

Hi Julio,

I just synced. Things are working; however, I get a crash when I introduce a compound collision (something to do with the hashmap change for contacts, I think). If I revert to before commit ff01d56c08968b7cbd15d6b57782ca540e9dcc73, things work again.

Crash Screenshot:
https://drive.google.com/open?id=1zqqIFgjKd57s-qcQP9p7K5NwFt1mctkR
MeltingPlastic
 
Posts: 237
Joined: Fri Feb 07, 2014 11:30 pm

Re: Parallel solver experiments

Postby Julio Jerez » Mon Jun 25, 2018 1:32 pm

I was having similar crashes this weekend until I realized that the project updates the new plugin, but if an old plugin is still in the folder, the engine will try to load it and call functions on it, and this will inevitably crash if the plugin class was changed.

I have not found a way to delete the plugin folder from the build system. Adding a prebuild script does it but is quite tedious, because there is no way to check dependencies on binaries.

The only way to clean out the legacy plugins is to manually delete the folder
../newton-dynamics\applications\demosSandbox\Win32\newtonPlugins

or the home folder of the plugins in your application.
Please do that and try again.

Re: Parallel solver experiments

Postby Julio Jerez » Mon Jun 25, 2018 3:02 pm

OK, now I added a way to determine the preferred solver plugin.

Here is a code snippet for anyone testing in their wrapper:
Code: Select all
NewtonSetMultiThreadSolverOnSingleIsland (m_world, 1);   
NewtonSelectPlugin (m_world, NewtonGetPreferedPlugin(m_world));


This will select the best solver based on my assessment of the instruction sets.
For platforms not supporting any of the instruction sets, it will still use the parallel solver, which is quite good.
The plugins are a little better, but it is not what I was hoping for. Still, we need to keep going because this is the future.
Now the last thing for the parallel solver is to remove the atomics. This will apply to the collision system and the solvers. It turns out they are quite expensive as well when using multiple cores, who knew. :( :cry:

Anyway, we are now close to the base solver, and next we will work on the unified solver:
-rigid bodies
-contractions of articulated bodies
-particles and soft spring mass cloth
-lagrangian fluid particles

Re: Parallel solver experiments

Postby MeltingPlastic » Mon Jun 25, 2018 3:21 pm

Sounds good. It sounds like I need to bite the bullet and go full DLL linking.

Re: Parallel solver experiments

Postby Julio Jerez » Mon Jun 25, 2018 3:48 pm

You do not have to; the SDK loads the plugins automatically.
They are DLL only for now. All you need to do is delete that folder I mentioned in my previous post.
