Bodies not detecting collision with my player

by **pHySiQuE** » Fri Dec 02, 2016 10:59 pm

I am experiencing a problem since updating Newton.

Dynamic bodies that have mass are no longer getting the AABBOverlap function called against my player object. The player is a dynamic body with a mass of zero whose matrix is continuously set with NewtonSetBodyMatrix(). This function skips collision with the player, but it is important for it to occur because it wakes up my character controller so it can perform its own collision.

Have there been any recent changes in Newton that could cause this?

by **Julio Jerez** » Sun Dec 04, 2016 12:26 am

Yes, I would expect that this may happen, and it needs to be fixed.
The Newton 3.14 broad phase now uses a scalar distance function. This allows the broad phase not only to find new contact pairs quickly, but also to detect when a pair is no longer needed.
This translates to a huge performance gain, about half an order of magnitude, since the O(n log n) phase no longer has to run; instead it runs in O(n) + O(m * log(n)), where m is the number of joints that violate the distance field.
The only problem is that now, when bodies are teleported by the end application, the engine needs to invalidate the distance field for all existing pairs connecting that body to any other body.
The bug you see should be easy to fix. I thought I had covered all the cases, but there may be a case that I overlooked. I cannot see where it could be, but this is what we can do.
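
To make the teleport problem concrete, here is a minimal sketch of a distance-bounded pair cache. It is not Newton's implementation: the types `Sphere`, `CachedPair` and `PairCache` are hypothetical, the shapes are plain spheres, and the pair list is built by brute force. It only illustrates the idea described above: a pair is re-tested when its cached separation budget runs out, so a body moved behind the cache's back must have its pairs invalidated explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these names come from Newton.
struct Sphere {
    float x, y, z;
    float radius;
};

struct CachedPair {
    std::size_t a, b;  // indices of the two bodies
    float slack;       // cached lower bound on the gap; <= 0 means "re-test"
};

// Exact gap between two spheres (negative when they overlap).
inline float Gap(const Sphere& p, const Sphere& q) {
    const float dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - p.radius - q.radius;
}

class PairCache {
public:
    explicit PairCache(std::vector<Sphere> bodies) : bodies_(std::move(bodies)) {
        for (std::size_t i = 0; i < bodies_.size(); ++i)
            for (std::size_t j = i + 1; j < bodies_.size(); ++j)
                pairs_.push_back({i, j, Gap(bodies_[i], bodies_[j])});
    }

    // Simulated motion: the travelled distance is known, so every pair that
    // contains the body simply loses that much slack (triangle inequality).
    void Move(std::size_t body, float dx, float dy, float dz) {
        assert(body < bodies_.size());
        Sphere& s = bodies_[body];
        s.x += dx; s.y += dy; s.z += dz;
        const float travelled = std::sqrt(dx * dx + dy * dy + dz * dz);
        for (CachedPair& p : pairs_)
            if (p.a == body || p.b == body) p.slack -= travelled;
    }

    // Teleport: the application overwrites the position directly. With
    // invalidate == false the cached slack goes stale, which models the bug.
    void Teleport(std::size_t body, float x, float y, float z, bool invalidate) {
        assert(body < bodies_.size());
        Sphere& s = bodies_[body];
        s.x = x; s.y = y; s.z = z;
        if (!invalidate) return;
        for (CachedPair& p : pairs_)
            if (p.a == body || p.b == body) p.slack = 0.0f;
    }

    // Runs the exact test only for pairs whose slack ran out. Returns the
    // number of exact tests; overlapping pairs are appended to `contacts`.
    std::size_t Update(std::vector<CachedPair>& contacts) {
        std::size_t tests = 0;
        for (CachedPair& p : pairs_) {
            if (p.slack > 0.0f) continue;  // still provably separated
            ++tests;
            p.slack = Gap(bodies_[p.a], bodies_[p.b]);
            if (p.slack <= 0.0f) contacts.push_back(p);
        }
        return tests;
    }

private:
    std::vector<Sphere> bodies_;
    std::vector<CachedPair> pairs_;
};
```

In this toy model, teleporting a body onto another one with `invalidate == false` leaves `Update` reporting no contact, because the stale slack still says the pair is far apart. That matches the symptom in the first post, although the actual cause in Newton may be different.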

Do you have an executable demo linked to the Newton DLL that I can use to debug all these issues?
Otherwise it is hard for me to see what path your object is following, since in all my tests it works fine.

So that you can see the performance I am talking about, here is a comparison of Newton 3.14 and Newton 3.13, and I threw in Bullet 2.82 and PhysX 3.4 (latest version).
PhysX and Bullet are precompiled by the PhysX team itself.
Newton 3.14 is about 3 times faster than 3.13, and about 20% faster than PhysX 3.4; Bullet does not even come close.

by **JoeJ** » Sun Dec 04, 2016 12:14 pm

Julio Jerez wrote:Newton 3.14 is about 3 times faster than 3.13, and about 20% faster than PhysX 3.4; Bullet does not even come close.

Whoooo!

by **pHySiQuE** » Sun Dec 04, 2016 9:02 pm

JoeJ wrote:
Julio Jerez wrote:Newton 3.14 is about 3 times faster than 3.13, and about 20% faster than PhysX 3.4; Bullet does not even come close.

Whoooo!

Why don't you have a graph on your front page showing this? Jesus!

by **Julio Jerez** » Sun Dec 04, 2016 11:13 pm

About the bug, do you have a test using your sandbox that links to the Newton DLL, so that I can debug those pending things?

Wait until I commit the cloth demo. I believe that it will even support self-collision;
I will be able to simulate a person wearing semi-tight-fitting clothing with no penetrations,
all at linear-time speed.
From what I can see, the cloth solver will be very, very efficient.
Therefore it will be a showpiece of the engine, not a checkbox feature.
The extension will be soft bodies that can fully interact with the entire scene.

Once I get that going, I will close 3.14 and start development of 4.00, which will have GPU support and the two-pass solver.

by **pHySiQuE** » Mon Dec 05, 2016 1:09 am

I do not use the DLL, but I might switch over to it so that this can be debugged.

by **Schmackbolzen** » Mon Dec 05, 2016 6:28 am

Those numbers are indeed impressive! The cloth physics also sound very interesting.

About GPU physics: I've seen in the Github log that you are planning to use CUDA. Can't you use OpenCL instead? It runs on nearly all hardware, plus you can run the same code on the CPU, which is a huge advantage for debugging. I have used both (although only for simple tasks) and can say that I find the OpenCL way better and easier to use. You also don't have to rely on the CUDA compiler, and as soon as you have the right OpenCL SDK installed you can just start programming for it. For debugging you can use e.g. AMD's CodeXL, which you can integrate into Visual Studio. Intel also seems to offer something similar in their SDK, but I have never tested it.

by **Julio Jerez** » Mon Dec 05, 2016 8:32 am

@Schmackbolzen
I may give OpenCL another try. Last time I tried it, I found it harder to set up than CUDA; then I switched to AMP, which was also difficult to use. Of the three, CUDA seemed the simplest, but maybe I was too hasty in judging OpenCL. It was Intel's CL, and at a very early stage.

But you are right that the potential to do high-performance computing on the CPU is very attractive to me, so I will evaluate OpenCL again. Maybe I should try ampCL this time.
Thanks for the hint.

@pHySiQuE please see if you can do that. I do not see what I missed; it might be a combination of two things, not just an invalidation.

by **Julio Jerez** » Mon Dec 05, 2016 12:58 pm

Schmackbolzen wrote:Those numbers are indeed impressive! The cloth physics also sound very interesting.
About GPU physics: I've seen in the Github log that you are planning to use CUDA. Can't you use OpenCL instead? It runs on nearly all hardware plus you can run the same code on CPU, which is a huge advantage for debugging. ..

I think that what I will do by the end of the year is:
1-fix the pending bugs introduced by the new broad phase algorithm
2-complete the cloth and soft body (I am really motivated on this one)
3-release stable 3.14

Then I will do Newton 3.15, which will be a short release:
1-the two-pass solver
2-remove the sleep, auto sleep and continuous collision logic (in future versions of the engine, bodies will always be in CCD and auto sleep)
3-complete the parallel solver, as a warm-up for GPU
4-release 3.15

Then start 4.0, the GPU version (OpenCL or CUDA).

by **AntonSynytsia** » Mon Dec 05, 2016 5:05 pm

Great news, Julio! Thanks for all of this.

by **JoeJ** » Mon Dec 05, 2016 5:37 pm

I've spent the last half year on GPGPU again.
Vulkan is my primary API, but I still use OpenCL for comparison and profiling.
I write one source file for both APIs and use a regular C preprocessor to translate it to GLSL / OpenCL C.
No extra work here after some initial setup.

Vulkan would be quite attractive, because it is available everywhere (AFAIK not yet on Intel).

Here's a comparison of VK / CL (on an AMD GPU):

CL was initially mostly faster, but after optimizing the * out of it, VK slightly wins overall.
(Even though there is no proper profiling tool for VK yet, which is absolutely necessary to optimize properly.)

This refers only to the execution time on the GPU.
CL 1.x has no indirect dispatch, so including API overhead finally makes VK almost twice as fast for me (talking about roughly 30 invocations per frame).
CL 2.0 would be better, but I doubt Nvidia will ever support it.

I think CL is very easy to use; VK was harder to get going. (Now I spend equal time on maintaining both.)
You did not like CL, so you'll probably hate VK. But if you need lots of invocations, think about it.

I would use OpenCL for now; it's easy to port this to VK/DX later, for you or for us.
OpenGL has indirect dispatch and uses the same GLSL shaders as Vulkan, so it's another option.
(Actually GLSL is the only option to generate SPIR-V.)

Using CUDA would be totally useless.
Also, you need to let go of thoughts like:
'It's too technical - the API should take care of those details'.
GPUs don't work this way.

Assuming you have an Nvidia card, I also recommend buying an AMD one, because they publish detailed specs that help you know what you are doing. (And I still remember my cheap 280X destroying that expensive Titan by a factor of 2.)

by **godlike** » Tue Dec 06, 2016 9:54 am

Personally, I don't find GPU physics very attractive, especially now. The new low-level APIs (Vulkan + DX12) will free some CPU resources, and that can be spent on physics, AI, animation etc.

But I agree, choosing something future-proof is tough. At least for vulkan there are thoughts to expand the compute capabilities of the API with similar functionality to OpenCL.

I'm not sure what the requirements are for GPU-based physics, but I'll throw an alternative into the mix even though I haven't investigated it much. You could use HLSL for the shaders. Then Newton will have 2 backends, one for Vulkan and one for DX12. For the DX12 backend Newton will use the HLSL directly, but the Vulkan one will use Khronos' glslang compiler to compile HLSL to SPIR-V*.

For Vulkan, Newton API will accept a VkDevice and a VkQueue. Using the VkDevice Newton can create buffers and shaders and the VkQueue to schedule work. The application can pass to Newton whatever device/queue they want. For example, the application can pass the on-board GPU or the discrete GPU. Also, the app can pass an async-compute queue or another queue with lower priority etc. Similar idea for DX12.

*: At the moment glslang lacks support for compute HLSL shaders.

by **Julio Jerez** » Tue Dec 06, 2016 10:30 am

I am confused. My impression was that all three high-performance computing APIs offer interoperability; that is, you can pass an opaque buffer in GPU memory and the kernel call can put its results in that buffer.

For example, say you have a bunch of particles: you can pass the vertex position buffer and the kernel can place the result there, then the renderer will issue the draw call using that buffer, which is already loaded. This applies to more complex things too, like a cloth patch, which has to generate the mesh as well.

As for the major differences, one that is important is the multi-device capability; that is, you can divide a GPU into multiple devices and execute different kernels simultaneously on all of them.

As it stands now in OpenCL, last time I checked, a GPU can be seen only as a single device, which makes it hard to do things like collision detection.
Take for example different pairs.
Say you have 1000 colliding pairs: 200 box/polygon, 300 box/box, 500 convex/box and so on.
Now you can only issue one kernel at a time. I believe CUDA lets you divide the GPU into multiple devices, so you can issue the first batch in one kernel, the second in another and the third in another.
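
Whichever API ends up launching the work, the batching step in that example can be sketched on the CPU. The snippet below is only an illustration under assumptions: the `Shape` enum, the `Pair` struct and the helper names are hypothetical and not part of Newton, CUDA or OpenCL. It groups a mixed pair list into one bucket per unordered shape combination, so that each bucket could be handed to its own specialized kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical shape categories, chosen to match the example in the post.
enum class Shape : std::size_t { Box, Convex, Polygon, Count };

constexpr std::size_t kShapeCount = static_cast<std::size_t>(Shape::Count);

struct Pair {
    Shape a, b;  // shape types of the two colliding bodies
    int id;      // opaque pair identifier
};

// Maps an unordered shape combination to a bucket index, so that
// box/convex and convex/box land in the same bucket.
inline std::size_t BucketIndex(Shape a, Shape b) {
    std::size_t i = static_cast<std::size_t>(a);
    std::size_t j = static_cast<std::size_t>(b);
    assert(i < kShapeCount && j < kShapeCount);
    if (i > j) std::swap(i, j);
    return i * kShapeCount + j;
}

using Buckets = std::array<std::vector<Pair>, kShapeCount * kShapeCount>;

// One pass over the pair list; each bucket is a candidate for one kernel launch.
inline Buckets GroupByShapeCombo(const std::vector<Pair>& pairs) {
    Buckets buckets{};
    for (const Pair& p : pairs) buckets[BucketIndex(p.a, p.b)].push_back(p);
    return buckets;
}
```

With the numbers from the post, the 1000 pairs end up in three non-empty buckets of 200, 300 and 500 entries. Whether those three batches can then run concurrently on one GPU is exactly the API question being discussed here.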

CUDA, in the later versions, has the capability of issuing kernels from within kernels. This is very useful, but I believe OpenCL can do that with the command queue.
I think I read somewhere that OpenCL would be able to associate multiple command queues with different devices, but that currently this was only possible on CPUs, not on GPUs.
This is why CPU OpenCL is attractive to me, and maybe later this can be extended to GPUs as well.

I know they do it at least on consoles, because when you look at the GPU debugger on a console you can see how each multiprocessor executes a different shader in parallel, so I do not know why this was not supported by early versions of OpenCL. Right now it can only run one kernel per GPU, and what it needs is one kernel per multiprocessor.

by **JoeJ** » Tue Dec 06, 2016 11:31 am

Julio Jerez wrote:My impression was that all three high performance computing APIs offer interoperability capability.

Not yet, neither VK nor DX12 has data sharing with other APIs.
Khronos announced this as a feature for the next version.

Julio Jerez wrote:As it stands now in OpenCL, last time I checked, a GPU can be seen only as a single device, which makes it hard to do things like collision detection.
Take for example different pairs.

CL has multiple queues, VK and DX12 too.
This however does not guarantee that work runs in parallel, it just makes it possible.
AMD is the only hardware with fine grained async compute, but even AMD recommends using just one compute task while doing ALU light rendering work (Depth prepass, Shadowmaps).
I hope multiple parallel compute tasks can profit from async compute too if one task has too little work to saturate the GPU, but at the moment I have too many sequential dependencies to test this seriously. I'll let you know...

This means you need to try to generate large workloads of similar work, e.g.
Shader 1: build potential collision pairs and write them to a large list.
Shader 2: Detect exact collision data and write to another list
Shader 3: Resolve all collisions
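
The three passes listed above can be sketched as plain CPU functions, each one consuming the full list written by the previous one. This is only a toy model under assumptions: 2D circles, a brute-force candidate pass and a simple positional push-apart stand in for the real broad phase, narrow phase and solver, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Circle { float x, y, r; };
struct Candidate { std::size_t a, b; };
struct Contact { std::size_t a, b; float nx, ny, depth; };

// Pass 1: cheap bounding-box overlap test, writes all potential pairs to one list.
inline std::vector<Candidate> BuildCandidates(const std::vector<Circle>& c) {
    std::vector<Candidate> out;
    for (std::size_t i = 0; i < c.size(); ++i)
        for (std::size_t j = i + 1; j < c.size(); ++j) {
            const float reach = c[i].r + c[j].r;
            if (std::fabs(c[i].x - c[j].x) <= reach &&
                std::fabs(c[i].y - c[j].y) <= reach)
                out.push_back({i, j});
        }
    return out;
}

// Pass 2: exact test on every candidate, writes contact data to another list.
inline std::vector<Contact> DetectContacts(const std::vector<Circle>& c,
                                           const std::vector<Candidate>& candidates) {
    std::vector<Contact> out;
    for (const Candidate& p : candidates) {
        const float dx = c[p.b].x - c[p.a].x;
        const float dy = c[p.b].y - c[p.a].y;
        const float dist = std::sqrt(dx * dx + dy * dy);
        const float depth = c[p.a].r + c[p.b].r - dist;
        if (depth > 0.0f && dist > 0.0f)
            out.push_back({p.a, p.b, dx / dist, dy / dist, depth});
    }
    return out;
}

// Pass 3: resolve all contacts by pushing each circle half the depth apart.
inline void Resolve(std::vector<Circle>& c, const std::vector<Contact>& contacts) {
    for (const Contact& k : contacts) {
        assert(k.a < c.size() && k.b < c.size());
        const float half = 0.5f * k.depth;
        c[k.a].x -= k.nx * half; c[k.a].y -= k.ny * half;
        c[k.b].x += k.nx * half; c[k.b].y += k.ny * half;
    }
}
```

Each pass is one uniform loop over a list, which is the shape of work a GPU handles well. Note that the size of each list is only known after the previous pass has finished; on the CPU that is just `size()`, but on a GPU the count lives in device memory.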

Julio Jerez wrote:say you have 1000 colliding pairs, 200 box/polygon, 300 box/box, 500 convex/box and so on

Yep, that's a bad example - 1000 is simply not enough to saturate a gpu.
It makes sense only for cloth / particles. And even there it is questionable, because you need to maintain a copy of the physical world on the GPU. (I still think that if someone wants e.g. cloth on characters, the graphics engine developer can and should implement this more efficiently than you could.)

Julio Jerez wrote:CUDA, in the later versions, has the capability of issuing kernels from within kernels. This is very useful, but I believe OpenCL can do that with the command queue.

OpenCL can do this only since 2.0.
Going back to my collisions example,
with OpenCL 1.x you write a list from shader 1 and the list count to GPU memory.
Before you start shader 2 you need to read the list count from GPU to CPU so you know how much work there is to do. And this is what *.

VK / DX12 have indirect dispatch: You build the command buffer upfront. The command buffer knows all 3 shaders in order, and it also knows the count will be a result of the previous shader.
At runtime you call the command buffer with a single call per frame - no need to read the count from GPU, it can handle it on its own.
This is the ONLY reason VK is twice as fast as OpenCL for me, so it's very important.

OpenGL is in between: It has indirect dispatch, but no prerecorded command buffers.
So no need to read the list count, but you need to invoke many shaders per frame instead of just one command buffer for everything.
(But 2 years back, even with the lack of indirect dispatch, CL was twice as fast as GL on Nvidia for me.)

Julio Jerez wrote:I know they do it at least on consoles, because when you look at the GPU debugger on a console you can see how each multiprocessor executes a different shader in parallel, so I do not know why this was not supported by early versions of OpenCL. Right now it can only run one kernel per GPU, and what it needs is one kernel per multiprocessor.

I expect this to be better with VK / DX12 than with OpenCL, but not as fine grained as you wish:
* Those debug graphs show mostly work from graphics pipeline, not compute - that's a difference.
* AMD only (Pascal might have some improvement, but I assume it's still far behind. Intel has NO async compute)

godlike wrote:You could use HLSL for the shaders. Then Newton will have 2 backends, one for Vulkan and one for DX12. For the DX12 backend Newton will use the HLSL directly but the Vulkan one will use Khronos' glslang compiler to compile HLSL to SPIR-V*.

Agree, seems the most future proof path to go (but a hard start).
IMHO both CUDA and OpenCL are dead now for game dev.

Edit:
In the meantime I have a multithreaded CPU implementation of my (insanely complex) Global Illumination algo.
FuryX is 30-100x faster than i7-930:
100x on calculation-intense work like ray tracing
30x on bandwidth-heavy / low-workload things

by **Julio Jerez** » Tue Dec 06, 2016 12:08 pm

JoeJ wrote:
Julio Jerez wrote:I know they do it at least on consoles, because when you look at the GPU debugger on a console you can see how each multiprocessor executes a different shader in parallel, so I do not know why this was not supported by early versions of OpenCL. Right now it can only run one kernel per GPU, and what it needs is one kernel per multiprocessor.

I expect this to be better with VK / DX12 than with OpenCL, but not as fine grained as you wish:
* Those debug graphs show mostly work from graphics pipeline, not compute - that's a difference.
* AMD only (Pascal might have some improvement, but I assume it's still far behind. Intel has NO async compute)

I know. What I am saying is that, at least on AMD GPUs, the multiprocessors of a GPU are independent of each other; the debugger shows them doing different tasks simultaneously, and I guess this is also the case on Nvidia. It is the API that forces the user to make a single kernel with huge data, as if this were always the case.

A GPU has from 16 to 32 multiprocessors inside. If they were all linked together, as these APIs would have the user believe, they would be completely useless even for rendering.
I know CPU OpenCL supports that, and I know that for rendering the driver internally treats the multiprocessors as independent CPUs. It is the crappy API on offer that prevents people from using them the same way the driver uses them.
