Julio Jerez wrote:Newton 3.14 is about 3 times faster than 3.13, and about 20% faster than PhysX 3.4.
Bullet does not even come close.
JoeJ wrote:Julio Jerez wrote:Newton 3.14 is about 3 times faster than 3.13, and about 20% faster than PhysX 3.4. Bullet does not even come close.
Whoooo!
Schmackbolzen wrote:Those numbers are indeed impressive! The cloth physics also sound very interesting.
About GPU physics: I've seen in the GitHub log that you are planning to use CUDA. Can't you use OpenCL instead? It runs on nearly all hardware, plus you can run the same code on the CPU, which is a huge advantage for debugging. ..
Julio Jerez wrote:My impression was that all three high performance computing APIs offer interoperability capability.
Julio Jerez wrote:As it stands now in OpenCL, last time I checked, a GPU can only be seen as a single device, which makes it hard to do things like collision detection.
Take, for example, different collision pairs.
Julio Jerez wrote:Say you have 1000 colliding pairs: 200 box/polygon, 300 box/box, 500 convex/box, and so on.
Julio Jerez wrote:CUDA, in its latest versions, has the capability of issuing kernels from within kernels. This is very useful, but I believe OpenCL can do that with the command queue.
Julio Jerez wrote:I know they do it at least on consoles, because when you look at the GPU debugger on a console you can see each multiprocessor executing a different shader in parallel, so I do not know why this was not supported by early versions of OpenCL. Right now it can only run one kernel per GPU, and what it needs is one kernel per multiprocessor.
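To make the collision-pair example above concrete, here is a minimal C++ sketch of sorting mixed pairs into homogeneous batches, one per shape combination, so that each batch could in principle be handed to its own kernel. The names `Shape`, `Pair`, `MakeKey`, and `BucketPairs` are hypothetical and made up for illustration; this is not Newton's API, and the actual GPU dispatch is left out.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical shape categories, named after the ones mentioned in the post.
enum class Shape { Box, Convex, Polygon };

// Hypothetical pair record: two body indices plus the shape type of each body.
struct Pair {
    int bodyA;
    int bodyB;
    Shape shapeA;
    Shape shapeB;
};

using PairKey = std::pair<Shape, Shape>;

// Order the two shape types so that convex/box and box/convex share one bucket.
PairKey MakeKey(Shape a, Shape b) {
    return (a <= b) ? PairKey(a, b) : PairKey(b, a);
}

// Group the pairs into one batch per shape combination.
std::map<PairKey, std::vector<Pair>> BucketPairs(const std::vector<Pair>& pairs) {
    std::map<PairKey, std::vector<Pair>> buckets;
    for (const Pair& p : pairs) {
        buckets[MakeKey(p.shapeA, p.shapeB)].push_back(p);
    }
    return buckets;
}
```

With the 1000 pairs from the example, `BucketPairs` would return three batches of 200, 300, and 500 pairs. Each batch runs a different narrow-phase routine, which is why running them concurrently on different multiprocessors, rather than one kernel at a time for the whole GPU, is attractive.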
godlike wrote:You could use HLSL for the shaders. Then Newton will have 2 backends, one for Vulkan and one for DX12. For the DX12 backend Newton will use the HLSL directly but the Vulkan one will use Khronos' glslang compiler to compile HLSL to SPIR-V*.
Julio Jerez wrote:
I know they do it at least on consoles, because when you look at the GPU debugger on a console you can see each multiprocessor executing a different shader in parallel, so I do not know why this was not supported by early versions of OpenCL. Right now it can only run one kernel per GPU, and what it needs is one kernel per multiprocessor.
I expect this to be better with VK / DX12 than with OpenCL, but not as fine grained as you wish:
* Those debug graphs mostly show work from the graphics pipeline, not compute - that's a difference.
* AMD only (Pascal might have some improvement, but I assume it's still far behind. Intel has NO async compute)