Cuda Solver

A place to discuss everything related to Newton Dynamics.

Moderators: Sascha Willems, walaber

Re: Cuda Solver

Postby Bird » Fri Apr 15, 2022 5:58 pm

Still no rotation. Here's what ReadData outputs:
ReadData: id(0) w(0.000000 0.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
Bird
 
Posts: 636
Joined: Tue Nov 22, 2011 1:27 am

Re: Cuda Solver

Postby Julio Jerez » Fri Apr 15, 2022 7:38 pm

Ah, OK, it seems the initialization from the bodies to the GPU is failing.

But before we start debugging that, let us do a quick hack that forces the angular velocity.
Please sync and try again. You should get a result like this:

GetTransform: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.157161 0.000000 -0.987573)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.157161 0.000000 -0.987573)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.115888 0.000000 -0.993262)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.115888 0.000000 -0.993262)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.074413 0.000000 -0.997228)
GetTransform: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.074413 0.000000 -0.997228)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.074413 0.000000 -0.997228)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.032810 0.000000 -0.999462)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.032810 0.000000 -0.999462)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 -0.008851 0.000000 -0.999961)


I also discovered a possible cause for this: the primitive math functions I added were all __device__,
but it is possible that I was calling them from C++ host code, and then the results are undefined.
I have now added __device__ __host__ so that I can call them from both the CPU and the GPU.

Anyway, please just sync and try again, and let us see if it spins.
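To illustrate the __device__ __host__ point, here is a minimal sketch of a math helper that compiles for both the GPU and the CPU. The ND_HOST_DEVICE macro, the Quat struct and IntegrateRotation are hypothetical names, not Newton's actual CUDA math types, and the exact axis-angle update shown is one common way to integrate a rotation, not necessarily the one the solver uses. Assuming a 1/120 s substep (the step size the logged values imply), ω = (0, 10, 0) starting from identity gives r ≈ (0, 0.041655, 0, 0.999132) after one step and (0, 0.083237, 0, 0.996530) after two, which matches the IntegrateVelocity lines in the trace that follows.

```cpp
#include <cassert>
#include <cmath>
#include <math.h>

// Hypothetical macro: expands to __host__ __device__ under nvcc and to
// nothing under a plain C++ compiler, so one helper serves both sides.
#if defined(__CUDACC__)
#define ND_HOST_DEVICE __host__ __device__
#else
#define ND_HOST_DEVICE
#endif

// Hypothetical quaternion: (x, y, z) is the vector part, w the scalar part.
struct Quat { float x, y, z, w; };

// Hamilton product a * b.
ND_HOST_DEVICE inline Quat Mul(const Quat& a, const Quat& b)
{
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

// Rotates q by angular velocity (wx, wy, wz) over dt, using the exact
// axis-angle increment theta = |omega| * dt.
ND_HOST_DEVICE inline Quat IntegrateRotation(const Quat& q, float wx, float wy, float wz, float dt)
{
    const float speed = sqrtf(wx * wx + wy * wy + wz * wz);
    if (speed < 1.0e-6f) {
        return q; // no measurable rotation this step
    }
    const float halfAngle = 0.5f * speed * dt;
    const float s = sinf(halfAngle) / speed; // turns omega into axis * sin(half angle)
    const Quat delta = {wx * s, wy * s, wz * s, cosf(halfAngle)};
    return Mul(delta, q);
}
```

Under nvcc the macro expands to __host__ __device__; under a host-only compiler it expands to nothing, so the same header can be included from .cpp and .cu files alike.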
Julio Jerez
Moderator
 
Posts: 12452
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Cuda Solver

Postby Bird » Fri Apr 15, 2022 8:15 pm

The body still doesn't spin on updates:

ReadData: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.041655 0.000000 0.999132)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.041655 0.000000 0.999132)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.083237 0.000000 0.996530)
GetTransform: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.083237 0.000000 0.996530)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.083237 0.000000 0.996530)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.124675 0.000000 0.992198)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.124675 0.000000 0.992198)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.165896 0.000000 0.986143)
GetTransform: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.165896 0.000000 0.986143)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.165896 0.000000 0.986143)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.206830 0.000000 0.978377)
IntegrateExternalForce: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.206830 0.000000 0.978377)
IntegrateVelocity: id(0) w(0.000000 10.000000 0.000000) r(0.000000 0.247404 0.000000 0.968912)

Re: Cuda Solver

Postby Julio Jerez » Fri Apr 15, 2022 8:33 pm

It looks like it is printing the correct values now.
If it is not rotating, then it must be failing when reading from the GPU.
I will clean up the prints and enable one where it reads the rotation.
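The claim that the values are now correct can be checked by hand. For a rotation about a single axis, the angle is 2·atan2(|v|, |w|), where v is the vector part and w the scalar part, so consecutive IntegrateVelocity entries in the trace above should differ by |ω|Δt. The small sketch below (helper names are illustrative, not from Newton) recovers Δt from the logged quaternions; with ω = 10 rad/s each step comes out at about 1/120 s, meaning the integration advances consistently and the problem lies elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rotation angle (radians) of a unit quaternion from the length of its vector
// part and its scalar part. atan2 stays accurate near identity, and fabs()
// makes q and -q (the same rotation) give the same angle.
double RotationAngle(double vectorLength, double scalar)
{
    return 2.0 * std::atan2(std::fabs(vectorLength), std::fabs(scalar));
}

// For consecutive logged quaternions rotating at a constant angular speed
// (rad/s), returns the timestep implied by each pair of entries.
std::vector<double> ImpliedTimesteps(const std::vector<double>& vectorLengths,
                                     const std::vector<double>& scalars,
                                     double angularSpeed)
{
    assert(vectorLengths.size() == scalars.size() && angularSpeed > 0.0);
    std::vector<double> steps;
    for (std::size_t i = 1; i < scalars.size(); ++i) {
        const double previous = RotationAngle(vectorLengths[i - 1], scalars[i - 1]);
        const double current = RotationAngle(vectorLengths[i], scalars[i]);
        steps.push_back((current - previous) / angularSpeed);
    }
    return steps;
}
```

Feeding in the y and w components of the IntegrateVelocity lines gives six steps of roughly 0.00833 s each.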

Re: Cuda Solver

Postby Bird » Fri Apr 15, 2022 8:38 pm

What I don't understand is how it could work in the Newton demo but not in my app???

Re: Cuda Solver

Postby Julio Jerez » Fri Apr 15, 2022 9:49 pm

Me too, I do not get why this happens, but there must be a reason. This is why I hate GPU programming of any kind.

Anyway, I removed all the prints from the shader, leaving only the one that reads from the GPU and copies to the CPU.
When you get time, please try again. This is how it looks on mine. The velocity will be wrong because I just hacked the value,
but the rotations should be correct. They are out of order because the GPU and CPU prints aren't coordinated; it seems each is buffered and then printed as a group.

GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.247404 0.000000 0.000000 0.968912)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.327195 0.000000 0.000000 0.944957)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.404715 0.000000 0.000000 0.914443)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.479425 0.000000 0.000000 0.877583)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.404715 0.000000 0.000000 0.914443)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.479425 0.000000 0.000000 0.877583)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.550809 0.000000 0.000000 0.834631)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.618370 0.000000 0.000000 0.785887)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.681639 0.000000 0.000000 0.731689)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.740177 0.000000 0.000000 0.672412)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.479425 0.000000 0.000000 0.877583)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.550809 0.000000 0.000000 0.834631)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.618370 0.000000 0.000000 0.785887)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.681639 0.000000 0.000000 0.731689)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.740177 0.000000 0.000000 0.672412)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.793578 0.000000 0.000000 0.608469)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.841471 0.000000 0.000000 0.540302)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.793578 0.000000 0.000000 0.608469)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.841471 0.000000 0.000000 0.540302)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.883524 0.000000 0.000000 0.468386)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.919445 0.000000 0.000000 0.393219)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.948985 0.000000 0.000000 0.315322)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.971938 0.000000 0.000000 0.235238)
GetTransform cpu 0 : w(0.654416 -0.409420 -1.027692) r(0.988146 0.000000 0.000000 0.153520)
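The batching described above is consistent with how CUDA handles device-side printf: output goes into a fixed-size buffer that the host flushes only at certain synchronization points, such as kernel launches, cudaDeviceSynchronize, or blocking memory copies, so GPU lines show up in groups relative to host printf. One way to make such traces readable, sketched here with a hypothetical frame counter that would have to be passed to both the kernel print and the CPU print, is to tag every line with its frame and merge the two streams afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trace record: each print carries the frame it belongs to,
// so GPU and CPU lines can be interleaved after the fact.
struct TraceLine {
    int frame;          // simulation frame counter passed into both prints
    std::string source; // "GPU" or "cpu"
    std::string text;   // the rest of the printed line
};

// Merges two independently flushed streams into frame order. stable_sort
// keeps the original relative order of lines that share a frame, and GPU
// lines are appended first, so they come before CPU lines of the same frame.
std::vector<TraceLine> MergeByFrame(const std::vector<TraceLine>& gpu,
                                    const std::vector<TraceLine>& cpu)
{
    std::vector<TraceLine> merged(gpu);
    merged.insert(merged.end(), cpu.begin(), cpu.end());
    std::stable_sort(merged.begin(), merged.end(),
                     [](const TraceLine& a, const TraceLine& b) { return a.frame < b.frame; });
    return merged;
}
```

Writing the merged lines out then gives one GPU line followed by the matching CPU line per frame, which makes a mismatch between the two immediately visible.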

Re: Cuda Solver

Postby Bird » Sat Apr 16, 2022 7:37 am

I changed my build system so that I'm using the exact same ndNewton.dll that the Newton sandbox demo is using, but still no luck.

ReadData: id(0) w(10.000000 0.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.083237 0.000000 0.000000 0.996530)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.165896 0.000000 0.000000 0.986143)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.247404 0.000000 0.000000 0.968912)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.327195 0.000000 0.000000 0.944957)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.404715 0.000000 0.000000 0.914443)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.479425 0.000000 0.000000 0.877583)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.550809 0.000000 0.000000 0.834631)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.618370 0.000000 0.000000 0.785887)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.681639 0.000000 0.000000 0.731689)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.740177 0.000000 0.000000 0.672412)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.793578 0.000000 0.000000 0.608469)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.841471 0.000000 0.000000 0.540302)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.883524 0.000000 0.000000 0.468386)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.919445 0.000000 0.000000 0.393219)

Re: Cuda Solver

Postby Bird » Sat Apr 16, 2022 8:05 am

One thing that's different about my app is that I run Newton on a separate thread. My OpenGL GUI runs on the main thread, and I put the OptiX renderer and Newton on separate threads so that a long calculation doesn't affect the user's interactivity with the GUI.

Re: Cuda Solver

Postby Julio Jerez » Sat Apr 16, 2022 9:43 am

Possible, but I do not think that is the case. The last printout seems to indicate that the problem is on the CPU side: the data copied to the body is wrong.

For example, these two printouts:
GetTransform GPU: id(0) w(10.000000 0.000000 0.000000) r(0.327195 0.000000 0.000000 0.944957)
GetTransform cpu 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)


The first one is issued from the GPU, and as you can see the rotation is not identity and the speed is not zero. This means the GPU was initialized and is working.

The second line prints the value taken from the Newton body, not from the buffer,
so either the CPU buffer is wrong or the body is a wrong pointer. Let us continue narrowing it down.

I will now move the printf out of the GPU shader and place one before and one after the data is read from the GPU.

Re: Cuda Solver

Postby Julio Jerez » Sat Apr 16, 2022 9:58 am

OK, sync again. Now it is printing the rotation before and after it is copied to the CPU.
We know that what gets set is identity, but let us see if the value being applied is identity.

The trace looks like this:

SetTransform buffer 0 : w(0.000000 0.500000 -3.000000) r(0.772952 0.000000 0.000000 0.634464)
SetTransform body 0 : w(0.654416 -0.409420 -1.027692) r(0.772952 0.000000 0.000000 0.634464)
SetTransform buffer 0 : w(0.000000 0.500000 -3.000000) r(0.823081 0.000000 0.000000 0.567924)
SetTransform body 0 : w(0.654416 -0.409420 -1.027692) r(0.823081 0.000000 0.000000 0.567924)
SetTransform buffer 0 : w(0.000000 0.500000 -3.000000) r(0.867497 0.000000 0.000000 0.497443)
SetTransform body 0 : w(0.654416 -0.409420 -1.027692) r(0.867497 0.000000 0.000000 0.497443)
SetTransform buffer 0 : w(0.000000 0.500000 -3.000000) r(0.905892 0.000000 0.000000 0.423509)
SetTransform body 0 : w(0.654416 -0.409420 -1.027692) r(0.905892 0.000000 0.000000 0.423509)


After this I suspect what the problem could be, but let us try this test first.

Re: Cuda Solver

Postby Bird » Sat Apr 16, 2022 12:13 pm

Still no go

ReadData: id(0) w(10.000000 0.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417485 0.339037 -0.455729) r(0.000000 0.000000 0.000000 1.000000)

Re: Cuda Solver

Postby Julio Jerez » Sat Apr 16, 2022 12:35 pm

It seems it is not reading the data from the GPU buffer.
Look at this line:

   auto SetTransform = ndMakeObject::ndFunction([this](ndInt32 threadIndex, ndInt32 threadCount)
   {
      D_TRACKTIME();
      const ndArray<ndBodyKinematic*>& bodyArray = GetActiveBodyArray();
      const cuSpatialVector* const data = &m_context->m_transformBufferCpu1[0];


Try changing
const cuSpatialVector* const data = &m_context->m_transformBufferCpu1[0];
to
const cuSpatialVector* const data = &m_context->m_transformBufferCpu0[0];

Re: Cuda Solver

Postby Bird » Sat Apr 16, 2022 12:43 pm

Hey, that worked!!!!

ReadData: id(0) w(10.000000 0.000000 0.000000) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.000000 0.000000 0.000000 1.000000)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.000000 0.000000 0.000000 1.000000)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.083237 0.000000 0.000000 0.996530)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.083237 0.000000 0.000000 0.996530)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.247404 0.000000 0.000000 0.968912)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.247404 0.000000 0.000000 0.968912)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.327195 0.000000 0.000000 0.944957)
SetTransform body 0 : w(1.709886 0.698956 0.781648) r(0.327195 0.000000 0.000000 0.944957)
SetTransform buffer 0 : w(-0.417342 0.341314 -0.404124) r(0.404715 0.000000 0.000000 0.914443)

Re: Cuda Solver

Postby Julio Jerez » Sat Apr 16, 2022 2:02 pm

OK, now we know what the bug is.
The transforms are collected in a double buffer, so we will see the scene one frame behind.
If we did not do it that way, we would have to wait for each frame to complete and copy the data,
and that would kill the fps. This is very similar to double and triple buffering in rendering.

Essentially, the GPU works on one frame and writes the transforms into a buffer while the CPU is collecting the transforms from the previous frame.
The change I told you to make simply reads the transforms from the same buffer the DMA is copying data into. I am surprised it did not crash.
We should find out why it is not swapping the buffers.
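The one-frame latency trade-off can be seen in a minimal, CPU-only double buffer. This is a generic sketch (the class and method names are illustrative, not Newton's ndCudaContext) and it leaves out the asynchronous DMA entirely: the producer fills the back buffer for frame N while the consumer reads the front buffer, which still holds frame N - 1, and a swap at the end of each frame exchanges the two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal double buffer: the producer (standing in for the GPU plus the DMA
// copy) writes frame N into the back buffer while the consumer (the CPU
// applying transforms to bodies) reads frame N - 1 from the front buffer.
template <typename T>
class DoubleBuffer
{
public:
    DoubleBuffer(std::size_t count, const T& initial)
        : m_front(count, initial), m_back(count, initial) {}

    std::vector<T>& Back() { return m_back; }               // written this frame
    const std::vector<T>& Front() const { return m_front; } // read this frame

    // Called once at the end of every frame. std::vector::swap only exchanges
    // internal pointers, so the cost does not depend on the body count.
    void Swap() { m_front.swap(m_back); }

private:
    std::vector<T> m_front;
    std::vector<T> m_back;
};
```

Running a few frames with one element shows the consumer always seeing the value written one frame earlier, which is the "scene one frame behind" behavior; without the swap it would keep seeing the initial value.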

In this file:
..\newton-4.00\sdk\dExtensions\dCuda\ndWorldSceneCuda.cu

find the function void ndWorldSceneCuda::Sync()

and set a breakpoint on m_context->SwapBuffers();

Then step into it:
void ndCudaContext::SwapBuffers()
{
   dSwap(m_sceneInfoCpu0, m_sceneInfoCpu1);
   m_transformBufferCpu0.Swap(m_transformBufferCpu1);
}


m_transformBufferCpu0 and m_transformBufferCpu1 should be exchanged each time that function is called at the end of a frame. The fact that it works when using m_transformBufferCpu0
means that either ndCudaContext::SwapBuffers() is not called or the swap function has a bug.
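The reasoning above can be reproduced with a toy model. This sketch assumes, as the posts suggest, that the data coming back from the GPU always lands in one CPU buffer while SetTransform always reads the other; the struct, the function and the use of the x component as a frame tag are all hypothetical. If the swap never runs, the read side only ever sees its initial identity value, which matches the identity rotations in Bird's SetTransform trace; with the swap, it sees the previous frame.

```cpp
#include <cassert>
#include <vector>

// Illustrative rotation record; x is used below purely as a frame tag,
// not as a valid unit quaternion.
struct Rotation { float x, y, z, w; };

// Hypothetical model of the two CPU transform buffers.
struct TransformBuffers {
    std::vector<Rotation> buffer0; // destination of the GPU -> CPU copy
    std::vector<Rotation> buffer1; // source read when applying transforms
    void Swap() { buffer0.swap(buffer1); }
};

// Runs `frames` frames, tagging each copied value with its frame index,
// and returns the tag the bodies would read on the last frame.
float RunFrames(int frames, bool callSwap)
{
    const Rotation identity = {0.0f, 0.0f, 0.0f, 1.0f};
    TransformBuffers buffers{{identity}, {identity}};
    float lastRead = 0.0f;
    for (int frame = 1; frame <= frames; ++frame) {
        buffers.buffer0[0] = {static_cast<float>(frame), 0.0f, 0.0f, 1.0f}; // "copy from GPU"
        lastRead = buffers.buffer1[0].x;                                     // "SetTransform"
        if (callSwap) {
            buffers.Swap(); // end of frame
        }
    }
    return lastRead;
}
```

With the swap enabled, five frames end with the bodies reading frame 4's data; with it disabled, they read 0 (identity) every frame, which is the symptom in the trace.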

Re: Cuda Solver

Postby Julio Jerez » Sat Apr 16, 2022 2:15 pm

After we check whether that function is called, I will change that as well.
I have one GPU transform buffer and two CPU transform buffers. I think that is a mistake.

I should instead have two GPU transform buffers and one CPU buffer.
That way the internal latencies are all inside the GPU. The way it is now implies that the GPU and CPU are in sync, but that is not true; the driver does many diabolical things that cannot be controlled.
You can see it in the profiler, where the GPU is several ms behind the host timeline,
but the CPU can only work with the host code.
So what I suspect is that the swap is actually working, but the buffer I am getting is not the buffer the data was copied into.

But anyway, let us see if you hit that call in Sync() to verify it is working as expected, then we will make that change.
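The proposed layout (two GPU transform buffers, one CPU buffer) can be illustrated with a plain C++ model. This is only a sketch of the idea as described in the post, not Newton's implementation: in real CUDA code the "kernel" write and the device-to-host copy would be asynchronous operations on the device, and the host would wait on the copy (for example with a stream or event synchronization) before reading. The model keeps the same one-frame latency, but the host only ever reads a single buffer, so there is no question of which CPU buffer the copy landed in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the proposed layout: two device-side buffers that alternate
// between frames, and a single host buffer that the CPU always reads.
struct ProposedLayout {
    std::array<std::vector<int>, 2> gpu; // stand-ins for two device buffers
    std::vector<int> cpu;                // the only buffer the host reads
    int frame = 0;

    explicit ProposedLayout(std::size_t count)
        : gpu{{std::vector<int>(count, -1), std::vector<int>(count, -1)}}, cpu(count, -1) {}

    // One frame: "kernels" write this frame's results into one device buffer
    // while the other device buffer (last frame's results) is copied to the host.
    void Step(int value)
    {
        std::vector<int>& current = gpu[frame % 2];
        const std::vector<int>& previous = gpu[(frame + 1) % 2];
        for (int& v : current) {
            v = value; // "kernel" output for this frame
        }
        cpu = previous; // "device-to-host copy" of the previous frame
        ++frame;
    }
};
```

The alternation, and any delay the driver introduces, stays on the device side; the host simply reads cpu after the copy for that frame has completed.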
