A place to discuss everything related to Newton Dynamics.
Moderators: Sascha Willems, walaber
by Julio Jerez » Tue May 24, 2022 3:12 pm
The nature of GPUs is such that there will be some latency; that's how GPUs are designed.
I am trying to make it so there is only one frame of latency: there is a frame sync at the beginning of each frame, and the data is double buffered, so the data you see is the data that was calculated in the previous frame.
It is still too early, but more will come when we are further along.
One piece of functionality that will be added is that some bodies and joints will have special options.
The GPU and parallel solvers are meant for large scenes, but experience has shown that, in general, most scenes are made of many objects that share the same properties.
It is fine to have per-class force-and-torque callbacks, or joints and materials, but when you get thousands of bodies this becomes a bit of overhead, and it is even more severe on GPUs.
If you look at a typical scene, most bodies are just under gravity, so it makes sense to have an option that lets the engine cache those things, while the app specifies for which objects it wants the feedback.
This way we can have scenes with lots of stuff that lives its entire life on the GPU, even for rendering, since CUDA has interoperability with all the graphics APIs.
There is also another piece of functionality that Newton supports: the solver will be the same on GPU and CPU, which means we can separate islands into objects that live on the CPU and objects that live on the GPU.
This way, for example, special objects like players, vehicles and things like that, for which latency would make for a bad experience, can be handled by the CPU.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles
by Julio Jerez » Tue May 24, 2022 6:05 pm
From what I can see, the clock() function is simply useless for measuring time across kernel calls. It is not even mentioned in the docs or blogs.
https://developer.nvidia.com/blog/how-i ... s-cuda-cc/
I will try the event approach.
edit: I have it now, simply using events. Now I can keep going while monitoring the performance impact as the systems are added.
So far that scene, with a one-pass sort, is still under 1 ms on a 1060.
It would be nice if it was less, but it is not too bad, since by now the 1060 is an entry-level card.
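For reference, the event-based timing pattern looks roughly like this (a minimal sketch of the CUDA runtime event API; the place where the measured kernels are launched is a placeholder). Unlike clock(), which only counts cycles per SM inside one kernel, events measure wall time on a stream across any number of launches:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Measure GPU time across kernel launches with events.
void timeKernels(cudaStream_t stream)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    // ... launch the kernels to be measured on this stream ...
    cudaEventRecord(stop, stream);

    cudaEventSynchronize(stop);       // wait until the stop event completes
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("gpu time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```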
by JoshKlint » Wed May 25, 2022 8:33 am
I wonder if there is a way to transfer the orientations of physics objects into a storage buffer that can be read by the renderer. That would get the results on the screen without having to make a round trip to and from the CPU.
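Something along the lines of the standard CUDA graphics interop path should cover that (a hedged sketch against the OpenGL interop API; the buffer handle and the transform-writing kernel are placeholders): register the renderer's buffer once, then each frame map it, fetch a device pointer, and let a kernel write the body transforms directly, with no CPU round trip.

```cuda
#include <cuda_gl_interop.h>

cudaGraphicsResource_t resource = nullptr;

// Register an OpenGL buffer object once at startup.
void registerTransformBuffer(GLuint ssbo)
{
    cudaGraphicsGLRegisterBuffer(&resource, ssbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);
}

// Each frame: map the buffer, write transforms from a kernel, unmap.
void writeTransforms(cudaStream_t stream)
{
    cudaGraphicsMapResources(1, &resource, stream);
    void* devPtr = nullptr;
    size_t bytes = 0;
    cudaGraphicsResourceGetMappedPointer(&devPtr, &bytes, resource);
    // ... launch a kernel that writes body matrices into devPtr ...
    cudaGraphicsUnmapResources(1, &resource, stream);
}
```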
JoshKlint
Posts: 163
Joined: Sun Dec 10, 2017 8:03 pm
by Julio Jerez » Sun May 29, 2022 5:52 pm
Wow, programming with this CUDA SDK is an endurance exercise. It is a programming tour de force marathon.
I think I now have the two routines that I need to go on: prefix scan and counting sort.
I had them before, but not using dynamic parallelism, which made them too low level and complex.
It is easy to make a simple demo that runs one or two kernels in CUDA, but once you pass that point and it comes to large-scale projects, it is very complex.
But anyway, I will now make the boxes rotate like before; that way it will stress test the sort routine, and we will see if it fails on the 1060.
by Julio Jerez » Mon May 30, 2022 3:17 pm
Oh, I am now at the same point I was at when I was not using dynamic parallelism.
The good news is that the code now works on the GTX 1060, which is important since it is still a very popular card.
On dynamic parallelism: from what I can see this is an Nvidia moving target, with tons of stuff in a state of change. It is quite different from SDK to SDK, and even within the same SDK it behaves very differently from one hardware architecture to another.
I started to think that something this unpredictable and random must have about 9 in 10 people complaining and giving really bad reviews.
But I believe that Nvidia does not expect people to use it as a raw CUDA feature; they expect people to use it via the Nvidia CDP (CUDA dynamic parallelism) library.
This is why every question posted on the forum is simply answered by redirecting the person to use the CDP version of whatever they are trying to do.
As far as I am concerned, I will not touch that library. I browsed it, and it is full of defines and checks to make it work across different hardware.
Anyway, I believe I have a good understanding of how to use it now, and I will keep going.
I now have a stable build working on both GPUs that I am testing: the 1060 and the 1660.
One last thing: the debug version using dynamic parallelism is ridiculously slower, to the point that it becomes unusable. On the other hand, the release version seems a little faster. So we lose some and we win some.
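For context, dynamic parallelism just means a kernel can launch other kernels from the device. A minimal sketch (the child kernel body is a placeholder; requires compute capability 3.5+ and compiling with -rdc=true):

```cuda
// Child grid: ordinary kernel doing placeholder work.
__global__ void childKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

// Parent kernel launches the child grid from the device, so the
// grid size can be computed on the GPU without a host round trip.
__global__ void parentKernel(int* data, int n)
{
    if (threadIdx.x == 0)
    {
        int blocks = (n + 255) / 256;
        childKernel<<<blocks, 256>>>(data, n);
    }
}
```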
by Bird » Mon May 30, 2022 3:50 pm
Congratulations on making it through the first stage of the Ironman Triathlon.
I just synced to the latest version and it is not working here.
In Debug mode, I'm hitting the assert on line 136 of ndCudaDeviceBuffer.h.
In Release mode, the bodies render briefly and then disappear.
I'm using Visual Studio 2022 now with CUDA 11.7.
Bird
Posts: 623
Joined: Tue Nov 22, 2011 1:27 am
by Julio Jerez » Mon May 30, 2022 5:26 pm
That is my fear: that to develop for CUDA we need a sample of each generation of GPU, because they are all different, even within the same architecture.
I too am on VS 2022 with SDK 11.7, so we should get identical results.
But that assert, I think, is something on my side that I committed; I believe I have it fixed.
Can you sync and try again, please?
Also, for some reason debug is extremely slow when using dynamic parallelism. I have no idea why, but it is so slow that Windows kills the app at 2 seconds, and the update in CUDA takes more than 2 seconds.
If you try debug, make the scene very small.
edit:
The slowdown in debug was a bad assert I had in a parallel loop. It is fixed now.
by Bird » Mon May 30, 2022 5:56 pm
Sorry, no luck with the latest version I just synced. Still hitting the assert on line 136 of ndCudaDeviceBuffer.h
cudaStatus = cudaMalloc((void**)&newArray, newSize * itemSizeInBytes);
the error code for cudaStatus is "cudaErrorUnsupportedPtxVersion (222)"
by Julio Jerez » Mon May 30, 2022 6:09 pm
Yes, I thought it would still fail, and that is similar to what I had when trying the 1060.
I did not bother to find out what was wrong, since I planned to move to dynamic parallelism. But now that is a real show-stopper problem.
Because even when I had the same hardware, I never really found what was wrong.
For the most part, all the fixes I have made are inferences about the hardware based on what I see in the profiler.
But even the tools behave quite differently from hardware to hardware.
I will try to set a breakpoint to see where that call comes from.
by Julio Jerez » Mon May 30, 2022 6:12 pm
Bird wrote:cudaStatus = cudaMalloc((void**)&newArray, newSize * itemSizeInBytes);
the error code for cudaStatus is "cudaErrorUnsupportedPtxVersion (222)"
That is along the lines of what I am saying.
This nonsense forces you into a wild goose chase for compatibility, from forum to forum, looking for people who have had similar problems.
I am not using anything special other than dynamic parallelism, so this should just run on any min spec higher than 5.52.
by Bird » Mon May 30, 2022 6:17 pm
After googling, I think it might be a graphics card driver problem. I'll try a new driver.
by Julio Jerez » Mon May 30, 2022 6:24 pm
The doc says this:
cudaErrorUnsupportedPtxVersion = 222
This indicates that the provided PTX was compiled with an unsupported toolchain. The most common reason for this, is the PTX was generated by a compiler newer than what is supported by the CUDA driver and PTX JIT compiler.
That suggests you have a driver older than the SDK. Since you migrated to SDK 11.7, you probably did not select the driver during the install.
Again, that is another quirk. Imagine you write an app and distribute compiled code; an end user will not have any of the SDK nonsense, so it will fail on all clients with older drivers.
Nvidia may make fast GPUs, but they are extremely mediocre software developers.
There is so much in that SDK that leaves a lot to be desired, starting with the docs, which * big time: they give erroneous information, and you are better off not using them.
Can you check if your driver is the latest?
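One way to catch this mismatch early is to compare the driver and runtime versions at startup (a small sketch using the CUDA runtime API; the message wording is mine):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// cudaErrorUnsupportedPtxVersion (222) usually means the installed
// driver is older than the toolkit that compiled the PTX. Checking
// both versions up front turns the cryptic error into a clear message.
bool checkCudaVersions()
{
    int driverVersion = 0;
    int runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("cuda driver %d, runtime %d\n", driverVersion, runtimeVersion);
    if (driverVersion < runtimeVersion)
    {
        printf("driver is older than the CUDA runtime; update the driver\n");
        return false;
    }
    return true;
}
```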
by Bird » Mon May 30, 2022 6:42 pm
This is the driver coming from the latest CUDA SDK toolkit.
Unfortunately, that driver has problems that cause some OptiX code I use to crash. You can't win.
by Dave Gravel » Mon May 30, 2022 6:46 pm
Yes, and coding against a lower driver version gives more risk of hitting deprecated functions.
Last edited by Dave Gravel on Wed Jun 01, 2022 7:38 pm, edited 1 time in total.
Dave Gravel
Posts: 800
Joined: Sat Apr 01, 2006 9:31 pm
Location: Quebec in Canada.