Improving the demo render quality and performance

Re: Improving the demo render quality and performance

Postby Dave Gravel » Sun Feb 03, 2019 6:32 pm

OK, I found the error: error C7507: OpenGL does not allow attributes of type ivec4
I think you can just use a plain vec4 and cast its components to int.
You search a nice physics solution, if you can read this message you're at the good place :wink:
OrionX3D Projects & Demos:
https://www.facebook.com/dave.gravel1
http://orionx3d.googlepages.com/
https://www.youtube.com/user/EvadLevarg/videos
Dave Gravel
 
Posts: 690
Joined: Sat Apr 01, 2006 9:31 pm
Location: Quebec in Canada.

Re: Improving the demo render quality and performance

Postby Dave Gravel » Sun Feb 03, 2019 6:36 pm

If you add this extension line under the #version directive in the vertex shader, it works now:

#version 120
#extension GL_EXT_gpu_shader4 : enable

Now, with 100 animated characters in a release build, I get 450 to 500 fps.

I lose around 100 fps when I record.
https://www.youtube.com/watch?v=7s4vSrSMCFE

Edited:
Just to make sure, I tested passing the indices as a vec4 and casting them to int directly, and it seems to work fine without #extension GL_EXT_gpu_shader4 : enable

Code:
glVertexAttrib4f(boneIndices, (float) boneIndex.m_boneIndex[0], (float) boneIndex.m_boneIndex[1], (float) boneIndex.m_boneIndex[2], (float) boneIndex.m_boneIndex[3]);


In the shader:
Code:
attribute vec4 boneIndices;

   for (int i = 0; i < 4; i++) {
      weightedNormal += vec3 (matrixPallete[int(boneIndices[i])] * n * boneWeights[i]);
      weightedPosition += matrixPallete[int(boneIndices[i])] * gl_Vertex * boneWeights[i];
   }
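
For checking the shader output on the CPU, here is a minimal C++ sketch of the same four-bone blend, with the bone indices stored as floats and cast back to int exactly as in the shader loop above. The names Vec4, Mat4, mul, translation and skinPosition are hypothetical and are not taken from the Newton demo code. The cast is lossless for bone indices because a float represents every integer up to 2^24 exactly.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types, for illustration only.
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // column-major, like OpenGL

// Multiply a column-major 4x4 matrix by a column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    return Vec4{
        m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
        m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
        m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
        m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w};
}

// Build a translation matrix (column-major: translation in elements 12..14).
Mat4 translation(float tx, float ty, float tz) {
    Mat4 m = {{1.0f, 0.0f, 0.0f, 0.0f,
               0.0f, 1.0f, 0.0f, 0.0f,
               0.0f, 0.0f, 1.0f, 0.0f,
               tx,   ty,   tz,   1.0f}};
    return m;
}

// CPU version of the shader loop: indices arrive as floats and are cast to int.
Vec4 skinPosition(const Mat4* palette, int paletteSize,
                  const std::array<float, 4>& boneIndices,
                  const std::array<float, 4>& boneWeights,
                  const Vec4& position) {
    Vec4 weighted{0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; i++) {
        int bone = static_cast<int>(boneIndices[i]);  // same cast as int(boneIndices[i])
        assert(bone >= 0 && bone < paletteSize);
        Vec4 p = mul(palette[bone], position);
        weighted.x += p.x * boneWeights[i];
        weighted.y += p.y * boneWeights[i];
        weighted.z += p.z * boneWeights[i];
        weighted.w += p.w * boneWeights[i];
    }
    return weighted;
}
```

With a two-matrix palette (identity and a translation by 2 on x) and weights 0.5 / 0.5, the point (1, 1, 1) should land at (2, 1, 1), which makes a quick sanity check against the GPU result.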


Edited:
About the OpenGL debug output:
https://sites.google.com/site/oxnewton/DebugGL.txt
--
In that code I use glEnable to enable the debug output; I think that command is deprecated in newer OpenGL.
In my plugin engine I use glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, GL_TRUE);

The output looks something like this: https://www.youtube.com/watch?v=L7rJ_oB-0vc

Re: Improving the demo render quality and performance

Postby Julio Jerez » Mon Feb 04, 2019 9:05 am

Dave Gravel wrote:Now, with 100 animated characters in a release build, I get 450 to 500 fps.

That's not too bad. That model has 12500 vertices.
That's more or less the mid-to-high range of vertex counts for models in games.
In fact, high-end game models can have even fewer vertices,
but they make up for it with normal maps and more detailed textures.

OK, I tried both methods for the matrix indices.
The one using #extension GL_EXT_gpu_shader4 : enable
makes no difference on my system.

The one that passes the indices as a vec4, contrary to what I expected, seems
to give a marginal speed bump. The total with ivec4 is below 300 fps;
the one using vec4 and casting to int is definitely above 300 fps.
It seems to be about 20 to 30 fps faster.
I was expecting the opposite, since the shader with floats has to do the casting,
but maybe there is a higher penalty for reading an integer from memory.
Anyway, I like the float version better.

Here is an idea: maybe there is a way to make just one shader that takes a uniform acting as a render state, and then uses a switch statement to call the function that does the specific rendering.

This would remove the hazard of having to call glUseProgram each time something is rendered. Has anyone done that?
Not that I am going to do it, but why would that be bad?
I guess the switch is a cost that applies to every pixel, but it would be just one instruction.

Anyway, I consider the skinning a done deal now.
Back to the real work: full IK motions.
Julio Jerez
Moderator
 
Posts: 10954
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Improving the demo render quality and performance

Postby JoeJ » Mon Feb 04, 2019 9:38 am

Julio Jerez wrote:Here is an idea: maybe there is a way to make just one shader that takes a uniform acting as a render state, and then uses a switch statement to call the function that does the rendering.

This would remove the hazard of having to call glUseProgram each time something is rendered. Has anyone done that?


To do this, you'd need a profiling tool that tells you register usage. If packing two shaders into one wastes too many registers, occupancy will go down. Only with such tools is it possible to understand GPU performance.

I remember the work of one guy who did the switch in a hacky way. He grabbed dozens of shaders from Shadertoy and was able to show a scene of cubes where each cube face showed a different Shadertoy shader, all with one big shader and a jump table. No register waste, IIRC.
But using regular APIs you cannot do such a thing efficiently.

But you can use indirect dispatches and distribute work as necessary from the GPU itself. With low-level APIs and prerecorded command buffers, that's much faster than switching shaders etc. from the CPU.
JoeJ
 
Posts: 1147
Joined: Tue Dec 21, 2010 6:18 pm

Re: Improving the demo render quality and performance

Postby Dave Gravel » Mon Feb 04, 2019 12:48 pm

Yes, I get a similar gain with the vec4 float version.
I have never tried to do 100% of the rendering in one shader, but I have tried a pretty big shader for a whole scene,
with multiple shadows and multiple lights plus bump mapping and some more effects.
I use switch statements a lot for the shadow and light passes in this shader, and I find the speed pretty good.

I have read that it is better to split the work into multiple shaders and share them with other shader passes, but in my case I got good results with one big shader.
Maybe the problem begins when you try to add more shaders while already using a big one; I'm not sure.
Maybe it becomes more complex to mix the big shader's render pass with other shader passes.

Re: Improving the demo render quality and performance

Postby Julio Jerez » Mon Feb 04, 2019 1:11 pm

I think what Joe says is correct. Basically, shaders are GPU kernels; when you write one as, say, a compute shader or in some other language, you specify the width of the SIMD word.
My guess is that the shader compiler does some tricks to estimate the SIMD width from a vertex and a pixel shader. If you make a compound shader, the width may be more than needed, just to carry variables that will not be used, so it will decrease what they call occupancy. I agree, I think that's a bad idea: basically each compute unit will have less throughput and the driver will issue more kernels.
Maybe it is not a big deal for GPUs with many compute units, but it may be a problem for GPUs like those in consoles and mobile devices. Anyway, that was a bad idea.


Here is the new problem. I am trying to compile on Linux, but I can't get it to link because I am missing a Linux library; I do not know which one, and I can't find an answer on Google. This is the error:
[ 80%] Linking CXX executable demosSandbox
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libglfw.so: undefined reference to symbol 'XGetErrorText'
//usr/lib/x86_64-linux-gnu/libX11.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
applications/demosSandbox/CMakeFiles/demosSandbox.dir/build.make:1699: recipe for target 'applications/demosSandbox/demosSandbox' failed
make[2]: *** [applications/demosSandbox/demosSandbox] Error 1
CMakeFiles/Makefile2:942: recipe for target 'applications/demosSandbox/CMakeFiles/demosSandbox.dir/all' failed
make[1]: *** [applications/demosSandbox/CMakeFiles/demosSandbox.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2


Basically, this library is making a call to this function:
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libglfw.so: undefined reference to symbol 'XGetErrorText'

Does anyone know what library needs to be added to CMake to get the function XGetErrorText?

Re: Improving the demo render quality and performance

Postby Shaderman » Mon Feb 04, 2019 4:21 pm

Julio Jerez wrote:Does anyone know what library needs to be added to CMake to get the function XGetErrorText?


Sounds like Xlib or libX11.
Shaderman
 
Posts: 66
Joined: Tue Mar 08, 2016 2:51 am

Re: Improving the demo render quality and performance

Postby JoeJ » Mon Feb 04, 2019 5:35 pm

Dave Gravel wrote:Maybe the problem begins when you try to add more shaders while already using a big one; I'm not sure.
Maybe it becomes more complex to mix the big shader's render pass with other shader passes.


It's always best to try it out, on multiple GPUs.
For example, AMD recommends using shaders with long runtimes (or many iterations) for async compute in cases where the scheduler can become a bottleneck.

Also, as long as no barriers are in the way (which you only know for certain when using low level APIs because you handle them yourself), the GPU can run different shaders (or draw calls / dispatches) even on the same CU, without any intention from the programmer. So it can happen that a 'big' shader runs well if you test it in isolation, but when adding other things it can end up harmful.

Another thing that has an effect is cache thrashing, which happens if too many high-bandwidth shaders run in parallel and constantly thrash each other's cached memory. (Consoles have commands to limit occupancy on purpose to prevent this; on PC you can only 'waste' LDS memory or registers.)
... just to give an example of why it's hard to predict or understand GPU performance without profiling tools. (I have no idea why int4 vs. float4 affects performance, but the guessing you do here is just that: guessing. A profiling tool could give the true answer.)

The most important thing is to keep an eye on register and LDS usage to aim for high occupancy, but there is much more than just that, for example memory access patterns. On GCN, accessing memory with large strides that are exact powers of two can kill performance. This is something profiling tools do not tell you, so you have to know it. Changing a list size from 256 to 257, with one element unused, can double performance.
The size (lines of code) of a shader usually has very little effect. You can write lots of code as long as you take care that there are not too many persistent variables that waste registers. (Push / pop to LDS can be a win here.)
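
To see why one padding element can matter, here is a small C++ sketch of a deliberately simplified memory model. It assumes a hypothetical interleaved layout in which element n lives in bank n modulo the bank count; the real GCN channel and bank mapping is more involved, so treat this only as an illustration of the power-of-two stride effect. The function name distinctBanksTouched is made up for this example.

```cpp
#include <cassert>
#include <set>

// Counts how many distinct banks a strided access pattern touches, under the
// ASSUMED model that element n is stored in bank (n % bankCount).
// Fewer distinct banks means more accesses queue up on the same bank.
int distinctBanksTouched(int strideInElements, int accessCount, int bankCount) {
    assert(strideInElements > 0 && accessCount > 0 && bankCount > 0);
    std::set<int> banks;
    for (int i = 0; i < accessCount; ++i) {
        long long element = static_cast<long long>(i) * strideInElements;
        banks.insert(static_cast<int>(element % bankCount));
    }
    return static_cast<int>(banks.size());
}
```

In this model, 32 accesses with a stride of 256 over 32 banks all land in the same bank, while a stride of 257 spreads them over all 32 banks. That is the whole point of the unused 257th element.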

Julio Jerez wrote:My guess is that the shader compiler does some tricks to estimate the SIMD width from a vertex and a pixel shader.


I don't know how wide pixel and vertex shaders are, or whether this is dynamic or fixed at 32 on NV and 64 on AMD.
With compute the width is often free to choose, and it's worth testing this out. It depends on the GPU again, but it can make a big difference, like 20%.
For me the width mostly depends on problem size, so I can rarely choose. There are some details here too: GCN, for example, can only reach a max occupancy of 80% if you choose a width of 128, but 100% with widths of 64 or 256. Still, in practice 128 may perform best.
Mostly, performance increases linearly with occupancy, but above 80% the wins become very small for me.
Occupancy means how many other wavefronts are available to switch to when the current one does a memory operation (so it's the same idea as hyper-threading on a CPU), but I also see increases when a shader does heavy ALU work and not much memory access.

All wavefronts in flight share the same register file and LDS memory, so you often end up calculating: 'I need a workgroup width of 256. How much LDS and how many registers can I use to reach full occupancy?'
This question often dictates the algorithm you choose to solve a problem.
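
That calculation can be sketched roughly in C++. All hardware numbers below are assumptions based on commonly published GCN figures, not something measured in this thread: 64-wide wavefronts, 4 SIMDs per CU, at most 10 wavefronts per SIMD, 256 VGPRs per lane, 64 KiB of LDS per CU, and at most 16 workgroups per CU when a workgroup spans more than one wavefront. It ignores SGPRs and register allocation granularity, and the names GcnLimits and wavesPerCu are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// ASSUMED GCN-like limits; check the vendor documentation for a real GPU.
struct GcnLimits {
    int wavefrontSize = 64;
    int simdsPerCu = 4;
    int maxWavesPerSimd = 10;
    int vgprsPerLane = 256;         // vector registers available per lane on one SIMD
    int ldsBytesPerCu = 64 * 1024;
    int maxGroupsPerCu = 16;        // applies only to groups larger than one wavefront
};

// Estimates how many wavefronts can be resident on one compute unit.
// Full occupancy in this model is simdsPerCu * maxWavesPerSimd (40 waves).
int wavesPerCu(int groupWidth, int vgprsPerThread, int ldsBytesPerGroup,
               const GcnLimits& hw = GcnLimits()) {
    assert(groupWidth > 0 && vgprsPerThread > 0 && ldsBytesPerGroup >= 0);
    int wavesPerGroup = (groupWidth + hw.wavefrontSize - 1) / hw.wavefrontSize;
    int maxWaves = hw.simdsPerCu * hw.maxWavesPerSimd;

    // Register limit: each resident wave reserves vgprsPerThread registers per lane.
    int wavesByRegs = hw.simdsPerCu *
        std::min(hw.maxWavesPerSimd, hw.vgprsPerLane / vgprsPerThread);

    // Whole workgroups must fit, so convert the wave budget to groups.
    int groups = std::min(maxWaves, wavesByRegs) / wavesPerGroup;

    // LDS limit: every resident group reserves its own LDS block.
    if (ldsBytesPerGroup > 0) {
        groups = std::min(groups, hw.ldsBytesPerCu / ldsBytesPerGroup);
    }
    // Workgroup-count limit for groups that span several wavefronts.
    if (wavesPerGroup > 1) {
        groups = std::min(groups, hw.maxGroupsPerCu);
    }
    return groups * wavesPerGroup;
}
```

Under these assumptions, a width of 128 with light register use gives 32 of 40 waves (80%), while 64 and 256 both reach 40, which matches the 80% vs. 100% figures mentioned above. A 256-wide group using 64 VGPRs per thread, or 16 KiB of LDS per group, drops to 16 waves.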

Profiling tools can help here too. CodeXL is so good that I still use OpenCL just to use it. I assume Radeon GPU Profiler has caught up in the meantime, but I have not tried it for a long time. Also, it's DX12/VK only. I have never used Nsight.

Re: Improving the demo render quality and performance

Postby Julio Jerez » Tue Feb 05, 2019 5:22 pm

Wow, I just tested the skinning on a GTX 970 and I get 950 fps.
I am now in the process of reusing the same model so that loading is faster and does not use so much memory.
For now it simply passes the vertex through to the pixel shader until I fix the skin shader copy constructor,
but the results are impressive for that NVIDIA card. In fact, the debug build is about the same speed as on my home system. Very impressive for NVIDIA.

The loading is almost instantaneous.

Re: Improving the demo render quality and performance

Postby Julio Jerez » Tue Feb 05, 2019 8:12 pm

OK, I take my comment back: rendering is not 900 fps; after I enable the skinning it is 350.
Loading is 100 times faster, not because the loader itself is faster but because it uses the static mesh as a shared asset, so it uses 100 times less memory.
It makes sense that the GPU isn't faster or slower, since GPUs do not depend on cache memory the way CPUs do. This would be faster on a CPU, because on a CPU the cache makes a big difference. However, I deleted all the software skinning; there is no need to maintain two rendering pipelines.
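
The memory saving from treating the mesh as a shared asset can be sketched in C++ like this. It is not the demo's actual code: SkinVertex, MeshAsset, Character, spawnCharacters and uniqueMeshBytes are hypothetical names, and the vertex layout is an assumption. Each character keeps only a reference to the shared mesh plus its own bone matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical vertex layout: 16 floats per skinned vertex.
struct SkinVertex {
    float position[3];
    float normal[3];
    float uv[2];
    float boneIndices[4];  // stored as floats, cast to int in the shader
    float boneWeights[4];
};

// The static mesh data, loaded once and shared by every instance.
struct MeshAsset {
    std::vector<SkinVertex> vertices;
};

// Per-character state: a reference to the shared mesh and a private bone palette.
struct Character {
    std::shared_ptr<const MeshAsset> mesh;
    std::vector<float> bonePalette;  // 16 floats per bone matrix
};

// Creates `count` characters that all reference the same mesh asset.
std::vector<Character> spawnCharacters(std::shared_ptr<const MeshAsset> mesh,
                                       std::size_t count, std::size_t boneCount) {
    std::vector<Character> characters;
    characters.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        characters.push_back(Character{mesh, std::vector<float>(boneCount * 16, 0.0f)});
    }
    return characters;
}

// Sums vertex memory over distinct mesh assets only, so shared meshes count once.
std::size_t uniqueMeshBytes(const std::vector<Character>& characters) {
    std::set<const MeshAsset*> seen;
    std::size_t bytes = 0;
    for (const Character& c : characters) {
        assert(c.mesh);
        if (seen.insert(c.mesh.get()).second) {
            bytes += c.mesh->vertices.size() * sizeof(SkinVertex);
        }
    }
    return bytes;
}
```

With 101 characters sharing one 12500-vertex mesh, the vertex data is counted once instead of 101 times; only the bone palettes scale with the number of characters.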

Re: Improving the demo render quality and performance

Postby Julio Jerez » Wed Feb 06, 2019 12:21 am

Well, after testing on my own system with an AMD RX 480, I do see some gain in fps: it is a solid 350, as opposed to around 300 fps before, with 101 players.

Re: Improving the demo render quality and performance

Postby Dave Gravel » Wed Feb 06, 2019 12:32 am

Yes, nice, it loads very fast here and I get 460 to 500 fps when I move around the 101 models.
I have a GTX 1060 3 GB.

Re: Improving the demo render quality and performance

Postby Julio Jerez » Wed Feb 06, 2019 11:46 am

Shaderman wrote:Sounds like Xlib or libX11.

Thank you.
That was the problem. I added X11; maybe that is too high level, but Xlib and Xlib11 failed.
Anyway, now the Linux build runs.
On my laptop, with 101 skinned meshes, it runs at 25 fps, but this is a really wimpy laptop.
