Collision free instance painting

A place to discuss everything related to Newton Dynamics.

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 12:02 pm

Have you tried more than 90k in OptiX?

If I set the block size to 64, that's 260k.
It is slow, but it does not crash or freeze.


On the geometry shader, Joe: as far as I can see, this
is an awesome feature of OpenGL and DX10+,
and it is good that the hardware vendors support it as a stage.

I read some more opinions about that, and it seems that geometry shaders are considered slow because, when they make temp data, they have to save it to memory.
But to me that applies to compute shaders as well.

What I think happened is that maybe some bad driver implementation made the temp buffer a function of the input.
For example, say you want to draw 1000 points but your source is just one vertex, and you procedurally make them in the shader; that will only use one thread.
But if instead you pass 1000 inputs, the intermediate buffer now has one entry for each input.

The truth is that no one really knows the origin of that, and I would bet it came from some self-appointed expert who saw something like that and made a remark, and that became gospel.
But from what I can see, geometry shaders not only save memory bandwidth, they also cut a ton of intermediate calculation.

The fact that you can get at the primitive in between the vertex and pixel shader is just awesome functionality.
Julio Jerez
Moderator
Moderator
 
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 1:45 pm

The final geometry shader just becomes this trivial function, because we let the shader deal with the primitive, whether it is in view space, world space or screen space.
In C++ you have to do a ton of transforms to get the same result, while the geo shader not only does fewer calculations, it uses orders of magnitude more threads to do it in parallel.

Code:
#version 330 core

uniform mat4 projectionMatrix;
uniform vec4 shadeColor;
uniform vec4 quadSize[4];
uniform vec4 uvSize[4];

layout (points) in;
layout (triangle_strip, max_vertices = 4) out;

in vec4 color[];
in vec4 origin[];

out vec4 quadUV;
out vec4 quadColor;

void GetPosit(int index)
{
   quadUV = uvSize[index];
   vec4 p = origin[0] + quadSize[index];
   gl_Position = projectionMatrix * p;
   EmitVertex();
}

void main()
{   
   quadColor = color[0];
   GetPosit(0);
   GetPosit(1);
   GetPosit(2);
   GetPosit(3);
   EndPrimitive();
}


I have it drawing a dull sphere shape; now it is up to the pixel shader to apply the light intensity, and that will be it.

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 4:09 pm

OK guys, I have now added fake shading and set up the scene so that we can get a glimpse of how the final simulation is going to look.
I am getting over 1000 fps visually with 64k particles.

There are a few more things that it needs in the graphics.
The first one is sorting the particles from front to back in camera space.

I noticed that the fps changes a lot with the radius of the particles. The reason is that there is a lot of overdraw, since the order is random, but this can be improved a lot by simply sorting the particles by range. Say we divide the distance in z into 32 equally spaced ranges. Then, using the range a particle falls in, we sort from front to back in view space. Now each particle will act as a z pre-pass, and that will cut a lot of the pixel bandwidth lost to overdraw.
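
The range sort described above could be sketched on the CPU side roughly as follows. This is only an illustration in C++: the Particle record, the SortFrontToBack name and the use of a counting sort are assumptions, not code from the Newton demos.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical particle record: only the view-space depth matters here.
struct Particle
{
    float viewDepth; // distance in front of the camera, larger = farther
    int id;
};

// Counting sort into 32 equally spaced depth ranges, nearest range first.
// Particles inside one range keep their original relative order.
std::vector<Particle> SortFrontToBack(const std::vector<Particle>& particles,
                                      float nearDepth, float farDepth)
{
    constexpr int kBuckets = 32;
    if (farDepth <= nearDepth)
    {
        return particles; // degenerate range, nothing sensible to sort by
    }

    const float scale = kBuckets / (farDepth - nearDepth);
    auto bucketOf = [&](const Particle& p)
    {
        const int b = static_cast<int>((p.viewDepth - nearDepth) * scale);
        return std::clamp(b, 0, kBuckets - 1);
    };

    // start[b + 1] first counts the particles in bucket b ...
    std::array<std::size_t, kBuckets + 1> start{};
    for (const Particle& p : particles)
    {
        start[bucketOf(p) + 1]++;
    }
    // ... then a prefix sum turns start[b] into the first output slot of bucket b.
    for (int i = 0; i < kBuckets; ++i)
    {
        start[i + 1] += start[i];
    }

    std::vector<Particle> sorted(particles.size());
    for (const Particle& p : particles)
    {
        sorted[start[bucketOf(p)]++] = p;
    }
    return sorted;
}
```

Because it is a two-pass counting sort, the cost stays linear in the number of particles, which matters when this runs every frame for tens of thousands of sprites.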

The second is that you will see some z fighting. This is because I need to use the projected pixel on the sphere to calculate the z value and write that as the z output in the pixel shader.
I do not know how to do that in OpenGL, but I can find out.

The last is to pass the correct light vector.

But all in all, we can actually see how the result will look, and this is like an upper bound; the rest can only improve from there.

After that, it is just a matter of polishing the simulation.

Re: Collision free instance painting

Postby Bird » Tue Jan 18, 2022 5:25 pm

Nice, that will be a good preview of what's going on. I'm getting over 3000 fps here.

One strange thing I'm noticing is that even though I'm using the same code as you to fill the box with particles, our boxes look totally different, with Newton's being way more uniform.
Attachments
my_particle_box.jpg
newton_particle_box.jpg
Bird
 
Posts: 623
Joined: Tue Nov 22, 2011 1:27 am

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 5:38 pm

3000!! Oh my :shock: :shock: :D

Bird wrote:One strange thing I'm noticing is that even though I'm using the same code as you to fill the box with particles, our boxes look totally different with Newton's being way more uniform


Yes, I noticed that too.

The problem is that in the solver there are a few parameters that are hard coded:
rest density, viscosity, mass, gas constant.

They react very differently to very small changes.
I have to do a dimensional analysis that takes some of those and, based on the kernel size, determines the values of the others.

For now I just set those by trial and error until they look more or less nice, but it is not final.

You can see that in your setup the particles want to separate from each other. Imagine the kernel size being even bigger, the rest density smaller, and the viscosity low.
Then we get some kind of turbulent gas, but for that we need a different visualization.

Do not worry too much about that for now. There is still work to do, but we are getting close.

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 6:48 pm

Here is a new problem that I can't figure out how to solve. Maybe someone knows how.

In the pixel shader, all the pixels will have a z depth that corresponds to the flat sprite, but since I am calculating the point projected onto the sphere surface, I need to override that z value with the newly calculated one.

This is just the origin of the sprite plus the radius times the normal.
This value will be in view space, so I have to multiply it by the projection matrix and then divide z by w.
Then take z from that value and override the z value in the shader.

The code is below.



Code:

uniform mat4 projectionMatrix;
//uniform vec4 directionalLightDir;

in vec4 quadUV;
in vec4 quadColor;
in vec4 spriteOrigin;

out vec4 pixelColor;

void main()
{
   float r2 = dot(quadUV, quadUV);
   if (r2 > 1.0)
   {   
      discard;
   }

   vec3 normalInPixelSpace = vec3(quadUV.x, quadUV.y, sqrt (1.0 - r2));

   // TODO
   // here I need to override the z value to prevent z fighting
   vec4 pointOnSphere = spriteOrigin + vec4(normalInPixelSpace * 1.0f, 0.0f);
   vec4 newPointOnSphere = projectionMatrix * pointOnSphere;

   // OpenGL does not let me change this
   //   gl_FragCoord = pointOnSphere;

   // TODO
   // now I need the light vector also in pixel space
   // which should be an interpolated input,
   // for now just hack a fix value
   vec3 directionalLightDir = vec3(0.71, 0.71, 0.0);
   float difusse = max (dot(normalInPixelSpace, directionalLightDir), 0.3) ;

   pixelColor = quadColor * difusse;
}



The pixel shader outputs the color of the pixel; is there a way to override the z value as well?
I know there must be, because it is possible to set a z bias to fix z fighting, but how do you do it in a shader?

Has anyone ever done that?

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 7:10 pm

Never mind, the variable is gl_FragDepth, not gl_FragCoord.
Now the shader looks like this, and there is no more z-buffer fighting.

Code:
void main()
{
   float r2 = dot(quadUV, quadUV);
   if (r2 > 1.0)
   {   
      discard;
   }

   vec4 normalInPixelSpace = vec4(quadUV.x, quadUV.y, sqrt (1.0 - r2), 0.0f);

   // override the z value here to prevent z fighting
   vec4 pointOnSphere = spriteOrigin + normalInPixelSpace * spriteRadius;
   vec4 newPointOnSphere = projectionMatrix * pointOnSphere;
   //gl_FragCoord = pointOnSphere;
   gl_FragDepth = newPointOnSphere.z / newPointOnSphere.w;

   // TODO
   // now I need the light vector also in pixel space
   // which should be an interpolated input,
   // for now just hack a fix value
   vec4 directionalLightDir = vec4(0.71, 0.71, 0.0, 0.0);
   float difusse = max (dot(normalInPixelSpace, directionalLightDir), 0.3) ;

   pixelColor = quadColor * difusse;
}


There is one drawback.

This code:

    vec4 pointOnSphere = spriteOrigin + normalInPixelSpace * spriteRadius;
    vec4 newPointOnSphere = projectionMatrix * pointOnSphere;
    //gl_FragCoord = pointOnSphere;
    gl_FragDepth = newPointOnSphere.z / newPointOnSphere.w;

adds a huge hit to the shader.

I will probably add a check so that if the z value is lower than the current one it early-outs, since there is so much overdraw, but yes, it takes a big toll in the shader. To add that early out I need to read the z buffer, and there has to be a way of doing that too.

It goes from over 1000 fps to about 600. Still fast, but a huge cost.

It is starting to look really impressive now.
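
A side note on the depth value itself: the snippet writes newPointOnSphere.z / newPointOnSphere.w, which is the normalized device z in the range -1 to 1, while gl_FragDepth is compared against window-space depth, which with the default glDepthRange(0, 1) lies between 0 and 1. If the sprites ever have to depth-test correctly against regular meshes, the value probably needs the same remap the fixed-function pipeline applies. A minimal C++ sketch of that remap, where ClipToWindowDepth is a hypothetical helper and the default depth range is an assumption:

```cpp
#include <array>

// Clip-space position, as produced by projectionMatrix * viewSpacePoint.
using Vec4 = std::array<float, 4>;

// Hypothetical helper: maps a clip-space position to the window-space depth
// that OpenGL stores in the depth buffer.
// rangeNear and rangeFar mirror glDepthRange(); 0 and 1 are the defaults.
float ClipToWindowDepth(const Vec4& clip, float rangeNear = 0.0f, float rangeFar = 1.0f)
{
    // Perspective divide: normalized device z, -1 at the near plane
    // and +1 at the far plane.
    const float ndcZ = clip[2] / clip[3];

    // Viewport depth transform applied by the fixed-function pipeline.
    return 0.5f * (rangeFar - rangeNear) * ndcZ + 0.5f * (rangeFar + rangeNear);
}
```

In the fragment shader the equivalent for the default depth range would be something like gl_FragDepth = 0.5 * ndcZ + 0.5, with ndcZ being the z / w value already computed there.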

Re: Collision free instance painting

Postby Bird » Tue Jan 18, 2022 7:30 pm

Definitely looking better but now I'm down to around 1000-1100 fps while the particles are still packed together tightly.

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 7:56 pm

Yeah, that little bit of code has a huge cost. I am not sure if it is because of the projection matrix used to recalculate the virtual z position in pixel space, or because of the write to gl_FragDepth.

I read over on the Khronos forum that writing to that variable disables z early out. That made no sense to me.
But apparently reading the z buffer value is an OpenGL 4.2 feature, not supported by much hardware.

It seems the z test, and in general z-buffer operations, just like blending, are hardware rasterization operations.

That makes sense, because z buffers are usually hierarchical
or have some special organization.

I was always wondering why most game engines actually do a z pre-pass, rendering the big meshes with a shader that only writes to the z buffer, instead of just reading the z buffer value and early-outing if it was less than the one calculated for that pixel.
It must be because reading the z buffer is not a possibility.

Anyway, if it gets comically slow, we can have two options: one that does not calculate the z and one that does.
The fast one is for quick tests.

Re: Collision free instance painting

Postby Julio Jerez » Tue Jan 18, 2022 11:45 pm

Yes, it is the writing to gl_FragDepth that makes it slow.
From the OpenGL specifications over on the Khronos forum, it says this:
Writing to gl_FragDepth will disable early fragment tests:
Therefore, an implementation is free to apply early fragment tests if the Fragment Shader being used does not do anything that would impact the results of those tests. So if a fragment shader writes to gl_FragDepth, thus changing the fragment's depth value, then early testing cannot take place, since the test must use the new computed value.


In other words, it can read gl_FragDepth, but if it writes to it then the early z test is disabled.
The only way around that would be an occlusion pass.
I hear of people doing crazy stuff like a pass with a vertex shader and no pixel shader. Some other people use compute shaders.

But the point is that a z pre-pass will not help us, because the shader will write to depth no matter what.

What confuses me is that they say the z test happens after the shader, and yet in the same paragraph they say that the early test prevents expensive fragment work. That does not make sense to me.

But it seems that a simple thing like rendering the balls twice,
the first time applying just the vertex shader and the second time the pixel shader and everything, maybe would work, if what is expensive is writing a new value to the z buffer.

Re: Collision free instance painting

Postby Julio Jerez » Wed Jan 19, 2022 12:14 am

Oh, it seems there are options that can be added to the shader to enable some optimizations.



The built-in fragment shader variable gl_FragDepth may be redeclared using
one of the following layout qualifiers.

layout-qualifier-id
depth_any
depth_greater
depth_less
depth_unchanged

For example:

layout (depth_greater) out float gl_FragDepth;

The layout qualifier for gl_FragDepth specifies constraints on the final
value of gl_FragDepth written by any shader invocation. GL implementations
may perform optimizations assuming that the depth test fails (or passes)
for a given fragment if all values of gl_FragDepth consistent with the layout
qualifier would fail (or pass). If the final value of gl_FragDepth
is inconsistent with its layout qualifier, the result of the depth test for
the corresponding fragment is undefined. However, no error will be
generated in this case. When the depth test passes and depth writes are
enabled, the value written to the depth buffer is always the value of
gl_FragDepth, whether or not it is consistent with the layout qualifier.


By default, gl_FragDepth assumes the <depth_any> layout qualifier. When
the layout qualifier for gl_FragDepth is <depth_any>, the shader compiler
will note any assignment to gl_FragDepth modifying it in an unknown way,
and depth testing will always be performed after the shader has executed.
When the layout qualifier is "depth_greater", the GL will assume that the
final value of gl_FragDepth is greater than or equal to the fragment's
interpolated depth value, as given by the <z> component of gl_FragCoord.
When the layout qualifier is <depth_less>, the GL will assume that any
modification of gl_FragDepth will only decrease its value. When the
layout qualifier is <depth_unchanged>, the shader compiler will honor
any modification to gl_FragDepth, but the rest of the GL assume that
gl_FragDepth is not assigned a new value.

Redeclarations of gl_FragDepth are performed as follows:

// redeclaration that changes nothing is allowed

out float gl_FragDepth;

// assume it may be modified in any way
layout (depth_any) out float gl_FragDepth;

// assume it may be modified such that its value will only increase
layout (depth_greater) out float gl_FragDepth;

// assume it may be modified such that its value will only decrease
layout (depth_less) out float gl_FragDepth;

// assume it will not be modified
layout (depth_unchanged) out float gl_FragDepth;



It seems that it is the shader compiler that detects whether a shader writes to gl_FragDepth and automatically sets the option. In our case the primitive is a flat billboard, and the recalculated depth will always make the new z value smaller than the sprite value, so I will set the option

layout (depth_less) out float gl_FragDepth;

maybe that can help in some GPUs.
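
Reading the quoted text closely, the qualifier only helps when every depth value it allows gives the same test result. A small C++ model of that rule, assuming the usual GL_LESS depth function (the enum and function names are made up for the illustration and are not an OpenGL API):

```cpp
// Mirrors the four layout qualifiers from the quoted extension text.
enum class DepthQualifier { Any, Greater, Less, Unchanged };

// Assumes GL_LESS: a fragment passes when its final depth < stored depth.
// True when the fragment can be thrown away before the shader runs,
// knowing only the interpolated depth and the qualifier.
bool CanEarlyReject(DepthQualifier q, float interpolated, float stored)
{
    switch (q)
    {
    case DepthQualifier::Greater:
    case DepthQualifier::Unchanged:
        // Final depth >= interpolated, so if the interpolated depth
        // already fails, every allowed final value fails too.
        return interpolated >= stored;
    case DepthQualifier::Less:
    case DepthQualifier::Any:
        // The shader may still pull the depth in front of what is stored.
        return false;
    }
    return false;
}

// True when the fragment is known to pass before the shader runs.
bool CanEarlyAccept(DepthQualifier q, float interpolated, float stored)
{
    switch (q)
    {
    case DepthQualifier::Less:
    case DepthQualifier::Unchanged:
        // Final depth <= interpolated, so a passing interpolated depth
        // guarantees a passing final depth.
        return interpolated < stored;
    case DepthQualifier::Greater:
    case DepthQualifier::Any:
        return false;
    }
    return false;
}
```

Under that reading, depth_less lets the hardware accept a fragment early but never reject one. Early rejection, which is what cuts overdraw, would need depth_greater, for example by placing the billboard at the front of the sphere and pushing the depth back in the shader. That is a guess at why the hint may show no gain, not a measured result.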

Re: Collision free instance painting

Postby Bird » Wed Jan 19, 2022 12:17 pm

Julio Jerez wrote:Have you tried more than 90k in OptiX?

If I set the block size to 64, that's 260k.
It is slow, but it does not crash or freeze.


I modified an OptiX demo scene and it has no problem rendering 25 million particles. :)

Turned out to be my bug fortunately and everything is fine now. I set the block size to 80 and there's no problem rendering the 512K particles.

Re: Collision free instance painting

Postby Julio Jerez » Wed Jan 19, 2022 12:55 pm

80 x 80 x 80?
That's some serious number for a CPU. What kind of performance are we talking about here?


On the slowdown from writing to gl_FragDepth, I added this to the shader:

Code:
#version 330 core
// layout (depth_less) is not part of core GLSL 3.30; it needs GLSL 4.20
// or the GL_ARB_conservative_depth extension enabled explicitly.
#extension GL_ARB_conservative_depth : enable

// from the Khronos and OpenGL specs:
// "The layout qualifier for gl_FragDepth
// specifies constraints on the final value
// of gl_FragDepth written by any shader
// invocation.
// GL implementations may perform optimizations
// assuming that the depth test fails (or passes)
// for a given fragment if all values
// of gl_FragDepth consistent with the layout
// qualifier would fail (or pass)."

// it does not seem to do anything on my system,
// but I leave it in anyway; maybe some newer
// hardware supports it.
layout (depth_less) out float gl_FragDepth;


It did not do anything on my system, but my GPU is very entry level and may not support the option.
I will leave it in anyway, in case some newer GPUs can take the hint and tell the shader compiler not to disable the early z test.

Basically, it reduces to this:
when the shader compiler finds an assignment like the one below in a fragment shader,

gl_FragDepth = some value;

it will automatically turn off the rasterizer hardware's early z out, and the rasterizer will process all pixels resulting from the raster operation, sending all of them to the fragment stage for execution.
It is not that the operation itself is fast or slow; it is that it issues more pixel shader calls after the vertex or geometry stage.
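
The effect on overdraw can be illustrated with a toy model of a single pixel, sketched in C++ below. It assumes a GL_LESS depth test and, for simplicity, fragments whose depth the shader does not change; RasterizePixel and DrawStats are made-up names, not part of any API.

```cpp
#include <vector>

// Counters for one pixel of the toy model.
struct DrawStats
{
    int shaded;  // fragment shader invocations
    int written; // depth buffer writes
};

// Feeds the fragments covering one pixel through a GL_LESS depth test,
// in the order given. With earlyDepthTest on, a fragment that cannot pass
// is dropped before the (expensive) shader runs; with it off, every
// fragment is shaded and only tested afterwards.
DrawStats RasterizePixel(const std::vector<float>& fragmentDepths, bool earlyDepthTest)
{
    DrawStats stats{0, 0};
    float stored = 1.0f; // freshly cleared depth buffer

    for (float depth : fragmentDepths)
    {
        if (earlyDepthTest && !(depth < stored))
        {
            continue; // rejected before shading
        }
        ++stats.shaded;

        // Late depth test and write.
        if (depth < stored)
        {
            stored = depth;
            ++stats.written;
        }
    }
    return stats;
}
```

With four fragments arriving front to back, the model shades one fragment when the early test is on and all four when it is off, although only one of them ends up in the depth buffer either way.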

It seems the only way around this is an occlusion culling pass that determines which particles are truly occluded and reduces the actual number of particles to render.

That's outside the scope of Newton for now, so I will continue with the physics, and then later we will see if we can make a nicer presentation.

Can you try again, Bird? Maybe your GPU supports layout (depth_less) out float gl_FragDepth;
and it does something. I am curious to know.

Re: Collision free instance painting

Postby Bird » Wed Jan 19, 2022 4:04 pm

Julio Jerez wrote:80 x 80 x 80?
That's some serious number for a CPU. What kind of performance are we talking about here?


Heh, it's a bit slow but here's 1 million particles
https://youtu.be/BcHbs34vfRQ

Re: Collision free instance painting

Postby Julio Jerez » Wed Jan 19, 2022 4:22 pm

Dude, you are making me blush here.

1 million particles?
https://www.youtube.com/watch?v=EJR1H5tf5wE

If you are going for broke, one thing that may help is setting the max number of threads.
I usually only use a thread count equal to the core count, but the engine allows up to 16 threads.

I found that for some reason, when using rigid bodies, more threads than cores is actually slower, but with the particles more threads than cores seems faster. It seems to be a black art.

On my system the BIOS allows disabling hyper-threading, so even though the CPU is 4 cores / 8 threads, it only uses 4 cores / 4 threads.

I do not really like to mess with that, because hardware is not a friend of mine, and I hate it when the machine does not come back after those changes.

But seriously, 1 million particles on the CPU at interactive rates is something that any render package can actually work with. :shock: :mrgreen: 8) :D :mrgreen: :roll: :o :shock:

At that point I think the meshing will be the taxing part.

But it gets better. After we finish the physics solver and get a production version,
I will move on to adding the OpenCL solver.
Since the update is already async, it does not matter if it is CPU or GPU; the latency is already hidden, and now we can talk about a few million of those babies, and those are some very serious numbers.