on a lighter note.
I was planning to write the grid-based sweep and prune broad phase, but before that I went over the current method and found some very big mistakes. fixing them translates to a big performance gain.
the biggest mistake was a legacy from 3.xx
the engine uses a flag called m_equilibrium to determine how a body needs to be updated.
over the years that flag has evolved into a complex state machine, but for the most part what the flag means is that a body does not need to be updated if it is in static equilibrium.
this flag is controlled by the solver.
the mistake I made is that the broad phase was also using the same flag to decide if it needs to scan the scene for potential new collision contact joints.
here is the problem: from the solver point of view a body can be in static equilibrium, meaning it is not moving, and yet the equilibrium flag can be false.
there are many reasons for this. the most obvious one is a matrix teleport, but the sleeping code can also turn the flag off because a neighbor was moving very slowly, and there are more reasons still.
but here is the real big problem: imagine a spinning sphere. it will never be in equilibrium, yet the aabb of the sphere in the broad phase never changes. using that flag to trigger the scan forces the broad phase to run the scan when nothing has changed in the scene.
In fact, from the broad phase point of view the items are the aabbs of the bodies, so they all look like spheres.
what this means is that the broad phase needs its own set of flags, set when the aabb of a body intersects the aabb of its proxy in the broad phase.
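
to make the idea concrete, here is a minimal sketch of a separate broad phase flag. this is not the engine code: every name in it (Vec3, Aabb, Body, broadPhaseDirty, UpdateBroadPhaseFlag) is made up for illustration, and it assumes the proxy stores a padded aabb and that the flag is raised when the tight aabb of the body no longer fits inside that padded box, which is only one possible reading of the test described above.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration; not taken from the engine.
struct Vec3 { float x, y, z; };

struct Aabb {
    Vec3 min, max;

    // True when 'other' lies completely inside this box.
    bool Contains(const Aabb& other) const {
        return min.x <= other.min.x && min.y <= other.min.y && min.z <= other.min.z &&
               max.x >= other.max.x && max.y >= other.max.y && max.z >= other.max.z;
    }
};

// Tight box around a sphere. It depends only on center and radius,
// so spinning the sphere in place never changes it.
Aabb SphereAabb(const Vec3& c, float r) {
    return { {c.x - r, c.y - r, c.z - r}, {c.x + r, c.y + r, c.z + r} };
}

// Grow a box by 'pad' on every side (assumed padding scheme).
Aabb Expand(const Aabb& b, float pad) {
    return { {b.min.x - pad, b.min.y - pad, b.min.z - pad},
             {b.max.x + pad, b.max.y + pad, b.max.z + pad} };
}

struct Body {
    Vec3 center;
    float radius;
    bool equilibrium;      // owned by the solver (stands in for m_equilibrium)
    bool broadPhaseDirty;  // owned by the broad phase (hypothetical flag)
    Aabb proxy;            // padded box stored in the broad phase
};

// Raise the broad phase flag only when the body's tight aabb has left its
// padded proxy box. The solver's equilibrium flag is never consulted.
bool UpdateBroadPhaseFlag(Body& body, float pad) {
    const Aabb tight = SphereAabb(body.center, body.radius);
    body.broadPhaseDirty = !body.proxy.Contains(tight);
    if (body.broadPhaseDirty) {
        body.proxy = Expand(tight, pad);  // refit, then the caller rescans for pairs
    }
    return body.broadPhaseDirty;
}
```

with this split, a sphere spinning in place keeps equilibrium false for the solver but never raises broadPhaseDirty, so the broad phase skips the scan for it.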
I just made that change and the results are just beautiful. now almost all the time is accrued to the solver doing the calculation of forces, even for scenes that move a lot or that do not move.
with those changes it takes about 15 ms for the stacking scene.
and the gpu spinning cubes scene goes from 60 ms to 18 ms on the cpu.
even the cuda version got better, at 3 ms, but in cuda that is a fixed cost.
here is how the profile looks now

(attached image: suspension.png)
the time is spent in the solver, where it really counts.
we are now ready to go full steam ahead with GPU.