I spent the last two weeks working on that solver just to find out that is was fundamnatially wrong, basically is showed that is was possible that after one interation step the solution diverged instead of converge to a solution.
In general this is not a problem if you can run a sufficiently large number of iteration, and some iteration the residual decreases on average, but that is one of the problems with real time you can no run a unlimited number of iterations, so what is needed is an algorithm with a monotonically decreasing convergent rate per iteration.
for PSD matrices Gauss sidle and gradient descent fall on that category.
Jacobi and conjugant gradient do not. so the are far slower because they need more iterations for the type of matrices for which the are known to converge.
It is know that for diagonal dominant matrices Jacobi converge, however not all PSD matrices are diagonal dominant, and that was my huge mistake
People has tried to implement variants of Jacobi, to make it fixable for high perforce computing, by threading convergence rate for parallel execution. I tried one of those variance without success, or so I thought.
I spent some time debugging the code, and there are not bug, the solver simply diverge, so I spent time reading some of the literature again to see what I have wrong.
The variant of the Jacobi solver I am trying is called Weighted Jacobi method
https://en.wikipedia.org/wiki/Jacobi_method
basically what this is, is that each partial variable is calculated, they to do not replace old variable value, instead only a fraction contribut to the total solution. like this.
say x0 is a partiltial solution and you calculate an new x1, you do not replace x0 with x1, instead you do
x0 = x0 + (x1 - x0) * w
where w is a weight factor between 0 and 2.0
as you can see a value of w = 0 make make the solution always x0, and value of 1 is a full Jacobi and a value larger than 1.0 make the solution very unstable.
for PSD matrices using Gauss sidle, a value of w from 1.1 to 1.3 accelerates converge a great deal,
however for Jacobi w should always be less than 1.0, many books on numerical method recommend 2/3
my big mistake was how much lower than 2/3 than value should be, it turned out that values should be very, very small, something like 0.66 or even 0.05
So as you can see is not only that Jacobi has poor convergence rate is that it is much, much worse that I could possible imagine it was going should be.
I was trying with the fix values of 2/3 (0.66) as many people recommend and all I got was explosions for any relatively complex set ups, this wikipead article say that a way to calute the weigh is by doing this.
Weighted Jacobi method[edit]
The weighted Jacobi iteration uses a parameter
ω {\displaystyle \omega }
to compute the iteration as
x ( k + 1 ) = ω D − 1 ( b − R x ( k ) ) + ( 1 − ω ) x ( k ) {\displaystyle \mathbf {x} ^{(k+1)}=\omega D^{-1}(\mathbf {b} -R\mathbf {x} ^{(k)})+\left(1-\omega \right)\mathbf {x} ^{(k)}}
with
ω = 2 / 3 {\displaystyle \omega =2/3}
being the usual choice.[2]
Convergence in the symmetric positive definite case[edit]
In case that the system matrix
A {\displaystyle A}
is of symmetric positive-definite type one can show convergence.
Let
C = C ω = I − ω D − 1 A {\displaystyle C=C_{\omega }=I-\omega D^{-1}A}
be the iteration matrix. Then, convergence is guarenteed for
ρ ( C ω ) < 1 ⟺ 0 < ω < 2 λ max ( D − 1 A ) , {\displaystyle \rho (C_{\omega })<1\quad \Longleftrightarrow \quad 0<\omega <{\frac {2}{\lambda _{\text{max}}(D^{-1}A)}}\,,}
where
λ max {\displaystyle \lambda _{\text{max}}}
is the maximal eigenvalue.
The spectral radius can be minimized for a particular choice of
ω = ω opt {\displaystyle \omega =\omega _{\text{opt}}}
as follows
min ω ρ ( C ω ) = ρ ( C ω opt ) = 1 − 2 κ ( D − 1 A ) + 1 for ω opt := 2 λ min ( D − 1 A ) + λ max ( D − 1 A ) , {\displaystyle \min _{\omega }\rho (C_{\omega })=\rho (C_{\omega _{\text{opt}}})=1-{\frac {2}{\kappa (D^{-1}A)+1}}\quad {\text{for}}\quad \omega _{\text{opt}}:={\frac {2}{\lambda _{\text{min}}(D^{-1}A)+\lambda _{\text{max}}(D^{-1}A)}}\,,}
where
κ {\displaystyle \kappa }
is the matrix condition number.
Recent developments[edit]
so I tried to experiment with arbitrary low w value to see is I get something stable. and I got at 0.2f
I you guy sync I committed the solve using 4 thread and solving a stack of variable size masses and at a value of 0.2 the system beaocme stable.
It just happens that the inverse of the highest eigen value of teh interationmass matrix is about 1 / 7 = 0.143
but as you can see the solver apply that value all joints, this is why people has come up with schedules relaxation, this is jsut a fancy way to apply different weighs to different joints base on th inverse of the Eigen value.
anyway the moral is that as you can see is possible to have a parallel solver working stable if we accept that we need many more iterations. and this is the solver become an option.
Basically for CPU with low core count is only of theoretical value.
But if we move this to a GPU with say 32 or more cores, then it becomes a practical solution.
Imagine a GPU with 128 cores, doing say 4 time as many passes, that's like using 32 cores on a cpu for the same perfomance per core.
there is no way even a 16 cores CPU can compete with that.
Therefore the parallel solve project is not dead.
Joe or sweenie if you sync, there is a stack that run with run in for threads, if you switch to
"solve large island in parallel" you will see that is about twice as fast, but soft and smoggy, not too jittery which is important.
the stack does not explode or colapses, it does not goes to rest either but we knew that.
This seem like a good sign.