I am opening this thread for discussion of the self-balancing character.
The other thread was too old, plus I have learned a few things.
The first one is that I underestimated the scope of the project by a great deal in each of my previous tries, so now I am starting with a smaller-scope approach.
I am still in the camp that thinks locomotion is more a physics and control problem than an intelligence problem, and I believe companies like Boston Dynamics have proven that. Boston Dynamics robots are good control with good animatronics and very little AI.
What is different for me now is that it is not an all-or-nothing problem; there must be some intelligent control, backed by strong physics and analytical control.
I have seen many videos and papers, and many people think the other way: they think it is just an AI problem. In fact, I have seen people who think a neural net makes up for the physics, which I totally reject. Here is an example: https://www.youtube.com/watch?v=JgvyzIkgxF0&t=826s
I will go over the new changes later, but in a nutshell, what I have learned is that a good physics solver combined with a robust controller is not enough to balance a contraption more complex than a box on a single or double pendulum.
The main reason is that a linear controller is based on the minimization of a quadratic cost function. Basically, you build an objective function and take a step in the negative direction of its gradient.
The problem is that with an articulated character made of a complex arrangement of bodies, joints, and an environment, the energy function to optimize is not convex and has many local minima.
So calculating the gradient and moving along its negative may, at some point, move the entire character into a worse situation.
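To make the trap concrete, here is a toy illustration (this is not the character's real energy function, just a one-dimensional non-convex stand-in): plain gradient descent started inside the shallow basin converges there and never finds the deeper minimum, which is exactly the failure mode described above.

```python
def energy(x):
    # non-convex "energy": shallow local minimum near x = 1.13,
    # deeper global minimum near x = -1.30
    return x ** 4 - 3.0 * x ** 2 + x

def grad(x, h=1e-6):
    # central-difference gradient, good enough for the demo
    return (energy(x + h) - energy(x - h)) / (2.0 * h)

def descend(x, rate=0.01, steps=5000):
    # step along the negative gradient, like the linear controller does
    for _ in range(steps):
        x -= rate * grad(x)
    return x

x_trapped = descend(1.5)    # starts in the shallow basin, gets stuck there
x_global = descend(-2.0)    # starts in the deep basin, reaches the true minimum
```

Both runs follow the gradient faithfully; only the starting pose decides whether the "character" ends up in the good minimum or the bad one.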
So to solve this, I am thinking of using a reinforcement-learning neural net that calculates a trajectory.
Basically, it predicts the state of the character an arbitrary number of steps into the future.
We can train it to check whether the energy goes down, say, 8, 12, or 16 steps ahead.
The number can be tuned, but by doing this we can see whether the character ends up in a better state if it keeps doing what it is doing.
Of course, stepping the player forward in the real simulation is not an option, but that is exactly what reinforcement learning with policy-gradient algorithms is very good at.
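A hedged sketch of the lookahead idea (not an actual policy-gradient implementation): instead of trusting the instantaneous gradient, roll a cheap model of the system a few steps forward and score the whole trajectory. The "character" here is just a planar inverted pendulum and the "policy" a hand-made feedback law; both are stand-ins for the real articulated body and the neural-net policy.

```python
import math

def rollout_cost(gain, theta=0.3, omega=0.0, steps=16, dt=0.02, g=9.8):
    """Accumulated squared tilt over `steps` simulated steps (semi-implicit Euler)."""
    cost = 0.0
    for _ in range(steps):
        torque = -gain * theta - 0.5 * gain * omega   # toy PD-style policy
        omega += (g * math.sin(theta) + torque) * dt  # pendulum dynamics + control
        theta += omega * dt
        cost += theta * theta                         # deviation from upright
    return cost

# score "do nothing" against a stabilizing policy over 8, 12 and 16 step horizons
costs_free = {h: rollout_cost(0.0, steps=h) for h in (8, 12, 16)}
costs_ctrl = {h: rollout_cost(30.0, steps=h) for h in (8, 12, 16)}
```

The lookahead score tells the two policies apart at every horizon, which a single-step gradient check cannot always do.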
A typical example would be something like this:
Say we have a walk animation, and we train a neural net to learn it.
At the end, what we get is a DNN (deep neural net) that, given the inputs (joint angles), outputs the same joint angles.
That does not seem to do much, but now consider that we are playing this character on the same flat terrain that was used for the animation.
Since the character only knows how to walk, the result will be that he plays the walk animation, just in a more expensive way.
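A minimal sketch of that first stage, under assumptions: the "walk cycle" below is synthetic (three joint angles as phase-shifted sines, not real animation data), and the net is a tiny hand-written two-layer autoencoder trained to reproduce its own input, joint angles in, same joint angles out.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-in for one gait period of a 3-joint walk cycle
t = np.linspace(0.0, 1.0, 32, endpoint=False)[:, None]
poses = np.concatenate([np.sin(2.0 * np.pi * t + p) for p in (0.0, 1.0, 2.0)], axis=1)

# tiny 3 -> 16 -> 3 net with hand-written backprop on the squared error
W1 = rng.normal(0.0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 3)); b2 = np.zeros(3)
lr = 0.1
for _ in range(5000):
    h = np.tanh(poses @ W1 + b1)              # hidden activations
    out = h @ W2 + b2                         # reconstructed joint angles
    err = out - poses
    gW2 = h.T @ err / len(poses); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h * h)         # backprop through tanh
    gW1 = poses.T @ dh / len(poses); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse = float(((out - poses) ** 2).mean())      # reconstruction error on the cycle
```

On its own this net is an expensive way to play the animation back; the point is what happens once the inputs are perturbed.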
But now consider that the terrain has some small perturbations, and we use a collision system to detect the feet locations and how far they are from the locations in the animation.
Animation systems do this with bad IK and blending. In the past I did this with Gaussian processes; basically, you find the closest pose assuming each joint position is a Gaussian distribution with its mean centered on the walk cycle. The results were never satisfactory, because Gaussian interpolation tends to smooth out high frequencies. But that is a different story.
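That smoothing complaint is easy to check with a toy low-pass demo (this is just Gaussian-weighted blending of samples, not the full Gaussian-process machinery): blend a slow and a fast signal with the same Gaussian weights and compare how much of each survives.

```python
import math

def gaussian_blend(samples, sigma=2.0, radius=6):
    # weighted average of neighbors with Gaussian weights (cyclic, like a gait)
    out = []
    for i in range(len(samples)):
        acc = wsum = 0.0
        for k in range(-radius, radius + 1):
            w = math.exp(-0.5 * (k / sigma) ** 2)
            acc += w * samples[(i + k) % len(samples)]
            wsum += w
        out.append(acc / wsum)
    return out

n = 128
slow = [math.sin(2.0 * math.pi * i / n) for i in range(n)]        # low frequency
fast = [math.sin(2.0 * math.pi * 16 * i / n) for i in range(n)]   # high frequency
keep_slow = max(abs(v) for v in gaussian_blend(slow))             # amplitude kept
keep_fast = max(abs(v) for v in gaussian_blend(fast))
```

The slow component passes through almost untouched while the fast one is crushed, which is why sharp details like heel strikes get washed out.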
In the new system, we feed the net the pose at time t of the animation, but now with the feet offset by a small amount, and we read the result.
The new set of angles can be good or bad, but that is where training can fix the problem: by repeating the operation, it is possible to adjust the network weights so that the result gets as good as it can. Basically, the net becomes a non-linear blender.
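A sketch of that non-linear blender, with the same assumptions as before: the walk cycle is synthetic, and the foot/joint offsets reported by the collision system are modeled as small random angle perturbations. The net is trained perturbed-pose-in, clean-pose-out, so at runtime it pulls an offset pose back toward the cycle.

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic walk cycle: 3 joint angles as phase-shifted sines
t = np.linspace(0.0, 1.0, 64, endpoint=False)[:, None]
clean = np.concatenate([np.sin(2.0 * np.pi * t + p) for p in (0.0, 1.0, 2.0)], axis=1)

W1 = rng.normal(0.0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 3)); b2 = np.zeros(3)
lr = 0.1
for _ in range(5000):
    noisy = clean + rng.normal(0.0, 0.1, clean.shape)  # offset feet/joints
    h = np.tanh(noisy @ W1 + b1)
    out = h @ W2 + b2
    err = out - clean                                  # target is the clean cycle
    gW2 = h.T @ err / len(clean); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h * h)
    gW1 = noisy.T @ dh / len(clean); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# evaluate on fresh perturbations: the corrected pose should sit closer
# to the cycle than the perturbed input did
test_noisy = clean + rng.normal(0.0, 0.1, clean.shape)
corrected = np.tanh(test_noisy @ W1 + b1) @ W2 + b2
err_in = float(np.abs(test_noisy - clean).mean())
err_out = float(np.abs(corrected - clean).mean())
```

Unlike the Gaussian blending above, nothing here forces the correction to be a smooth average, which is the whole appeal of letting a net do the blending.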
I have seen a few papers doing similar things, all claiming they do it better than the others, but for me this is just an example.