My speculation is that in this case a high-level controller monitors effectors by type. The hand effector would then have either a contact point joint or an attachment joint, and that joint becomes a constraint the solver has to obey.
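Just to illustrate what I mean (the names and structure here are entirely made up by me, not taken from your actual system):

```cpp
// Purely speculative sketch of "monitoring effectors by type".
enum class EffectorType { Hand, Foot, Head, Pelvis };

enum class ContactMode {
    Free,         // no constraint, the effector just follows its target
    ContactPoint, // position pinned, orientation free (e.g. palm resting on the floor)
    Attachment    // position and orientation pinned (e.g. grabbing a bar)
};

struct Effector {
    EffectorType type;
    ContactMode mode; // when not Free, this becomes a hard constraint
                      // the IK / dynamics solve has to obey
};
```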
Sounds similar to what I did when mapping the Inverted Pendulum controller result back to the ragdoll.
Basically, the IP controller gives a target COM for the next frame, which respects both the current dynamics and the balance control problem.
To get there, I rotate the current pose around the virtual ankle joint of the IP so it meets the target COM. (You would achieve that by pulling hand/head effectors forward along the arc of the desired COM movement.)
To keep the feet in place, I use the IK solver to correct them. This introduces a small error in the COM, but it was good enough and I needed no extra iterations to reduce it. (You would achieve that with contact point effectors at the feet.)
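In code, that mapping step looks roughly like this (a simplified 2D sketch of my approach; the flat joint list, the virtual ankle input and the feet-IK pass being a single correction are simplifications):

```cpp
// 2D (sagittal plane) sketch: rotate the pose around the virtual ankle of the
// IP so the current COM lands on the COM target given by the IP controller.
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

static Vec2 rotateAround(Vec2 p, Vec2 pivot, float angle) {
    float c = std::cos(angle), s = std::sin(angle);
    Vec2 d{p.x - pivot.x, p.y - pivot.y};
    return {pivot.x + c * d.x - s * d.y, pivot.y + s * d.x + c * d.y};
}

void mapIpToRagdoll(std::vector<Vec2>& jointPositions,
                    Vec2 currentCom, Vec2 targetCom, Vec2 virtualAnkle) {
    // Angle from the ankle to the current COM and to the target COM.
    float a0 = std::atan2(currentCom.y - virtualAnkle.y, currentCom.x - virtualAnkle.x);
    float a1 = std::atan2(targetCom.y - virtualAnkle.y, targetCom.x - virtualAnkle.x);
    float delta = a1 - a0;

    // Rotate the whole pose by that difference.
    for (Vec2& p : jointPositions)
        p = rotateAround(p, virtualAnkle, delta);

    // Afterwards, run IK on the feet to pin them back to their contact points.
    // This perturbs the COM a little, but one pass was good enough for me.
}
```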
Now I thought pulling by hand effectors would give a bad result of just stretching the arms, but I see that, because your IK solver includes a balancing objective, it should rotate/move the whole body as desired, not just the arms?
My current plan to improve my naive IP -> ragdoll mapping is to make a whole-body IK solver, controlled by the same effectors but also respecting a given COM target. This would make things much easier for me, because my previous method felt like an ugly hack.
I wonder: using your method, wouldn't it help to add another effector at the pelvis? It should reach the target COM more easily, maybe reducing iteration counts, and give the user better control over the pose.
Would this work, or are effectors on internal bones prohibited?
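To make clear what I have in mind for that whole-body solver: minimize a cost where the COM is just one more weighted term next to the effector targets. The structure and weights below are only my guess at how it could look, not your solver:

```cpp
// Sketch of a combined objective for a whole-body IK solver.
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

struct EffectorGoal { std::size_t jointIndex; Vec2 target; float weight; };

static float sqDist(Vec2 a, Vec2 b) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Cost for a candidate pose; a solver (Jacobian-based, CCD, gradient descent, ...)
// would iterate the joint angles to drive this toward zero.
float ikCost(const std::vector<Vec2>& jointPositions,
             const std::vector<float>& jointMasses,
             const std::vector<EffectorGoal>& goals,
             Vec2 comTarget, float comWeight) {
    float cost = 0.0f;
    for (const EffectorGoal& g : goals)
        cost += g.weight * sqDist(jointPositions[g.jointIndex], g.target);

    // Mass-weighted COM of the current pose.
    Vec2 com{0.0f, 0.0f};
    float totalMass = 0.0f;
    for (std::size_t i = 0; i < jointPositions.size(); ++i) {
        com.x += jointMasses[i] * jointPositions[i].x;
        com.y += jointMasses[i] * jointPositions[i].y;
        totalMass += jointMasses[i];
    }
    com.x /= totalMass;
    com.y /= totalMass;

    cost += comWeight * sqDist(com, comTarget);
    return cost;
}
```

A pelvis effector would then just be another EffectorGoal, maybe with a lower weight, so it guides the solution without fighting the hands and feet.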
I am pretty sure that's what Spot is doing. If you look at some of the animation of the real Spot when it is standing up from the floor, the first few frames look really uncoordinated, which is a sign of interpolation to a matching start animation.
I guess they don't use animation at all.
Personally, I get all this from the dynamics. For example, the improved walk cycle I made after the video works like this:
Imagine the IP of a walking model. As it swings forward, we predict the path of the ZMP in both time and space. We can use that to plan the next step and the motion path of the foot effector so it gets there in time. As we hit the floor, the IP foot teleports to the new stance position, and we repeat the cycle for the other side.
Here we only care about the IP and the foot effectors. All the other motion emerges from that, and it looks natural and good.
No need for motion capture or machine learning. We only need to understand the dynamics of the IP.
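As a toy sketch of the idea (a 1D linear inverted pendulum; I only predict the COM state over the current pivot rather than a full ZMP path, and the step time, COM height and the capture-point style foot placement with a small offset to keep momentum are placeholder choices, not a complete controller):

```cpp
// Toy forward-direction walk cycle on the linear inverted pendulum model.
#include <cmath>
#include <cstdio>

int main() {
    const float g = 9.81f;                 // gravity
    const float h = 1.0f;                  // COM height of the IP
    const float w = std::sqrt(g / h);      // LIP frequency
    const float stepTime = 0.5f;           // time until the next foot strike
    const float keepMomentum = 0.05f;      // place the foot a bit short of the
                                           // capture point so we keep walking

    float com = 0.0f;        // horizontal COM position
    float vel = 0.3f;        // horizontal COM velocity
    float stanceFoot = 0.0f; // current pivot (stance foot) position

    for (int step = 0; step < 4; ++step) {
        // Predict the COM state at the end of this step (analytic LIP solution).
        float x0 = com - stanceFoot;
        float comEnd = stanceFoot + x0 * std::cosh(w * stepTime)
                       + (vel / w) * std::sinh(w * stepTime);
        float velEnd = x0 * w * std::sinh(w * stepTime)
                       + vel * std::cosh(w * stepTime);

        // Plan the next foot placement from that prediction; the foot effector
        // now has stepTime seconds to travel there along its planned path.
        float nextFoot = comEnd + velEnd / w - keepMomentum;

        std::printf("step %d: swing foot target %.2f (predicted COM %.2f, vel %.2f)\n",
                    step, nextFoot, comEnd, velEnd);

        // At foot strike the IP pivot "teleports" to the new stance position
        // and the cycle repeats with the other leg.
        com = comEnd;
        vel = velEnd;
        stanceFoot = nextFoot;
    }
    return 0;
}
```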
Downside: We need to build understanding for each kind of locomotion individually. Running is different from walking, so we need to work on that specifically. (Remember my one-legged hopper, which serves as a model for running.)
If we want to do a backflip, again we need to understand the dynamics and work out a backflip controller.
Mixing multiple such controllers and behaviors requires work at the top level each time we add a new feature.
So we don't need training data or processing, but we do need a good engineer who also has to study animation data to understand and implement the stuff.
In the end, the manually crafted approach will be harder to implement, but it likely gives us better control overall and will have better performance. So I think.
It's not that I have a choice. No motion capture equipment, no Keanu Reeves, no TensorFlow GPU, no budget.