About Physics modeling.


Re: About Physics modeling.

Postby Julio Jerez » Wed May 17, 2023 12:32 pm

Finally, in summary, this is the set of equations we need to evaluate and solve in a loop at each step.

Code: Select all
a) Mt = sum (m(i))
b) cg = sum (p(i) * m(i)) / Mt
c) Vcg = sum (v(i) * m(i)) / Mt
d) Icg = sum (I(i) + covarianceMatrix(p(i) - cg) * m(i))
e) T0 = sum [w(i) x (I(i) * w(i)) - Vcg x (m(i) * v(i))]
f) T1 = sum [(p(i) - cg) x Fext(i) + Text(i)]
g) Bcg = (Icg^-1) * (T0 + T1)


So as long as Bcg.x and Bcg.z are larger than some small tolerance, we apply some joint action to drive Bcg.x and Bcg.z toward zero, and repeat the loop.
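
For anyone who wants to see the loop spelled out, here is a minimal numpy sketch of equations a) through g). This is my own transcription, not code from the engine; in particular I take covarianceMatrix(r) to mean the parallel-axis term (r.r)*E - outer(r, r), and the array layout is an assumption.

Code: Select all
import numpy as np

def covariance_matrix(r):
    # parallel-axis term for a point offset r: (r.r) * identity - outer(r, r)
    return np.dot(r, r) * np.eye(3) - np.outer(r, r)

def evaluate_bcg(m, p, v, w, I, f_ext, t_ext):
    # m: (n,) masses; p, v, w, f_ext, t_ext: (n, 3); I: (n, 3, 3) world-space inertias
    n = len(m)
    Mt = m.sum()                                          # a) total mass
    cg = (m[:, None] * p).sum(axis=0) / Mt                # b) center of mass
    Vcg = (m[:, None] * v).sum(axis=0) / Mt               # c) com velocity
    Icg = sum(I[i] + m[i] * covariance_matrix(p[i] - cg) for i in range(n))               # d)
    T0 = sum(np.cross(w[i], I[i] @ w[i]) - np.cross(Vcg, m[i] * v[i]) for i in range(n))  # e)
    T1 = sum(np.cross(p[i] - cg, f_ext[i]) + t_ext[i] for i in range(n))                  # f)
    return np.linalg.inv(Icg) @ (T0 + T1)                 # g) Bcg

The loop then checks Bcg[0] and Bcg[2] against the tolerance and applies the joint action.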

Of course, there are some optimizations in the implementation, since it is clear that many quantities are constant across the step iterations.
Now, to the bat-coding, Robin! I always wanted to say that :D,
but I should not get too happy too soon, since I have been here before and failed. It is just that I never developed a method on a web page before.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Wed May 17, 2023 3:00 pm

Oh wow, there is another important simplification, which seems counterintuitive, but the result makes even more sense. Take equations (c) and (e):

c) Vcg = sum (v(i) * m(i)) / Mt
e) T0 = sum [w(i) x (I(i) * w(i)) - Vcg x (m(i) * v(i))]

we can expand equation (e) and we get
e) T0 = sum [w(i) x (I(i) * w(i))] - sum [Vcg x (m(i) * v(i))]

but Vcg is constant, so we can factor it out of the second sum

e) T0 = sum [w(i) x (I(i) * w(i))] - Vcg x sum [m(i) * v(i)]

now if we substitute the expression for Vcg into the second term, we get

e) T0 = sum [w(i) x (I(i) * w(i))] - (sum [v(i) * m(i)] / Mt) x sum [m(i) * v(i)]

e) T0 = sum [w(i) x (I(i) * w(i))] - (sum [v(i) * m(i)] x sum [m(i) * v(i)]) / Mt

but sum [v(i) * m(i)] is just the total linear momentum Pt

e) T0 = sum [w(i) x (I(i) * w(i))] - (Pt x Pt) / Mt

and again, the cross product of two collinear vectors is zero, so equation (e) reduces to

e) T0 = sum [w(i) x (I(i) * w(i))]

just the sum of all the gyro torques. And since Vcg is not used in any other equation, the set of equations is now:

Code: Select all
a) Mt = sum (m(i))
b) cg = sum (p(i) * m(i)) / Mt
d) Icg = sum (I(i) + covarianceMatrix(p(i) - cg) * m(i))
e) T0 = sum [w(i) x (I(i) * w(i))]
f) T1 = sum [(p(i) - cg) x Fext(i) + Text(i)]
g) Bcg = (Icg^-1) * (T0 + T1)


Which is quite remarkable: the virtual single rigid body has the exact same equation of motion as a normal rigid body, and it does not depend on the linear velocities.

Icg * Bcg = sum [(p(i) - cg) x Fext(i) + Text(i) + w(i) x (I(i) * w(i))]

Hard to believe that the linear velocity is not part of it, but if we think about it, a constant linear velocity does not generate any acceleration, while a constant angular velocity does.
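
For reference, here is my LaTeX transcription of the cancellation and of that reduced equation of motion (the same content as above, just in standard notation):

Code: Select all
T_0 = \sum_i \omega_i \times (I_i \omega_i) - V_{cg} \times \sum_i m_i v_i
    = \sum_i \omega_i \times (I_i \omega_i) - \tfrac{1}{M_t}\, P_t \times P_t
    = \sum_i \omega_i \times (I_i \omega_i)

I_{cg}\, B_{cg} = \sum_i \left[ (p_i - c_g) \times F^{ext}_i + T^{ext}_i + \omega_i \times (I_i \omega_i) \right]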

so the terms that in fact enforce conservation of momentum are

sum [(p(i) - cg) x Fext(i) + w(i) x (I(i) * w(i))]

We know of w(i) x (I(i) * w(i)), but we never saw (p(i) - cg) x Fext(i) before.

(p(i) - cg) x Fext(i) seems to be the torque produced by each external force about the center of mass of the entire set.

And again, if the set were a single body, then p(i) = cg, therefore (p(i) - cg) x Fext(i) = 0, and the expressions are identical. That's a good sign.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Sun May 21, 2023 2:12 pm

Alright, for the people following this:

I implemented this new method on the unicycle robot and hit a lot of problems. First:

I made the controller act on the current state, and the result was that the calculation always indicated an action that made the imbalance worse.

I attributed it to some bug in the inverse dynamics solver. But after debugging and finding a few bugs, the error persisted, and if anything, it became even worse after the fix.
My mistake was that I thought the current state of the model was enough to capture its dynamics.

This is correct for an analytical model, but the robot is not quite an analytical model: at each step the model actually changes, e.g., the contacts on the next step can change. And that's just one aspect.

So after drawing the model on paper and stepping through the code, I found that yes, indeed, the corrections were physically correct. But a correction based on the current state can lead to a state that actually increases the cause of the imbalance.

Then I realized that the model was indeed not wrong; the problem is that the correction assumes the model state will be the same after the correction is applied.

For the people who are into the Q-learning thing, this is a good demonstration of why the Q-learning algorithm works the way it does.

Basically, in Q-learning, the process is as follows:

-An agent makes an observation of the environment state S.
-It takes an action A to generate a new state S'.
-With these parameters S, A, S', it collects a reward R.
-It stores the values (S, A, S', R) in a table,
-and then applies what is known as the Bellman equation.
-Repeat until the process reaches convergence.

Apparently it can be proven rigorously that the algorithm converges to the best strategy, given that the entire search space is a Markov Decision Process that can be explored completely.
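
For concreteness, here is a tiny tabular sketch of that loop in Python. The env interface and the alpha/gamma/eps values are my assumptions, not code from the repository:

Code: Select all
import random

def q_learning(env, n_states, n_actions, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()                          # observe the environment state S
        done = False
        while not done:
            if random.random() < eps:            # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = env.step(a)            # take action A, get S' and reward R
            # Bellman update on the (S, A, S', R) sample
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q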

Of course, it is immediately obvious that the space can be huge, so the value of the plain algorithm is mostly academic.

However, for very small search spaces, the algorithm can be tested.
For large spaces, that's where the last 20 years of AI research have produced more sophisticated stochastic sampling methods: very convincing algorithms that find not optimal solutions, but very good locally minimal ones.

For this case, we have a small search space, so we first do a brute-force search. Then, if we succeed, we will move to deep Q-learning and solve the same problem.
After that, we move to the more sophisticated policy gradient methods that can handle articulated humanoids and even more complex models.

For this example, basically, what it means is that even if the model cannot be balanced in one future step, maybe it can be after a series of future steps.
That is, say we apply a nudge to the model; it will find that it is not possible to balance in one step if the nudge was too strong. But it may be that a sequence of steps can balance it.

For now we focus on just the next step, meaning very small nudges, essentially not larger than the numerical integration error per step.

To make the model take the action for one state in the future, I moved the controller code to the PostUpdate method of the model.
Here I now have S, A, S', R for at least one step of the robot,

so my goal is to implement the algorithm with a Q table of just one entry.
Again, a controller like that can only do very small corrections, essentially just fixing numerical errors.
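
To make the "act on the predicted state" idea concrete, here is a self-contained toy: a linearized 1D inverted pendulum with made-up constants (this is not the engine model, just the scheme). The controller predicts the next state with zero action, the way the robot does in PostUpdate, and chooses the action from that prediction:

Code: Select all
# toy model: theta_dd = (g/L) * theta + u, semi-implicit Euler at 60 Hz
g_over_L, dt, TOL, KP, KD = 9.81, 1.0 / 60.0, 1e-4, 60.0, 2.0

def step(theta, omega, u):
    omega += (g_over_L * theta + u) * dt
    theta += omega * dt
    return theta, omega

theta, omega = 0.001, 0.0                  # a very small initial nudge
for _ in range(600):                       # ten simulated seconds
    # predict the next state S' with zero action (the PostUpdate view)
    theta_p, omega_p = step(theta, omega, 0.0)
    # choose the action from S', driving the predicted tilt toward zero
    u = -KP * theta_p - KD * omega_p if abs(theta_p) > TOL else 0.0
    theta, omega = step(theta, omega, u)
print(theta, omega)                        # stays near zero for gentle nudges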


After doing that, the model still failed. But after a debug session I found that, to predict the next step, the model must calculate the new contacts for state S' in PostUpdate, because after a step the contacts still hold the values of state S. So after doing that fix...

Badabing badaboom, the controller can keep the heavy body in equilibrium for an indefinite amount of time.

If anyone is interested, please sync and check it out.
The big box stays balanced for a long time; if touched, it falls, and you can see the controller try to correct to a new equilibrium state.

To me this is a good milestone, but there is still a lot of work to do.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby JoeJ » Mon May 22, 2023 2:33 am

if touched, it falls, and you can see the controller try to correct to a new equilibrium state.

I did this, and after it had fallen, I picked it up to hold it in the air. I grabbed the box between its center and the top left corner of the blue face. A bug showed up.
First it behaved correctly (did nothing), but then it started to rotate at a constant angular velocity. The rotation started instantly, and the velocity never changed its value.
I let it fall down to the floor, but the rotation did not stop. It kept rotating, rolling over its edges and faces on the floor.
The angular velocity did not change from the contact with the ground, and the axis of rotation remained constant the whole time.

I could not reproduce this a second time, but maybe you can make some sense out of it.
Now that I think of it, maybe the ball started spinning very quickly, causing some flywheel effect.
But then I should have noticed the spinning after some time, even if one revolution closely matched the gfx refresh rate, which I didn't.

After doing that, the model still failed. But after a debug session I found that, to predict the next step, the model must calculate the new contacts for state S' in PostUpdate, because after a step the contacts still hold the values of state S. So after doing that fix...


That's interesting. I've tried PostUpdate too, but it did not work. Maybe I missed something.
Here is your explanation of the pipeline, but I add my assumptions in parentheses:

-collision update (Just collision detection, but no contact force calculation yet?)
-dModel update
-Constraint solver update (Solve both contact and joint forces?)
-dModel post update
-Integration

So if you do everything in post update, you do know the exact contact forces, if that helps?
But then I wonder: the joint motor acceleration you also set there will not be considered in the following integration, but only one step later, in the next solver update?

That's confusing, but I'll keep experimenting with doing some or all things in post update.

Btw, the covariance matrix in your code isn't used.
That's something I want to learn about sometime soon.
I guess you could use it to find the ideal orientation for a virtual single body to define an inertia matching many smaller bodies?
Or you could do something like finding a good orientation of an oriented bounding box over a point set?
If you have related examples in your code, please let me know. Learning from formulas alone is nearly impossible for me. ;)

I also need to learn what gyro torque is. Maybe that's something I've been ignoring, which would explain some of my problems.

Very interesting to follow. :)
JoeJ
Posts: 1453
Joined: Tue Dec 21, 2010 6:18 pm

Re: About Physics modeling.

Postby Julio Jerez » Mon May 22, 2023 7:25 am

Oh, if you try to disturb it, the result will be unpredictable.
As I said, the controller only knows how to take an action that balances the state in one step.

Actions have side effects; if the disturbance is too strong, the corrective action can make it even worse.

A balance controller that can handle disturbances is the next step. For now we have to perfect this one.

Once we know the model state, there are two possibilities for applying the action:
you apply it to balance the observed state, or you apply it to balance the predicted future state.

Balancing the current state is a mistake, because it does nothing to correct uncertainty in the next state.

Fortunately, the engine is designed such that we can get the current state, do the physics, and get the next state all in one update.
If in the next state, that is post update, we complete the missing parts, namely the new contacts, then we can apply the action to a future state.

This is very much analogous to implicit integration, where you take the derivative one step in the future and use it as if it were the derivative at time t.

As I said, the advantage of the method is that it is the principle behind Q-learning.

Once you can apply an action one step in the future, imagine that you save the value, state, action, and new state in a table of action/state/next state.

Then you repeat trials, and in each trial you keep updating the table.

Now imagine that you have a big table of actions/states.

Next time you are in the same situation, you can say: I am going to calculate an action. But the model has a déjà vu, because it has seen that entry in the table, so it can now predict what the future state is going to be. And from that future state it can search for the next future state, and from there the next, and keep doing it until the reward of a far-enough future state is insignificant.

Now you have a way to apply actions that can make corrections not just for one state in the future, but for many future states.

That's how Q-learning works.

This concept is hard to grasp, but once you do, it feels obvious and natural.

It is how people solve problems most of the time.
Imagine you are in a maze where at each split you can go either left or right.
At first you know nothing, so you make a decision, say you go right.
If the decision was good, you can keep going, but if it was wrong you will probably die. Either way you get a map of good and bad decisions that will influence your choices next time.

Now, if you have the chance to repeat the experiment, next time you have a mental picture of the first turn; the value of right is lower than left, because the last time you died. In Q-learning, this memory is represented by that state/action table.

For the robot, if you apply a nudge, the robot will try to correct and fall; next time the same correction will have a different value, so it tries a different action.

As it builds the table, it can find that a series of actions is the answer. Therefore, if the problem can be solved, it will not take the action that keeps one future state balanced, but just an action that improves its state, knowing that the next state will be more favorable.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Mon May 22, 2023 7:37 am

Ah, the covariance matrix not being used, good point.
Fixed.

Gyro torque is the torque that ensures conservation of angular momentum for bodies with non-uniform inertia. It is just part of the physics.

The covariance matrix comes out of the parallel axis theorem, for calculating the inertia of a body relative to a point offset from the center of mass.
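
A small numpy illustration of both ideas, with example values of my own:

Code: Select all
import numpy as np

w = np.array([1.0, 2.0, 3.0])                 # angular velocity
I_box = np.diag([1.0, 2.0, 4.0])              # non-uniform inertia
I_sphere = 2.0 * np.eye(3)                    # uniform inertia

print(np.cross(w, I_box @ w))                 # gyro torque w x (I w), nonzero
print(np.cross(w, I_sphere @ w))              # zero: w is parallel to I w

def inertia_about(I_com, m, r):
    # parallel axis theorem: inertia about a point offset r from the com
    covariance = np.dot(r, r) * np.eye(3) - np.outer(r, r)
    return I_com + m * covariance

print(inertia_about(I_box, 3.0, np.array([0.5, 0.0, 0.0])))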

The bug did not make a difference because the body is at rest, so the numerator was very small.

But that was a good catch.
It is fixed now.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Mon May 22, 2023 7:43 am

Before going on, I will try a series of experiments to see the limits of the method.
One is to determine how big of a nudge the controller can handle.

Over future steps, it should be able to handle some kind of small perturbation, simply because a big recovery is equal to the sum of many small recoveries.
But if the recoverable small perturbation is zero, then the big recovery will also be zero.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Mon May 22, 2023 10:33 am

There are still very big challenges.
Somehow the acceleration resulting from an action on the joint is far larger than I expected.
That seems a sign that there are still bugs in the IK solver.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Mon Jun 12, 2023 1:26 pm

OK, I took a detour to get the mocap data from FBX files, but now I am back on this.

I committed a test that demonstrates that the theory is sound, at least in principle.
The method can actually balance the big box on top for a few seconds.

This is only a proof of concept. I found that zeroing out the angular acceleration of the virtual rigid body is numerically unstable; this is because it requires synthesizing the moment of inertia of the equivalent body.
But we can use the method of the equivalent support point.
That is, shift the com to the point that aligns with the support feature (point, line, or polygon).
I will write the derivation of that equation, which is actually very simple starting from the current equations.

The second problem is that in the test there is no friction, so it is difficult to maintain balance since the momentum does not dissipate, but this is intentional.

If anyone syncs and just runs the test, you will see the box dancing on top, trying to get balanced on the stick; but since momentum persists, each time it takes larger swings, until it can't recover anymore.

Edit:
Finally I found one more bug with the contact update.
Now the controller is capable of keeping the box in equilibrium perpetually.
It can even tolerate a gentle nudge of the ball, and the big box just oscillates.

Again, the demo is all academic, meant to prove that the methodology is correct.

The next step is to find faster-converging actions.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Tue Jun 27, 2023 4:22 pm

Alright, I now have the first test of a learning agent that seems to be learning how to balance the proverbial cart pole, which seems to be the hello world of reinforcement learning.

The algorithm is deep Q-learning.
It needs a ton of debugging and tweaking, but so far it seems the neural net is learning to predict the Q values.

I still do not know if it will converge; for that I probably have to let it run for hours.

But in about ten minutes, the predicted Q value goes from about 1 step before dying to about 50 steps before dying for some trajectory runs.

The reward is discounted, so a return of 50 corresponds to more than 50 frames.

If anyone tests it, it does not seem stable, but that is because the epsilon-greedy annealing is too slow, so after 10 minutes it is still taking random actions more than 95% of the time.
Therefore there are still too many runs that terminate very quickly.
This is however important, so that the neural net learns how to weed out the actions that lead to those bad trajectories.
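
For readers unfamiliar with the term: epsilon-greedy annealing just shrinks the random-action probability over time. A sketch, with schedule constants that are my guesses and not the demo's values:

Code: Select all
import random

EPS_START, EPS_END, EPS_DECAY_STEPS = 1.0, 0.05, 200_000

def epsilon(step):
    # linear anneal from EPS_START down to EPS_END over EPS_DECAY_STEPS
    t = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + (EPS_END - EPS_START) * t

def select_action(q_values, step):
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit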

I am surprised that it can actually last 50 steps when about 45 of them come from random action selection.

Anyway, this is a base step before moving to better algorithms that deal with continuous actions.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Tue Jun 27, 2023 6:41 pm

A small amount of tuning, and it manages to keep the pole up after about 2000 episodes.

The cool part of this is that the knowledge can be cumulative.

For example, if I save the neural net, then that net can be used as the starting point for a new training epoch, one where there is not so much exploration noise, but which instead introduces noise in the initial conditions.

After doing a few more tweaks, I will try that next, before moving to the more interesting types of learning, which are either Deep Deterministic Policy Gradients (DDPG) or Soft Actor-Critic (SAC).
Both of those methods handle continuous actions, and they are comparable and very similar to DQN.
Those methods are all off-policy, meaning that they learn from a training buffer of trials.
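
The training buffer those off-policy methods share is a simple structure; a minimal sketch (capacity and field names are my choice):

Code: Select all
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=500_000):
        self.buffer = deque(maxlen=capacity)    # old transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch; learning from old trials rather than only
        # from the current policy's actions is what makes a method off-policy
        return random.sample(self.buffer, batch_size)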

I do not really like the policy gradient methods that are so-called on-policy, meaning they learn online, which implies the training is always on.

Anyway, now we have a more stable system to use as our starting point.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Mon Jul 03, 2023 7:12 pm

OK guys, I got the first controller balancing the pole on a cart.
It does not do much, just a proof of concept before moving to more advanced models.
Next I will see if I can add input controls, so that the user can control it with keyboard strokes.

If anyone wants to try it, you just run it and wait for about an hour until it learns how to keep the pole upright.

This is the deep Q-network algorithm (DQN); it only handles discrete actions, so I am surprised it worked at all.
For more advanced stuff it needs to go to a continuous-action algorithm like Deep Deterministic Policy Gradients (DDPG).
That algorithm is an extension of DQN, so it is not very advanced, but it is a step in the right direction.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby JoeJ » Tue Jul 04, 2023 6:49 am

Just tried it. I see an upright capsule on a box-shaped cart.
The cart moves left and right extremely quickly, but the capsule keeps balanced.
I'm confused that if I pick the model and drag it, it snaps back to its initial position. So there must be some kind of external force / constraint to reset it, which reduces my impression of magic going on.

This would be an opportunity to learn at least a bit about ML, but I'm too deep in a rabbit hole and have no time.

But I would like to play around with it.
I'd like to change the mass of the capsule, to see if it can still learn how to balance itself.
And of course I'd like to make it move slowly, but I guess you have to spend more work on it first?
Finally, I expect an explanation of the snap-back / reset kind of thing, you cheater! :twisted:
JoeJ
Posts: 1453
Joined: Tue Dec 21, 2010 6:18 pm

Re: About Physics modeling.

Postby Julio Jerez » Tue Jul 04, 2023 3:38 pm

You are right, there is a lot of explaining to do.

For now, here are a few points:

- This model is the equivalent of the hello world of reinforcement learning; it is there to test the algorithm with a problem that we know has a reasonably easy-to-find solution.
In that regard, it has only academic value and very little practical value.

- The reason why you see it dancing around is that when the pole falls more than 30 degrees from the upright orientation, the algorithm declares the episode completed and restarts a new one.
It is fast because it disables fixed-time-step simulation to speed up the training; this is also why you can't control it with the mouse.

- It is creating a memory replay buffer of half a million steps by taking random inputs; after that, it tries to optimize that table by improving the control action it took on each step that led to an early termination. In doing that, good trajectories get improved and bad trajectories are rejected (the update rule is sketched below).
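
The update it runs over that buffer is the standard DQN rule; schematically (target_net stands in for whatever network the demo uses, and gamma is an assumed discount):

Code: Select all
def dqn_targets(batch, target_net, gamma=0.99):
    # batch: (s, a, r, s2, done) transitions sampled from the replay buffer
    targets = []
    for s, a, r, s2, done in batch:
        best_next = 0.0 if done else max(target_net(s2))   # value of the best action in S'
        targets.append((s, a, r + gamma * best_next))      # Bellman target for Q(s, a)
    return targets                                         # regress q_net(s)[a] toward these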

It is similar to when you are on a map and get to an intersection: if you know nothing, you take a random turn based on a random value assigned to each possible turn. If the turn was good, next time you increase the value of that turn; if it was bad, you decrease it.
It keeps doing that for all the entries in the memory buffer, many times.
The next time it finds a previously seen entry, the turns now have different values, so this time around the bad turns have a lower chance of leading to a high score.

There is a lot more detail and theory, but that is the intuition behind the proliferation of those so-called reinforcement learning AIs.
They are in fact very clever, but the people who keep saying that the AI will wake up and come alive are talking from the part of the body where the sun does not shine.

That's the really sad part of this: these algorithms are not smart and have zero chance of becoming smart, but they are powerful enough that they can and will be used to make tools that will take the jobs of many people.

Not only that, but these algorithms have also been used by unscrupulous people to feed the worst demons of humanity.

Anyway, you will have to wait a few more days until I complete the save and load part of the training; right now it does not do it.

Then there will be two demos for each controller: the trainer and the playback.
The trainer is like the one you see now.
The playback is the one that might take some small user input, and we will see how it performs.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: About Physics modeling.

Postby Julio Jerez » Tue Jul 04, 2023 6:08 pm

OK Joe, I have now added the DQN controller player.

It is far better than my wildest expectations. :mrgreen: 8) :D :shock: :twisted: :idea: :mrgreen: :D :mrgreen:

I was hoping that it could withstand some small nudges, and that would be considered a success.
But what happens is that you actually have to wrestle the model to knock it down.

The balance settles at a point that drifts to the right more or less, but that is in fact the best result I can hope for.
I will let you see if you can figure out why; later I will make a diagram to show why this is the case.

If you or anyone following syncs, the demo now starts playing with a pretrained model, so it is normal play.

Now I can start moving to the DDPG method.
Julio Jerez
Moderator
Posts: 12249
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles
