OK guys, I finally fixed the bugs that were making the continuous action space trainer fail.
I committed a pretrained controller, and the results are extremely impressive; I could not believe how resilient it is to perturbations.
You can grab the pole with the mouse, and you are hard pressed to knock it down; if I had not seen it, I would not believe it.
For the people following this: I committed the trained agent, but if you run the second demo,
it will train a new one, and it is funny to see how it struggles to learn to cope with perturbations.
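For anyone curious what continuous-action training looks like in miniature, here is a hypothetical toy sketch (not the project's actual trainer): REINFORCE with a Gaussian policy on a one-dimensional action, where the reward is invented for illustration and peaks when the action hits `target`.

```python
import random

# Hypothetical toy sketch, NOT the project's actual trainer: REINFORCE with a
# Gaussian policy over a single continuous action, to show the general shape
# of continuous-action policy-gradient training.
random.seed(0)

target = 2.0             # the action the toy "environment" rewards most
mean, std = 0.0, 0.5     # Gaussian policy; only the mean is learned here
baseline = 0.0           # running average reward, used to reduce variance

for step in range(5000):
    lr = 0.05 if step < 4000 else 0.005   # crude learning-rate anneal
    a = random.gauss(mean, std)           # sample a continuous action
    r = -(a - target) ** 2                # toy reward, best at a == target
    adv = r - baseline                    # advantage vs. the running baseline
    baseline += 0.1 * (r - baseline)
    # d/d(mean) of log N(a; mean, std) is (a - mean) / std^2
    g = adv * (a - mean) / std ** 2
    g = max(-10.0, min(10.0, g))          # clip the estimate for stability
    mean += lr * g

print(round(mean, 1))    # the learned policy mean should sit near `target`
```

A real trainer replaces the toy reward with physics rollouts (including the perturbations) and the scalar `mean` with a network, but the sample-action / score / nudge-the-policy loop is the same.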
Now we are ready to try this on some of the more interesting models:
the Dog, the Spider, the Human, and some robots.
I think we are now in the game, guys.
Edit: if anyone decides to run the trainer, on my four-core machine it takes about 15 minutes to run one million steps,
but people with better systems should be faster.
In fact, one of the beauties of the native implementation is that it runs quite fast, as opposed to the tests I have seen on YouTube that run for many hours.
I am very happy with the results so far.
In fact, it is so good that I do not think it is going to need a GPU,
at least not for the normal nets. We may need one when we try to capture the world with images, by rendering to a texture and using the z-buffer as the input of a convolutional net.
But that is a way off in the future.