I tried the SDK a day or two ago, but the demo was stuck in a loop even after more than an hour. I didn't have time to write up a report at that moment.
You mean the spider robot suspended in a vacuum doing random moves?
I was trying to train a neural net as a pose-matching controller, using the TD3 algorithm.
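For anyone not familiar with TD3: its main tricks are twin critics (take the minimum of two Q estimates to fight overestimation) and target policy smoothing (add clipped noise to the target action). Here is a minimal sketch of just the target computation, with the two critics replaced by made-up linear functions instead of real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two critic networks (real TD3 uses neural nets;
# these are fixed random linear maps, just to show the target math).
W1 = rng.normal(size=(4,))
W2 = rng.normal(size=(4,))

def critic1(state_action):
    return float(W1 @ state_action)

def critic2(state_action):
    return float(W2 @ state_action)

def td3_target(reward, next_state, target_action, gamma=0.99,
               noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped Gaussian noise to the
    # target policy's action, then clip back into the action range.
    noise = np.clip(rng.normal(0.0, noise_std, size=target_action.shape),
                    -noise_clip, noise_clip)
    smoothed = np.clip(target_action + noise, -1.0, 1.0)
    sa = np.concatenate([next_state, smoothed])
    # Clipped double-Q: the min of the two critics reduces the
    # overestimation bias that plain DDPG suffers from.
    q = min(critic1(sa), critic2(sa))
    return reward + gamma * q
```

The `critic1`/`critic2`/`td3_target` names and the 2D state/action shapes are placeholders I made up for the sketch, not anything from the SDK.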
But the results are very disappointing.
The agent starts to learn well in the early stage, but then learning collapses in unpredictable ways.
I found that as the problem becomes more complex, value-based methods are very unstable.
They just have a hard time converging to a good minimum as the dimensionality of the state and action spaces grows. They behave well on things like a single robot arm or the inverted pendulum, but truly collapse when the model is a set of arms, like the robot.
It is true they are sample efficient because they learn from past experience, but the problem is that if the experiences in the replay memory are bad, the agent just learns a bad policy.
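The mechanism behind that is easy to see in the replay buffer itself. A minimal FIFO buffer sketch (not from any particular library), sampled uniformly the way TD3/DDPG-style agents do:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO replay buffer, as used by off-policy methods like TD3."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: if most stored transitions came from a bad
        # policy, most of every training batch will too, and the critic
        # gets fit to that data regardless of its quality.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Nothing judges the quality of what goes in, which is exactly the failure mode above: a stretch of bad exploration fills the memory and the agent then learns from it.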
I am reading the papers on policy gradient methods, and I will try to implement a PPO and maybe an A3C agent and see what results I get.
These methods are not sample efficient because they are on-policy: they learn only from fresh data generated by the current policy.
That makes them expensive for real-world robots, but we are not using real-world robots, we are simulating them.
Also, it seems that policy gradients are friendlier to training multiple workers in parallel and behave better as the model dimensions grow: more states and more actions.
Anyway, I will keep reading and see if I can implement at least the vanilla policy gradient this weekend.
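For what it's worth, the vanilla policy gradient (REINFORCE) fits in a few lines. A sketch on a toy two-armed bandit instead of the robot, just to show the update rule (sample an action, weight the log-probability gradient by the reward, ascend); the function name and setup are mine, not from any framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(steps=2000, lr=0.1):
    """REINFORCE on a toy 2-armed bandit: action 1 pays 1.0, action 0 pays 0.0."""
    logits = np.zeros(2)
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(2, p=probs)
        reward = 1.0 if a == 1 else 0.0
        # Gradient of log pi(a) w.r.t. the logits for a softmax policy:
        # one-hot(a) - probs.
        grad_logp = -probs
        grad_logp[a] += 1.0
        # Policy gradient ascent: scale the log-prob gradient by the return.
        logits += lr * reward * grad_logp
    return softmax(logits)
```

The real thing on a robot adds whole-episode returns, a baseline to cut variance, and a neural net policy, but the update is the same shape.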
The training takes many hours but does not make satisfactory progress, so I will disable it until I get a good controller.
One of the things the AI people never mention is how difficult it is to get an agent working; that's why you see a ton of people just repeating the successful tests with the same hyperparameters.
Basically, they count the hits and hide the misses.