I tried the SDK a day or two ago, but the demo was stuck in a loop even after more than an hour. I didn't have time to write up a report at that moment.
You mean the spider robot suspended in a vacuum doing random moves?
I was trying to train a neural net as a pose-matching controller, using the TD3 algorithm.
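For anyone not familiar with TD3: its main tricks are twin critics (take the minimum of two Q estimates to fight overestimation) and target policy smoothing (add clipped noise to the target action). Here is a minimal sketch of just the target computation, with the two critics replaced by made-up linear functions instead of real networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two critic networks (real TD3 uses neural nets;
# these are fixed random linear maps, just to show the target math).
W1 = rng.normal(size=(4,))
W2 = rng.normal(size=(4,))

def critic1(state_action):
    return float(W1 @ state_action)

def critic2(state_action):
    return float(W2 @ state_action)

def td3_target(reward, next_state, target_action, gamma=0.99,
               noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped Gaussian noise to the
    # target policy's action, then clip back into the action range.
    noise = np.clip(rng.normal(0.0, noise_std, size=target_action.shape),
                    -noise_clip, noise_clip)
    smoothed = np.clip(target_action + noise, -1.0, 1.0)
    sa = np.concatenate([next_state, smoothed])
    # Clipped double-Q: the min of the two critics reduces the
    # overestimation bias that plain DDPG suffers from.
    q = min(critic1(sa), critic2(sa))
    return reward + gamma * q
```

The `critic1`/`critic2`/`td3_target` names and the 2D state/action shapes are placeholders I made up for the sketch, not anything from the SDK.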
But the results are very disappointing.
The agent starts to learn well in the early stage, but then learning collapses in unpredictable ways.
I found that as the problem becomes more complex, value-based methods are very unstable.
They just have a hard time converging to a good minimum as the dimensionality of the state and action spaces grows. They behave well on things like a single robot arm or the inverted pendulum, but truly collapse when the model is a set of arms, like the robot.
It is true they are sample efficient because they learn from past experience, but the problem is that if the experiences in the replay memory are bad, the agent just learns a bad policy.
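The mechanism behind that is easy to see in the replay buffer itself. A minimal FIFO buffer sketch (not from any particular library), sampled uniformly the way TD3/DDPG-style agents do:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO replay buffer, as used by off-policy methods like TD3."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: if most stored transitions came from a bad
        # policy, most of every training batch will too, and the critic
        # gets fit to that data regardless of its quality.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Nothing judges the quality of what goes in, which is exactly the failure mode above: a stretch of bad exploration fills the memory and the agent then learns from it.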
I am reading the papers on policy gradient methods, and I will try to implement a PPO and maybe an A3C agent and see what results I get.
These methods are not sample efficient because they are on-policy: they learn only from fresh data generated by the current policy.
That makes them expensive for real-world robots, but we are not using real-world robots, we are simulating them.
Also, it seems that policy gradients are friendlier to training multiple workers in parallel and behave better as the model dimensions grow: more states and more actions.
Anyway, I will keep reading and see if I can implement at least the vanilla policy gradient this weekend.
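For what it's worth, the vanilla policy gradient (REINFORCE) fits in a few lines. A sketch on a toy two-armed bandit instead of the robot, just to show the update rule (sample an action, weight the log-probability gradient by the reward, ascend); the function name and setup are mine, not from any framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(steps=2000, lr=0.1):
    """REINFORCE on a toy 2-armed bandit: action 1 pays 1.0, action 0 pays 0.0."""
    logits = np.zeros(2)
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(2, p=probs)
        reward = 1.0 if a == 1 else 0.0
        # Gradient of log pi(a) w.r.t. the logits for a softmax policy:
        # one-hot(a) - probs.
        grad_logp = -probs
        grad_logp[a] += 1.0
        # Policy gradient ascent: scale the log-prob gradient by the return.
        logits += lr * reward * grad_logp
    return softmax(logits)
```

The real thing on a robot adds whole-episode returns, a baseline to cut variance, and a neural net policy, but the update is the same shape.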
The training takes many hours but does not make satisfactory progress, so I will disable it until I get a good controller.
One of the things the AI people never mention is how difficult it is to get an agent working; that's why you see a ton of people just repeating the successful tests with the same hyperparameters.
Basically, they count the hits and hide the misses.