OK guys, I finally fixed the bugs that were making the continuous action space trainer fail.
I committed a pretrained controller, and the results are extremely impressive; I could not believe how resilient it is to perturbations.
You can grab the pole with the mouse, and you are hard pressed to knock it down; if I had not seen it, I would not believe it.
For the people following this: I committed the trained agent, but if you run the second demo,
it will train a new one, and it is funny to see how it struggles to learn to cope with perturbations.
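For anyone curious what continuous-action training looks like in miniature, here is a hypothetical toy sketch (not the project's actual trainer): REINFORCE with a Gaussian policy on a one-dimensional action, where the reward is invented for illustration and peaks when the action hits `target`.

```python
import random

# Hypothetical toy sketch, NOT the project's actual trainer: REINFORCE with a
# Gaussian policy over a single continuous action, to show the general shape
# of continuous-action policy-gradient training.
random.seed(0)

target = 2.0             # the action the toy "environment" rewards most
mean, std = 0.0, 0.5     # Gaussian policy; only the mean is learned here
baseline = 0.0           # running average reward, used to reduce variance

for step in range(5000):
    lr = 0.05 if step < 4000 else 0.005   # crude learning-rate anneal
    a = random.gauss(mean, std)           # sample a continuous action
    r = -(a - target) ** 2                # toy reward, best at a == target
    adv = r - baseline                    # advantage vs. the running baseline
    baseline += 0.1 * (r - baseline)
    # d/d(mean) of log N(a; mean, std) is (a - mean) / std^2
    g = adv * (a - mean) / std ** 2
    g = max(-10.0, min(10.0, g))          # clip the estimate for stability
    mean += lr * g

print(round(mean, 1))    # the learned policy mean should sit near `target`
```

A real trainer replaces the toy reward with physics rollouts (including the perturbations) and the scalar `mean` with a network, but the sample-action / score / nudge-the-policy loop is the same.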
Now we are ready to try this on some of the more interesting models:
the Dog, the Spider, the Human, and some robots.
I think we are now in the game, guys.
Edit: if anyone decides to run the trainer, on my four-core machine it takes about 15 minutes to run one million steps,
but people with better systems should be faster.
In fact, one of the beauties of the native implementation is that it runs quite fast, as opposed to the tests I have seen on YouTube that run for many hours.
I am very happy with the results so far.
In fact, it is so good that I do not think it is going to need a GPU,
at least not for the normal nets. We may need one when we try to capture the world with images, by rendering to a texture and using the z-buffer as the input of a convolutional net.
But that is a way off in the future.