AWS DeepRacer: How to train a model in 15 minutes

What most people did

Almost all scoring functions I have heard of score mainly based on where the car currently is. The simplest approach is already provided as an example by AWS: give a high score if the car is close to the center line. More sophisticated approaches give a high score if the car is close to a hand-crafted, more or less ideal line, e.g. a line that is on the right before a left turn and on the left in the middle of the turn. The proximal policy optimization algorithm is then able to learn which actions were good and which were not.
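To make the position-based idea concrete, here is a minimal sketch of a center-line scoring function. It assumes the usual DeepRacer setup, in which the scoring function receives a `params` dictionary containing `track_width` and `distance_from_center`. The linear falloff and the small floor value are illustrative choices, not the exact AWS example.

```python
def reward_function(params):
    """Score high when the car is close to the center line (position-based)."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # How far the car has drifted from the center, as a fraction of the
    # half-width of the track (0 = on the center line, 1 = at the edge).
    offset = distance_from_center / (0.5 * track_width)

    # Illustrative linear falloff with a small positive floor, so the
    # score never drops to zero or below.
    return float(max(1e-3, 1.0 - offset))
```

Note that this score only looks at where the car is. It says nothing about whether the action the car just took was a good one.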

What I did

The point the car should aim for is determined by simple geometry.
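As a rough sketch of what such geometry could look like, the function below scores how well the direction the wheels point matches the direction towards a target point. This is an illustration under assumptions, not necessarily the exact method used here. The target is simply a waypoint a few steps ahead, and both the helper `target_point` and the constants `lookahead` and `MAX_ERROR_DEGREES` are hypothetical. It assumes the DeepRacer `params` keys `x`, `y`, `heading`, `steering_angle` (both angles in degrees), `waypoints` and `closest_waypoints`.

```python
import math

# Assumption: a steering error of this size or more earns the minimum score.
MAX_ERROR_DEGREES = 60.0


def target_point(params, lookahead=5):
    """Hypothetical helper: pick a waypoint a few steps ahead of the car."""
    waypoints = params["waypoints"]
    next_index = params["closest_waypoints"][1]
    return waypoints[(next_index + lookahead) % len(waypoints)]


def reward_function(params):
    """Score high when the car steers towards the target point."""
    tx, ty = target_point(params)

    # Direction from the car to the target point, in degrees.
    target_direction = math.degrees(
        math.atan2(ty - params["y"], tx - params["x"])
    )

    # Direction the front wheels point: car heading plus steering angle.
    steering_direction = params["heading"] + params["steering_angle"]

    # Smallest signed angle between the two directions, in [-180, 180).
    error = (target_direction - steering_direction + 180.0) % 360.0 - 180.0

    # Illustrative linear falloff with a small positive floor.
    return float(max(1e-3, 1.0 - abs(error) / MAX_ERROR_DEGREES))
```

Unlike a position-based score, this one judges the action itself, so every single step gives direct feedback on whether the chosen steering angle was good.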

Why is this a good idea?

After about 15 minutes of training the car completes its first lap. One minute later it completes another lap, and after that it can run 50 more without a single crash. Other approaches usually took many hours, if not days, to converge to a somewhat stable model. But why?

18 minutes of training on the re:Invent 2018 track. After 42 attempts the model had learned to run the track with a 100% success rate.

Isn’t this cheating?

Well, it feels a bit like cheating at first, right? After all, there is an elaborate training algorithm that can figure out how to steer. And now we tell the car directly how it should steer using “complicated mathematics”.

DeepRacer running in the Accenture cup in Kronberg, Germany

Falk Tandetzky

Software and cloud architect and machine learning engineer at TwoDigits / Accenture. PhD in quantum physics. www.github.com/falktan www.twitter.com/falk_tan