A Practical Guide to Machine Learning with TensorFlow 2.0 & Keras

Reinforcement Learning

Microsoft

Lesson Description

The "Reinforcement Learning" Lesson is part of the full, A Practical Guide to Machine Learning with TensorFlow 2.0 & Keras course featured in this preview video. Here's what you'd learn in this lesson:

Vadim explains what reinforcement learning is, introduces different reinforcement learning example, and introduces the open AI gym, a toolkit for developing and comparing reinforcement learning algorithms.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Reinforcement Learning" Lesson

[00:00:00]
>> The only notebook we left untouched, but we probably have to just do it on our own. But at the same time, this notebook is very, very documented. And it's talking about reinforcement learning. So, with this example, you will be using OpenAI gym. If you look at their website, OpenAI actually created amazing environment for playing around different games, and trying to figure out what to do with them.

[00:00:34]
In my examples, I'm using a simple classic control, CartPole-v1. So it just simply, this pole you have on carts, and you can move cart to the left or you can move it to the right. And the task is to try to balance your pole, and it's trying to preserve the laws of physics.

[00:00:54]
Meaning that if it's bemt, it's probably gonna fall, and you're trying to avoid it by just moving your cart in the corresponding direction. And in this notebook first, and by the way, yeah, that's all the experiments available through the gym. For instance, you can even play Atari games, and try to, For instance, educate your Pac-Man how to await all those ghosts, or play Box, and so on and so forth.

[00:01:26]
There's more advanced ones, for instance, with MuJoCo, you can even try to manipulate the, Robots, and try to educate humanoids to walk, but you will need 3D physical engine for this. Yeah, all right, so in our example in this notebook, yes, there are 800 different environments listed all here.

[00:01:53]
Well, 859 of them, actually. But with the CartPole, you can just simply load this environment. And everything to you will be provided as those four numbers, which actually correspond to position of the cart. The current velocity, and if it's negative, it means it's moving to the left, if it's positive, it's moving to the right.

[00:02:19]
And that's the angle of your pole and corresponding angular velocity. So given all those coordinates, you can even render. Let's see if we can render the environment. Right, so it will visualize it in a separate window. And now, anything I'm gonna be doing in the code, it will simply change the position of this cart.

[00:02:47]
So for instance, I can say what's the size of my action space, and it's 2, it means that I can move to the left or I can move to the right. And for instance, action number 1 will just move my cart to the right. Let me try to actually put the visualization window and my codes on the same screen.

[00:03:18]
Here we go, okay, so if I do this, you see it's actually slightly moving our cart to the right. And we see that every time I execute this cell, our cart's moving faster and faster. And it's basically making this pole fall down. But anyway, that's the setup, and you can play around with it.

[00:03:42]
So you can try to figure out what action you need to take to, well, balance the pole. And I explained first that you can probably try something simple. For instance, if angle is negative, just move to the right, and to the left if it's positive, move to the right, right, to balance it.

[00:04:01]
But you will also figure out quickly that it actually introduce more and more fluctuations to your pole. And so we need to be slightly smarter than that. And Q-values or Q-Learning is one technique how to do this. With Q-Learning, you're just simply saying, we have different states and we have particular actions we can take.

[00:04:27]
And you're just simply trying to figure out, depending on the state, what you have in your system, what actions you should take to maximize the profit or reward function. Right, and Q-Learning is one technique which can help you to do that. So once again, as I said, this is a pretty nicely documented notebook.

[00:04:50]
So I would recommend just to go through it yourself, play around with the provided code. And at the end, it's actually even using optimized solution, the one available through the TensorFlow hub itself. So TensorFlow-Agents allows you to use some of the agents which are already being created by somebody else.

[00:05:15]
And you can just apply those to train your agents in, for instance, in this case, I think it's one of the Atari games, yeah, this one. So you're just trying to move your agent right to basically, Keep this bouncing ball in the field. But you can teach your your agent to do exactly that, and don't lose the ball.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial