OpenAI Gym and Python for Q-learning – Reinforcement Learning Code Project



What's up, guys? Welcome back to this series on reinforcement learning. Over the next couple of videos, we're going to be building and playing our very first game with reinforcement learning. We're going to use the knowledge we gained last time about Q-learning to teach an agent how to play a game called Frozen Lake. We'll be using Python and OpenAI's Gym toolkit to develop our algorithm, so let's get to it.

As mentioned, we'll be using Python and OpenAI Gym to develop our reinforcement learning algorithm. The Gym library is a collection of environments that we can use with the reinforcement learning algorithms we develop. Gym has a ton of environments, ranging from simple text-based games to Atari games like Breakout and Space Invaders. The library is intuitive to use and simple to install: just run pip install gym and you're good to go, it's really as easy as that. The link to Gym's installation instructions, requirements, and documentation is included in the description, so go ahead and get that installed now, because we'll need it in just a moment.

We'll be making use of Gym to provide us with an environment for a simple game called Frozen Lake. We'll then train an agent to play the game using Q-learning, and then we'll get a playback of how the agent does after being trained. So let's jump into the details for Frozen Lake.

Wait, Frozen Lake? Like the frozen lake in... Sorry, but no, the frozen lake we'll be playing won't have us fighting any White Walkers. And seriously, if no one gets this reference, then you're spending way too much time learning deep learning and not enough time vegging out on... well, let me know in the comments if you know where this scene is from.

All right, let's get into the real details for the actual Frozen Lake game we'll be playing. I've grabbed the description of the game directly from Gym's website. Let's read through it together, but with an accent, you know, to add dramatic effect:

"Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend. The surface of the lake is described using a grid like the one you see here."

Well, that was fun. This grid is our environment, where S is the agent's starting point, and it's considered safe for the agent to be here. F represents the frozen surface and is also safe. H represents a hole, and if our agent steps in a hole in the middle of a frozen lake, well, yeah, you know that's not good. Finally, G represents the goal, which is the space on the grid where the prized frisbee is located.

The agent can navigate left, right, up, and down, and the episode ends when the agent reaches the goal or falls in a hole. It receives a reward of 1 if it reaches the goal and 0 otherwise. So, pretty much, our agent has to navigate the grid by staying on the frozen surface without falling into any holes until it reaches the frisbee. If it reaches the frisbee, it wins and receives a reward of +1. If it falls in a hole, it loses and receives no points for the entire episode. Cool.

All right, let's jump into the code. First, we import all the libraries we'll be using. There aren't many, really: we have numpy, gym, random, time, and clear_output from IPython's display module. Next, we create our environment: we just call gym.make() and pass in a string with the name of the environment we want to set up. We'll be using the environment called FrozenLake-v0. All the environments, with the corresponding names you can use, are available on Gym's website. With this env object we can do several things: we can query for information about the environment, sample states and actions, retrieve rewards, and have our agent navigate the frozen lake.
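Here's a minimal sketch of that setup. It assumes the Gym version used in the video, where the environment id is FrozenLake-v0 (newer Gym releases renamed it to FrozenLake-v1, so adjust the string if gym.make() complains):

    import numpy as np
    import gym
    import random
    import time
    from IPython.display import clear_output

    # Create the Frozen Lake environment by name.
    # "FrozenLake-v0" is the id used in this video; newer Gym releases use "FrozenLake-v1".
    env = gym.make("FrozenLake-v0")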
We're now going to construct our Q-table and initialize all the Q-values to zero for each state-action pair. Remember, the number of rows in the table is equivalent to the size of the state space in the environment, and the number of columns is equivalent to the size of the action space. We can get this information using env.observation_space.n and env.action_space.n, and we can then use it to build the Q-table and fill it with zeros. If you're foggy about Q-tables at all, be sure to check out the earlier videos where we covered all the details you need. All right, so here's what our Q-table looks like.

Now we're going to create and initialize all the parameters needed to implement the Q-learning algorithm. Let's step through each of these. First, with num_episodes, we define the total number of episodes we want our agent to play during training. Then, with max_steps_per_episode, we define the maximum number of steps that our agent is allowed to take within a single episode, so if by the 100th step the agent hasn't reached the frisbee or fallen through a hole, the episode will terminate with the agent receiving 0 points. Next, we set our learning rate, which was shown mathematically with the symbol alpha in the previous video, and then we also set our discount rate, which was previously represented with the symbol gamma.

The last four parameters are all related to the exploration-exploitation trade-off we talked about last time in regards to the epsilon-greedy strategy. We're initializing our exploration rate, which we previously referred to as epsilon, to 1, and we set the max exploration rate to 1 and the min exploration rate to 0.01. The max and min are just bounds on how large and how small our exploration rate can be. Lastly, we set the exploration decay rate to 0.01, the rate at which the exploration rate will decay.

Now, all of these parameters can change. These are parameters you'll want to play with and tune yourself to see how they influence and change the performance of the algorithm when we get there. Speaking of which, in the next video, we're going to jump right into the code we'll write to implement the actual Q-learning algorithm for playing Frozen Lake. For now, go ahead and make sure your environment is set up with Python and Gym and that you've got the initial code written that we went through so far (recapped in the sketch below). Also, come check out the corresponding blog for this video on deeplizard.com to make sure you didn't miss anything, and while you're at it, check out the exclusive perks and rewards available for members of the deeplizard hivemind. Let me know in the comments if you're able to get everything up and running, and leave us a thumbs up to let us know you're learning. Thanks for contributing to collective intelligence, and I'll see you in the next one. ...Well, that agent lost.
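To recap the initial code covered so far, here's a sketch of the Q-table construction and parameter initialization described above, continuing from the snippet earlier on this page. The 100-step cap and the exploration settings are the values stated in the video; the other values (num_episodes, learning_rate, discount_rate) haven't been given yet, so the numbers shown for them are just illustrative placeholders to tune:

    # Q-table: one row per state, one column per action, all values start at zero.
    action_space_size = env.action_space.n        # 4 actions: left, down, right, up
    state_space_size = env.observation_space.n    # 16 grid positions on the 4x4 lake
    q_table = np.zeros((state_space_size, action_space_size))

    # Training parameters
    num_episodes = 10000           # total episodes played during training (placeholder value)
    max_steps_per_episode = 100    # episode terminates after 100 steps, as described above

    learning_rate = 0.1            # alpha (placeholder value)
    discount_rate = 0.99           # gamma (placeholder value)

    # Epsilon-greedy exploration settings (values stated above)
    exploration_rate = 1           # epsilon, start fully exploratory
    max_exploration_rate = 1       # upper bound on epsilon
    min_exploration_rate = 0.01    # lower bound on epsilon
    exploration_decay_rate = 0.01  # rate at which epsilon decays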

23 thoughts on “OpenAI Gym and Python for Q-learning – Reinforcement Learning Code Project”

  1. Check out the corresponding blog and other resources for this video at:
    http://deeplizard.com/learn/video/QK_PP_2KgGE

    Comment if you know the scene from 1:27! 💀❄️💎

  2. Amazing content. On your channel I have never faced any compatibility issue; I just follow the code and it runs every time. Please keep posting videos.

  3. Cool music, much better than the original one... well, the old one was a bit creepy. That White Walker would have fit right in with it. As always, short and great content.
    Anyway, comparing with the earlier lizard/cricket problem, you did happen to penalize being eaten by a bird and reaching holes, while here the only nonzero reward happens when we reach the goal successfully. Isn't there any possibility of the Q-values decaying towards zero way faster than in the previous one? I'd like to know what the advantage is here, other than the more likely chance of exploitation.

  4. It took me some time to understand in the later videos that the environment stays unchanged even though the .reset() function gets called. I guess this is because you keep working with the exact same instance variable "env", right? That's a little subtle 😉

  5. The future is coming and it's no longer waiting. No more Arduinos: in just 3 years we got 32-bit microcontrollers that are three times cheaper and 4.5 times faster. A brain-to-brain interface was done this year.

  6. Great tutorial :D, please next teach us how to implement Q-learning with a neural network 😀

  7. Thanks for your great work. Do you have any experience with RL in ROS/Gazebo? And maybe you can say what's the best way to use it, like with OpenAI Gym or OpenAI ROS…?
