Teaching Computer to Play Video Games and Land Lunar Landers

Thomas Lawrence
5 min read · Apr 1, 2021

A Quick Guide to DQN

I’m sure you’ve seen a clip of a reinforcement learning algorithm playing a game before. Something like this one from Google DeepMind of a DQN playing the Atari game Breakout:

Amazing!

But how does it actually work? What the heck is a DQN? When would this actually be used in real life? Sit down, prepare to have your DQN questions answered, and maybe watch a few more games being played by an AI.

What is a DQN

DQN stands for Deep Q Network. A DQN is a reinforcement learning algorithm that combines deep learning and reinforcement learning: a neural network estimates how good each action is in each state, which makes it practical for problems with far too many states to handle with a simple lookup table (like video games). The Q in DQN is for Q-value, which I’ll discuss later. For now, we need to know what an RL model needs to work.

What is needed for a Reinforcement Learning Model?

You need 5 things to be able to create an RL model: an Agent, Reward, State, Action and Environment.

The Agent

The agent is the main thing we care about for our algorithm because everything is based around the agent. The agent is like the player. It takes all the actions and it produces data.

In this case, the agent is the shooter circled in red.

The State

The state describes the current situation of the agent in the environment, for example its position and orientation. This is useful for RL problems because it lets us relate each state to the reward for being in that state.

In this case, the state is the position and orientation of the shooter.

The Environment

The environment is the place where all the actions take place and what the agent interacts with.

In this case, the environment is the game itself: everything except the entities.

The Reward

The reward is what incentivizes the agent to do what you want it to do. The reward is kinda like giving a dog a bone for doing tricks.

In this case, the reward is the score, because the shooter is incentivized to get the highest score possible.

Actions

The action is what it sounds like: it’s the thing the agent does. Usually, the actions are small, simple things like move left or move right, and these build up into bigger behaviours like navigating a maze or moving an object from one place to another.

In this case, the actions are left or right.

Combining all these things, we have all the elements needed to make our RL model. This is very cool because reinforcement learning is actually the closest thing we have to artificial animal intelligence. It learns similarly to animals, in the sense that it learns whether things are good or bad based on the reward they provide.
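To make these pieces concrete, here is a minimal sketch of how the agent, environment, state, action and reward fit together in code. It uses OpenAI Gym (which I use later for the lunar lander) with the Gym API as it was at the time of writing; the CartPole environment and the random action choice are just placeholders for illustration, not part of my actual project.

import gym

env = gym.make("CartPole-v1")   # the environment
state = env.reset()             # the starting state

done = False
total_reward = 0
while not done:
    action = env.action_space.sample()            # the agent picks an action (randomly here)
    state, reward, done, info = env.step(action)  # environment returns the next state and a reward
    total_reward += reward                        # the reward tells the agent how well it is doing

print("Episode score:", total_reward)

A real agent would replace env.action_space.sample() with a policy that picks actions based on the state, which is exactly what the next section is about.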

But wait! You might be asking yourself, ‘ok that’s great, but how do we make the agent do stuff?’ That actually comes down to the policy. The policy is what makes the agent value those sweet sweet rewards.

Policy

The policy of a DQN is value-based. For every state, the network estimates a Q-value for each possible action, and the agent picks the action with the highest value. This is where the Q comes in: the Q in DQN stands for the quality of an action in a state. We want the highest total reward, but we don’t know whether an action will give us a higher reward later. For example, imagine you’ve been given a cookie and every 20 seconds the number of cookies you have doubles; but once you reach a certain number of cookies, all your cookies are taken away. You don’t know what the maximum number of cookies is, but you want the maximum! You can eat the cookie right away for an immediate reward, but if you wait, you get more cookies, so the long-term reward is better. Getting the maximum total reward is the goal of a DQN, so you have to balance long-term rewards against short-term rewards.
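In Q-learning this long-term vs short-term balance usually shows up as a discount factor, often called gamma, in the target that the network is trained towards. Here is a rough sketch of how that target can be computed, assuming a Keras-style network model (like the one I build below) that maps a state to one Q-value per action; gamma and the function name are my own choices for illustration.

import numpy as np

gamma = 0.99  # discount factor: how much we value future reward vs immediate reward

def q_target(reward, next_state, done, model):
    # Sketch only: if the episode ended, there is no future reward to account for
    if done:
        return reward
    # Otherwise: immediate reward + discounted best Q-value of the next state
    next_q_values = model.predict(next_state[np.newaxis], verbose=0)[0]
    return reward + gamma * np.max(next_q_values)

A gamma close to 1 makes the agent care a lot about future cookies; a gamma close to 0 makes it grab the cookie in front of it.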

How I made a DQN land on the (Virtual) Moon

I wanted to make a DQN of my own, so I used the OpenAI Gym Lunar Lander environment to train a lunar lander to land on the moon.
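Setting up the environment looks roughly like this. The state_space and action_space values feed into the network below (in my code they are stored on the agent as self.state_space and self.action_space); for LunarLander-v2 the state is 8 numbers and there are 4 discrete actions. The variable names here are my own.

import gym

env = gym.make("LunarLander-v2")
state_space = env.observation_space.shape[0]  # 8 numbers: position, velocity, angle, leg contact, etc.
action_space = env.action_space.n             # 4 actions: do nothing, fire left, main, or right engine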

For the model, I used a basic NN with 3 Dense layers, ReLU and linear activations, mean squared error as the loss, and the Adam optimizer.

# Imports needed for the model (TensorFlow / Keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def DQN_model(self):  # making the model
    model = Sequential()
    model.add(Dense(150, input_dim=self.state_space, activation='relu'))
    model.add(Dense(120, activation='relu'))
    model.add(Dense(self.action_space, activation='linear'))  # one Q-value per action
    model.compile(loss='mse', optimizer=Adam(learning_rate=self.lr))
    return model

If you want a full code breakdown, check that out here.
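To actually make the agent do stuff, the model’s Q-values have to be turned into actions. A common approach, and the one sketched here, is epsilon-greedy: most of the time pick the action with the highest predicted Q-value, but sometimes pick a random action so the agent keeps exploring. The self.epsilon attribute is an assumed exploration parameter, not something shown in the snippet above.

import numpy as np

def choose_action(self, state):
    # Explore: with probability epsilon, take a random action
    if np.random.rand() < self.epsilon:
        return np.random.randint(self.action_space)
    # Exploit: otherwise take the action with the highest predicted Q-value
    q_values = self.model.predict(state[np.newaxis], verbose=0)[0]
    return np.argmax(q_values)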

To train the model, we use experience replay. At every step we store a transition (state, action, reward given at that state, next state, and whether the episode has finished) in a memory buffer, and then train on random mini-batches sampled from that buffer. Because the samples come from random points in time rather than consecutive steps, learning is more stable and efficient.
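Here is a minimal sketch of what experience replay can look like, using the same discounted target idea from earlier and a Keras model like the one above; the buffer size, batch size and gamma are arbitrary illustrative choices rather than the exact values from my project.

import random
import numpy as np
from collections import deque

memory = deque(maxlen=100_000)  # replay buffer of past transitions

def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))

def replay(model, batch_size=64, gamma=0.99):
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)  # random, non-consecutive transitions
    states = np.array([t[0] for t in batch])
    next_states = np.array([t[3] for t in batch])
    # Current Q-values and the best Q-values of the next states
    q_values = model.predict(states, verbose=0)
    next_q = model.predict(next_states, verbose=0)
    for i, (state, action, reward, next_state, done) in enumerate(batch):
        target = reward if done else reward + gamma * np.max(next_q[i])
        q_values[i][action] = target  # only the taken action's Q-value is updated
    model.fit(states, q_values, verbose=0)  # one gradient step on the sampled batch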

Finally, after training for a while, I got this result:

We have landed!!

OK, this is great, but landing a virtual lunar lander isn’t that useful on its own. Are there any potentially useful applications for DQNs and RL? Wow, what a great question and a perfect transition to my next section!

What is next for DQN and RL

Some potential applications of reinforcement learning are:

Robotics

Since robots will be doing everything in the future (supposedly), we need to make them able to do everything, even tasks that we can’t easily describe. This is where RL comes in, because it lets a robot learn a task without us spelling out how to do it. There are some robots using RL now, but it is still very much in the early stages.

Robots opening doors with RL

AGI

A potential path for RL is artificial general intelligence or AGI for short. Of course, nobody knows if this will work yet, but since RL is very similar to animal learning, it’s a potentially promising path.

There is no limit to the potential application for reinforcement learning. It could help speed up processes when combined with regular neural networks or it could be used for things like:

  • Cybersecurity
  • AB testing ads
  • Recommendation systems
  • Personal Assistants
  • Automating tasks

There are so many potential opportunities for DQNs and RL in the future. I’m excited to see what happens.

TL;DR

  • Reinforcement Learning needs an agent, reward, state, actions and environment.
  • The policy of a DQN is value-based: pick the action with the highest Q-value to get the highest total reward.
  • Experience replay is used to train more efficiently.
  • There are many potential opportunities for RL and DQN models in the future.

If you like this article, you will probably like my other ones, so consider following me on Medium, and while you’re doing that, follow me on Twitter and LinkedIn and sign up for my newsletter.


Thomas Lawrence

I’m a curious 17-year-old. I’m interested in QC, AI and many other things.