There’s something eerie about deep reinforcement learning that sets it apart from other kinds of AI. Deep reinforcement learning is “goal-oriented” AI. That is, it learns which actions to perform in order to reach a goal. 

It was used in the early DeepMind demos that showed algorithms beating Atari games. Those demos, visible proof that algorithms can kick ass, eventually led Google to acquire the research group, and they also pushed DeepMind investor Elon Musk to warn the world about the dangers of superintelligence.

What does deep reinforcement learning (RL) demonstrate that doesn’t come through quite as powerfully when machine learning algorithms classify kitten photos?

The Relationship Between Deep Reinforcement Learning and Goal-Oriented AI

Deep RL is programmed to act with a purpose that we can recognize: a goal, such as winning the game. Winning that game is something humans consider hard, and RL does it over many steps, with clear intent. We might call its actions strategic, as they unfold over time and are sometimes unintuitive.

What are some examples of strategic goals that you can program a reinforcement learning agent to achieve? 

You can tell your reinforcement learning agent that it will only get a large reward if it succeeds at reaching a physical location; e.g. if you manage to get to the loading dock, you get 100 points. 

You can give it a large reward if it hits a production target: e.g. if you make your quota of 5 tons of copper per month, you get 2000 points. 

In a game, you might say: if you manage to kill the Boss, you get 10,000 points. 


With these goals, you are telling the agent WHAT it needs to accomplish, but you are not telling it HOW to do it. That gives the agent a lot of flexibility. The reinforcement learning agent will invent the how. That is, it will come up with ways, maybe entirely new paths and strategies, to accomplish what you told it to. It will perform a complex act of problem solving by learning through trial and error over many attempts to reach its goal.
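To make the WHAT-versus-HOW distinction concrete, here is a minimal sketch of trial-and-error learning. It uses plain tabular Q-learning rather than a deep network, on a made-up one-dimensional corridor where the loading dock sits at the far end. The corridor, the names (`N_STATES`, `DOCK`, `step`, `train`, `greedy_policy`) and the hyperparameters are all assumptions chosen for illustration. The only thing the agent is told is that reaching the dock is worth 100 points.

```python
import random

N_STATES = 6          # positions 0..5 along a hypothetical corridor
DOCK = 5              # the loading dock is the last position
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Move one cell (clipped to the corridor); reward 100 only at the dock."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 100 if next_state == DOCK else 0
    return next_state, reward, next_state == DOCK


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent is never told which way the dock is."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each attempt
            explore = rng.random() < epsilon
            if explore or q[state][0] == q[state][1]:
                a = rng.randrange(2)          # try something at random
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action for every position short of the dock."""
    return [ACTIONS[0 if q[s][0] > q[s][1] else 1] for s in range(DOCK)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in the reward mentions direction. The "always step right" behavior should emerge purely from repeated attempts and the single payoff at the end.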

Often, a company will have several goals that have to be met at the same time. For example, they want to deliver shipments while reducing their carbon footprint. Or they want to deliver those shipments as quickly as possible, but at the lowest possible cost. In those cases, you can program a reinforcement learning agent to optimize for more than one goal; that is, to act in such a way that it maximizes its total reward across all goals by finding the best trade-off among them.

When programming the goals of a reinforcement learning agent, you can assign importance to them. Maybe lowering costs by a dollar per shipment earns the agent an additional 200 points, but delaying a delivery by more than 48 hours costs it 1000 points. That would be particularly important if you are delivering anything perishable.
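One way to picture this weighting is as a single reward function that adds up the goals. The sketch below uses the point values from the shipping example; the function name `shipment_reward` and its parameters are hypothetical, and a real system would likely need more care in how the terms are scaled against each other.

```python
def shipment_reward(dollars_saved: float, delay_hours: float) -> float:
    """Combine two goals into one number the agent can maximize.

    Assumed weights, taken from the example in the text:
    +200 points per dollar saved on a shipment,
    -1000 points if the delivery is more than 48 hours late.
    """
    reward = 200.0 * dollars_saved
    if delay_hours > 48:
        reward -= 1000.0
    return reward


if __name__ == "__main__":
    print(shipment_reward(dollars_saved=1, delay_hours=0))    # on time
    print(shipment_reward(dollars_saved=1, delay_hours=72))   # late
```

With these weights, a late delivery wipes out the gain from five dollars of savings, which is how the relative importance of the two goals gets encoded.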

By contrast, most advances in deep learning over the last decade have shown that machines can now do things humans consider easy. Classifying photos, recognizing voices, sensing the emotional tone in a text: these are what we call perceptive tasks, actions that humans can accomplish in under a second, usually without conscious effort.

For many years, those perceptive tasks were out of reach. Moravec’s paradox states that reasoning is easy for computers, while sensorimotor skills are hard. That is, algorithms struggled for a long time to perform tasks that are easy for a three-year-old, even as they successfully beat grandmasters in chess. 

Now, those same computers are eroding the paradox. They are slowly mastering low-level sensorimotor tasks, with the help of deep learning and more powerful hardware. 

You can combine the new perceptive powers of deep learning with goal-oriented optimization in the framework of reinforcement learning. That is its power. Not only can a deep reinforcement learning agent pursue the goal you set for it, it can also interpret the sensory input of the world in order to move through it and reach that goal.

In practice, this makes deep RL one of the most vivid ways we have to demonstrate the intelligence of machines.

When AI Gives You Chills and Why

When you witness deep RL in the process of learning in simulations, AI gives you chills. When you see AI reach a goal through intelligent action for the first time, when just a day before it could not… chills. It can be like an out-of-body experience, a shift in what we understand to be possible. It’s like that moment in the movie Frankenstein when the doctor shouts “It’s alive!!”

You could argue that this is because humans associate the notion of intelligence with the ability to achieve one’s goals. To think strategically is clearly intelligent; to act without strategy is clearly dumb. RL can be programmed to act in a way that mirrors our ideas of intelligent behavior. And when it succeeds, it can give you goosebumps.
