Technology

Pathmind makes it easy to incorporate AI into business practices, from cloud-powered training to deployment into operations.

An Introduction to Reinforcement Learning

Pathmind uses a type of AI called reinforcement learning to optimize simulations and make accurate predictions about hard, real-world problems. But what is reinforcement learning?

Reinforcement learning (RL) is a family of machine learning algorithms (combinations of math and code that process data) that learn to make decisions about how to act. RL is based on the idea that rewarding smart decisions and penalizing mistakes speeds up algorithmic learning.

Other factors, such as massive cloud compute, faster processors, and modern simulation technologies, are also accelerating RL as it takes on complex problems.

Some forms of AI, such as image recognition, need very large datasets to learn from. RL, by contrast, can learn from data generated by its own interaction with a dynamic environment such as a simulation. We call that data “synthetic”: it doesn’t have to come from the real world, and we can create as much of it as we need to train the algorithm.

RL algorithms use trial and error, exploring what works and what doesn’t, and measuring outcomes against a defined “reward function,” until they find a sequence of actions that leads to your goal. The reward function defines which actions are rewarded and which are penalized, just as you might teach a puppy to come by giving it a treat.
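
To make this concrete, a reward function can be just a few lines of code that score each change in the simulation’s state. Below is a minimal sketch in Python; the state fields, values, and names are hypothetical illustrations, not Pathmind’s actual API:

```python
from dataclasses import dataclass

# Hypothetical observation fields, invented for this sketch.
@dataclass
class State:
    items_delivered: int
    queue_length: int

def reward(before: State, after: State) -> float:
    """Score a state change: reward progress, penalize growing backlogs."""
    score = 0.0
    if after.items_delivered > before.items_delivered:
        score += 1.0   # the "dog treat": progress toward the goal
    if after.queue_length > before.queue_length:
        score -= 0.5   # the penalty: the backlog grew
    return score

print(reward(State(10, 3), State(12, 2)))   # prints: 1.0
```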

RL is a tool for optimization. Pathmind uses RL agents to explore, interact with, and learn from simulations built in AnyLogic, a popular simulation package.

How Reinforcement Learning Works

Reinforcement learning is the result of repeatedly interacting with an environment through a cycle of four steps. By observing, acting on the environment, calculating a reward, and evaluating the outcome over time, an AI agent can learn to achieve a specific task, or the sequence of decisions needed to execute one. A minimal code sketch of the full loop follows the four steps below.

Graphic showing Pathmind observing the state of an environment

1. Observe

A Pathmind learning agent can observe the current state of a simulation environment.

Graphic showing Pathmind choosing to take an action based on observations

2. Act

Based on its observations, the agent decides which action to take.

Graphic showing Pathmind observing changes to the environment based on the actions taken and generating a reward score

3. Calculate reward

The action taken causes the environment to change state. The agent observes that change and calculates a reward score.

Graphic showing Pathmind determining if an action should be taken again based on observations and the reward score

4. Evaluate

Using the new observations and the reward score, the learning agent determines whether an action was good and should be repeated, or bad and should be avoided.
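
Put together, the four steps form a loop that repeats until the agent’s behavior converges. The sketch below shows the whole cycle in a self-contained toy example using tabular Q-learning, one classic RL algorithm; the one-dimensional environment is invented for illustration, and Pathmind’s actual training uses more advanced methods:

```python
import random

GOAL, ACTIONS = 5, [-1, +1]   # toy task: walk right along a line to reach 5
q = {}                        # learned values: (state, action) -> score

def step(state, action):
    """Toy environment dynamics: returns (new state, reward, done)."""
    nxt = max(0, state + action)
    return (nxt, 1.0, True) if nxt == GOAL else (nxt, -0.01, False)

for episode in range(200):
    state, done = 0, False                # 1. Observe the initial state
    while not done:
        if random.random() < 0.1:         # 2. Act: explore occasionally...
            action = random.choice(ACTIONS)
        else:                             # ...else take the best-known action
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        nxt, reward, done = step(state, action)     # 3. Calculate the reward
        # 4. Evaluate: nudge the action's value toward the observed outcome
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + 0.1 * (reward + 0.9 * best_next - old)
        state = nxt

# After training, the best-known action from the start is to move right.
print(max(ACTIONS, key=lambda a: q.get((0, a), 0.0)))   # prints: 1
```

After enough episodes, the highest-valued action at every state points toward the goal: sophisticated behavior that emerged from initially random trials.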

At the outset, the simulation designer sets the reward policy (reward function) that the learning agent aims to maximize. There are no hints or suggestions about how to solve the problem; the learning agent figures out how to perform the task and maximize the reward by repeating the steps above. Starting from totally random trials, the learning agent can finish with sophisticated tactics that outperform both human decision makers and conventional optimization algorithms. Reinforcement learning leverages the power of iterative search over many trials and is the most effective way to train AI from simulation. The Pathmind AI platform combines the latest academic research in reinforcement learning with our team’s years of experience researching and developing advanced AI solutions.

Inside a Pathmind Learning Agent

Graphic showing a visualization of a Pathmind learning agent

Within each Pathmind learning agent, there is a function that observes the state of the agent’s environment (the inputs) and maps that state to actions to be taken (the outputs). That is, the agent looks at its surroundings and decides what to do.

In RL, we call that function a “policy” (in politics, a policy is a set of actions to be taken, and that is also true for RL, but RL policies are much more concrete and precise). The agent is asking itself: Given what I see, how should I act? The RL policy returns an answer to that question. 
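
In code, a policy is simply a function from observations to an action. Here is a hand-written stand-in to show the shape of that mapping; the observation names and actions are hypothetical, and a trained RL policy would replace the rule with a learned mapping, typically a neural network:

```python
def policy(observations: dict[str, float]) -> int:
    """Map the observed state (inputs) to an action (the output).

    This hand-written rule is purely illustrative; training replaces it
    with a learned mapping. The names are not Pathmind's API.
    """
    if observations["machines_idle"] > 0 and observations["queue_length"] > 0:
        return 1   # action 1: assign waiting work to an idle machine
    return 0       # action 0: wait

print(policy({"machines_idle": 2.0, "queue_length": 7.0}))   # prints: 1
```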

When RL algorithms learn, that is called training. The longer the algorithm trains, the better its answers become.

Pathmind trains many RL algorithms at once, and combines them with other learning techniques, to produce a policy to make good decisions in simulations. Those simulations are built to resemble business scenarios — like warehouses, supply chains, and factories that need to operate efficiently — and the goal is to produce an RL policy that knows how to make decisions that improve business operations. 

In Pathmind, a policy is trained using reinforcement learning in an AnyLogic simulation built by the user.

Why Reinforcement Learning?

Pathmind’s reinforcement learning application combines trial and error with the evolutionary selection of top-performing algorithms to produce policies that can often solve problems too complex for human intuition to grasp. You can think of it as a massive search engine for finding the best decisions within a simulation.

RL can also solve problems beyond the reach of other machine learning and mathematical optimization techniques. Unlike methods that rely on gathering real-world data, RL learns by interacting with a simulated environment. This spares the Pathmind user from having to collect real-world data, which can be expensive and time-consuming.

This helps when data collection is limited or impossible, such as with sociological and public health models. It also applies when businesses face large and expensive decisions for which they have few benchmarks: for example, new physical plants, new factory layouts, new delivery routes, etc.

Harness the Power of the Cloud

Cloud with light bulb and gear icon

Trial and error takes a long time. If you were to explore each possible decision on your laptop, one trial at a time, training an RL algorithm would take forever. But Pathmind’s distributed learning algorithms harness the power of the cloud to run many training sessions in parallel, selecting the top performers as training proceeds, all automatically.
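
The pattern is easy to picture in miniature: launch many independent training runs at once, then keep the best. The sketch below mimics that select-the-winner structure with Python’s standard library; train_one is a placeholder for a full training run, not Pathmind’s code:

```python
import random
from concurrent.futures import ProcessPoolExecutor

def train_one(seed: int) -> tuple[float, int]:
    """Placeholder for one training run: returns (score, seed).
    A real run would train a policy and report its reward."""
    random.seed(seed)
    return random.random(), seed   # stand-in for the trained policy's score

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_one, range(16)))   # 16 runs in parallel
    best_score, best_seed = max(results)                 # keep the top performer
    print(f"best run: seed={best_seed}, score={best_score:.3f}")
```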

Pathmind also sets up, runs, and spins down the clusters of cloud computing used to train reinforcement learning. Users simply upload their simulation, define their goal, and download an RL policy once training is complete. The trained policy can then be tested and validated inside the simulation tool.

Companies use simulation to surface different decision-making strategies across different scenarios, which may have conflicting criteria of success. 

As a result, simulation developers may want to experiment more freely to find alternate solutions, balance competing business criteria, or simply explore how they can achieve the best possible results by closely examining all of the factors that contribute to an overall outcome.

Pathmind’s web app makes those experiments simple, enabling users to quickly and easily find the best possible outcomes.

Deploy Trained AI 

Cloud with light bulb, gear, and branches icon

Once trained and validated, a Pathmind AI policy can be deployed as an easy-to-use REST API that makes real-world decisions embedded in your business operations.

To take the pain out of AI deployment and management, Pathmind provides a deployment solution that generates the prediction service along with API documentation, examples, and client test code, so that your team can easily integrate a trained AI policy into a larger solution.
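
For example, querying a deployed policy from Python might look like the snippet below; the endpoint URL, authorization token, and JSON fields are placeholders for illustration, not Pathmind’s actual API:

```python
import requests

# Placeholder endpoint, token, and fields, for illustration only.
response = requests.post(
    "https://api.example.com/policy/v1/predict",
    headers={"Authorization": "Bearer <your-token>"},
    json={"observations": [12.0, 3.0, 0.75]},   # the current state of operations
    timeout=10,
)
response.raise_for_status()
print(response.json())   # e.g. {"action": 2}
```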

In sum, Pathmind’s AI platform combines AnyLogic simulations with reinforcement learning algorithms running on cloud-scale computing to automate optimization and create deployable AI decision making that solves hard, real-world problems, no PhD required.