Supply chain managers face a unique set of problems when it comes to optimizing inventory. Customers want their orders delivered quickly, but too much inventory in stock leads to high costs. If you factor in changing demand, backlogged orders, and other uncertainties, it’s clear that supply chain managers need a fast, intelligent optimization solution.

Reinforcement learning (RL) can optimize in response to shifting data and works even when new information needs to be factored in quickly, making it an ideal tool for supply chain inventory management. To show how successfully RL can be applied to this use case, we modified a supply chain model that is publicly available on AnyLogic Cloud. Adding reinforcement learning capabilities to the model resulted in dramatic improvements compared to a more traditional optimization tool.


The supply chain used for this project features a factory, a warehouse, and a retailer. The retailer orders more inventory from the warehouse if it does not have enough to meet customer demand. Likewise, the warehouse places an order with the factory if it cannot meet the retailer’s request. Customer wait time increases the further up the chain an order has to travel.

At a quick glance, it might seem logical to keep as much inventory as possible at each location so customer wait times stay low. Add in holding costs, however, and too much product at any one location will quickly erode profit margins. The goal of the simulation is to find, for each of the three stages in the chain, the optimal minimum inventory level that should trigger a restock request, balancing wait times against profit margins.
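The trade-off described above can be illustrated with a small, self-contained Python sketch. It is not the AnyLogic model and not the Pathmind policy: the `Stage` class, the `simulate` function, the uniform daily demand, the fixed order quantity, the two-day lead time, and the unit holding cost are all assumptions made purely for illustration. Each stage requests a restock from the stage above it whenever its stock (on hand plus in transit) falls below a fixed minimum level.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One location in the chain (all parameters are illustrative assumptions)."""
    name: str
    inventory: int
    reorder_point: int       # restock is requested when stock falls below this
    order_quantity: int = 40
    lead_time: int = 2       # days until a restock arrives
    inbound: list = field(default_factory=list)  # (arrival_day, quantity)

    def on_order(self):
        return sum(qty for _, qty in self.inbound)


def simulate(reorder_points, days=365, holding_cost=1.0, seed=0):
    """Run the toy chain and return average daily holding cost and wait time."""
    rng = random.Random(seed)
    names = ["retailer", "warehouse", "factory"]
    # Each stage starts with stock equal to its minimum level.
    chain = [Stage(n, inventory=rp, reorder_point=rp)
             for n, rp in zip(names, reorder_points)]
    backlog = total_demand = waiting_unit_days = 0
    total_holding = 0.0

    for day in range(days):
        # 1. Receive restocks that are due today.
        for stage in chain:
            stage.inventory += sum(q for d, q in stage.inbound if d <= day)
            stage.inbound = [(d, q) for d, q in stage.inbound if d > day]

        # 2. Customer demand hits the retailer; unmet units wait in a backlog.
        demand = rng.randint(0, 20)  # assumed demand distribution
        total_demand += demand
        backlog += demand
        shipped = min(backlog, chain[0].inventory)
        chain[0].inventory -= shipped
        backlog -= shipped
        waiting_unit_days += backlog  # every backlogged unit waits one more day

        # 3. Each stage requests a restock from the stage above it.
        for i, stage in enumerate(chain):
            if stage.inventory + stage.on_order() < stage.reorder_point:
                if i + 1 < len(chain):
                    upstream = chain[i + 1]
                    qty = min(stage.order_quantity, upstream.inventory)
                    upstream.inventory -= qty
                else:
                    qty = stage.order_quantity  # factory produces its own stock
                if qty > 0:
                    stage.inbound.append((day + stage.lead_time, qty))

        total_holding += holding_cost * sum(s.inventory for s in chain)

    return {
        "avg_daily_holding_cost": total_holding / days,
        "avg_wait_days": waiting_unit_days / max(total_demand, 1),
    }


# Compare a few hand-picked minimum levels (retailer, warehouse, factory).
for levels in [(0, 0, 0), (30, 30, 30), (200, 200, 200)]:
    print(levels, simulate(levels))
```

In this sketch the minimum levels are fixed for the whole run. An optimizer would search over those three numbers, whereas a reinforcement learning policy could in principle adjust its ordering decisions as demand changes from day to day.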

Why Pathmind Was Needed

An important challenge for any supply chain is that customer demand changes unexpectedly, and preventing stockouts is important to avoid lost sales. Many optimization tools, such as heuristics, are not able to deal with variability in demand. Reinforcement learning offers a more dynamic solution capable of optimizing decisions even when data isn’t static. In the case of this supply chain simulation, customer demand fluctuates daily, making it difficult to forecast how much stock is necessary to meet demand while also minimizing excess inventory.

Screenshot of the supply chain optimization model


We first ran OptQuest to see what kind of results a traditional optimization tool would produce. Both daily holding costs and customer wait times came out higher than desired. Next, we ran the model with the Pathmind policy and compared the results with OptQuest’s. Reinforcement learning beat the OptQuest numbers by more than 20%.

The improvement in the supply chain model’s results after reinforcement learning was introduced shows the limitations of other optimization methods when data such as customer demand is not static. Real-world supply chains face many such sources of variability and need an optimization method that will keep them running through disruption.

Project Resources