Supply chain optimization tools need to be robust and capable of managing product flows across multiple types of locations in a network. Multi-echelon inventory optimization can be a challenge for optimizers and baseline heuristics, and supply chain managers need an intelligent solution that can handle the complexities of operations efficiently.

Using Pathmind AI and a product delivery simulation, we demonstrated how reinforcement learning can outperform heuristics and optimizers to improve order serviceability and profits in a supply chain facing a multi-echelon inventory optimization problem.


The supply chain in this project is a network of 2 manufacturing centers, 4 distributors, and 8 retailers. Manufacturing centers produce partially finished goods and send them to distributors for final assembly. Once the time needed for final assembly has elapsed, a distributor can send the products to a retailer.

Retailers can only meet customer demand with inventory they have in stock. A customer who waits too long to receive an order is counted as a lost sale. Demand is variable and follows weekly and seasonal patterns.

Keeping a large amount of inventory in stock may keep customers happy, but inventory holding costs can quickly eat into profits. Likewise, moving goods inefficiently between locations in the supply chain adds to transportation costs. The goal in the simulation is to decide proactively how much inventory retailers and distributors should order, and which location to order from, while balancing holding and transportation costs.
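The trade-off between stock levels and costs can be made concrete with a small profit calculation. The following is only a sketch: the `PeriodResult` fields, the `period_profit` function, and every rate in it are hypothetical and are not taken from the simulation model.

```python
from dataclasses import dataclass


@dataclass
class PeriodResult:
    """Outcome of one period at one location (hypothetical fields)."""
    units_sold: int           # units delivered to customers
    units_held: int           # units left in stock at the end of the period
    distance_shipped: float   # total distance covered by inbound shipments


def period_profit(result, unit_margin=10.0, holding_cost_per_unit=0.5,
                  transport_cost_per_distance=0.2):
    """Profit = sales margin minus holding cost minus transportation cost.

    All three rates are assumed values chosen for illustration.
    """
    revenue = result.units_sold * unit_margin
    holding = result.units_held * holding_cost_per_unit
    transport = result.distance_shipped * transport_cost_per_distance
    return revenue - holding - transport


# Same sales and shipping, but very different stock levels.
lean = PeriodResult(units_sold=80, units_held=20, distance_shipped=100.0)
overstocked = PeriodResult(units_sold=80, units_held=400, distance_shipped=100.0)

print(period_profit(lean), period_profit(overstocked))
```

With identical sales, the overstocked location earns less because the extra units only add holding cost.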

Why Pathmind Was Needed

A baseline for this multi-echelon inventory optimization problem was established using an (r, Q) inventory policy, which orders a fixed quantity Q whenever inventory falls to the reorder point r, combined with a nearest-neighbor heuristic. Put simply, the base rule for orders states that the physically closest distributor or manufacturing center always fulfills an order. Since the retailers, distributors, and manufacturing centers are not evenly spread across the map and demand is variable, that baseline does not guarantee the best results.
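The baseline rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the model's actual implementation: the function names, the coordinate layout, and the use of straight-line distance are all hypothetical.

```python
import math


def should_reorder(inventory_position, reorder_point):
    """(r, Q) trigger: reorder once inventory is at or below r."""
    return inventory_position <= reorder_point


def nearest_supplier(location_xy, suppliers):
    """Nearest-neighbor rule: pick the supplier with the shortest
    straight-line distance (an assumed distance measure)."""
    return min(suppliers, key=lambda name: math.dist(location_xy, suppliers[name]))


def baseline_order(inventory_position, reorder_point, order_quantity,
                   location_xy, suppliers):
    """Return (supplier, quantity) if an order is due, else None."""
    if not should_reorder(inventory_position, reorder_point):
        return None
    return nearest_supplier(location_xy, suppliers), order_quantity


# Hypothetical map: two distributors and one retailer at (8, 1).
distributors = {"D1": (0.0, 0.0), "D2": (10.0, 0.0)}
print(baseline_order(5, 10, 50, (8.0, 1.0), distributors))
print(baseline_order(20, 10, 50, (8.0, 1.0), distributors))
```

Note that the rule never looks at how busy the chosen supplier is, which is one reason a fixed nearest-neighbor choice can lead to long queues.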

Reinforcement learning is a more intelligent approach, since it can adapt to changing customer demand while finding efficient ways to increase profits and the number of orders fulfilled. At the same time, it can cut down on inventory holding and transportation costs.

Screenshot showing stats from the Multi-Echelon Product Delivery model


Pathmind reinforcement learning outperformed the baseline, increasing the serviceability of orders from 65% to 85% while also increasing profits by 34%. To achieve those improvements, the AI learned to place smaller orders more frequently and balance them across locations to prevent long queues. While the optimizer makes static choices, Pathmind’s reinforcement learning policy makes a decision at each step and selects the best action for the current state of the environment. The policy can also track time, which gives it a sense of how much demand to expect.
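To illustrate what deciding from the current state and tracking time can look like, here is a minimal sketch of an observation vector. How the actual model encodes its observations is not stated here; the function names, the choice of inputs, and the cyclic sine/cosine encoding of time are assumptions made for illustration.

```python
import math


def time_features(day_index):
    """Encode position in the week and in the year as points on a circle,
    so that day 6 and day 0 of a week end up close together."""
    day_of_week = day_index % 7
    day_of_year = day_index % 365
    return [
        math.sin(2 * math.pi * day_of_week / 7),
        math.cos(2 * math.pi * day_of_week / 7),
        math.sin(2 * math.pi * day_of_year / 365),
        math.cos(2 * math.pi * day_of_year / 365),
    ]


def build_observation(stock_levels, queue_lengths, day_index):
    """Concatenate current stock, supplier queue lengths, and time features
    into one flat vector that a policy could read at each decision step."""
    return list(stock_levels) + list(queue_lengths) + time_features(day_index)


# Hypothetical snapshot: two stock levels, one supplier queue, day 10.
observation = build_observation([120, 45], [3], day_index=10)
print(len(observation))
```

Because weekly and seasonal position are part of the input, a learned policy has the information it needs to order ahead of predictable demand peaks.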

Reinforcement learning has a clear advantage over traditional optimizers and simple heuristics in complex multi-echelon inventory optimization challenges. Additional objectives, such as minimizing travel distance or reducing inventory costs, could also be added, since reinforcement learning can optimize for multiple goals and can be adapted to the specific needs of a use case.
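One common way to combine several goals is a weighted sum of metrics used as the reward signal. The sketch below assumes that approach; the metric names and weights are hypothetical and do not reflect the reward function used in this project.

```python
def weighted_reward(metrics, weights):
    """Sum each tracked metric multiplied by its weight.

    Positive weights reward a metric, negative weights penalize it.
    Metrics without a weight are ignored.
    """
    return sum(weight * metrics[name] for name, weight in weights.items())


# Hypothetical per-step metrics reported by a simulation.
metrics = {"profit": 100.0, "lost_sales": 2.0, "travel_distance": 50.0}

# Starting objective: reward profit, penalize lost sales.
base_weights = {"profit": 1.0, "lost_sales": -5.0}

# Adding a goal only requires one more weighted term.
extended_weights = {**base_weights, "travel_distance": -0.1}

print(weighted_reward(metrics, base_weights))
print(weighted_reward(metrics, extended_weights))
```

The relative size of the weights expresses how the goals trade off against each other, so they would need tuning for each use case.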

Project Resources

    • View this model on AnyLogic Cloud.
    • See how we added reinforcement learning to this model and train your own AI policy with the Multi-Echelon Product Delivery tutorial.
    • Check out the webinar “Pathmind Reinforcement Learning for Simulation – Introduction, Process Overview, and Example Models,” featuring this project.