A wind farm with 20 turbines and 3 service crews is modeled here. The purpose of this simulation is to maximize profit (i.e., total revenue minus work costs). To achieve that, both the baseline heuristics and the reinforcement learning policy must proactively repair equipment before it breaks down, while minimizing the frequency of repairs that drive up work and travel costs.
Compared to routine maintenance and predictive failure heuristics, Pathmind’s reinforcement learning achieves 58% more total profit.
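The proactive-repair logic described above can be sketched as a simple condition-based heuristic. The wear representation, repair threshold, and crew-assignment rule below are illustrative assumptions, not the model's actual parameters:

```python
# Hypothetical sketch of a condition-based repair heuristic for the wind farm.
# Wear levels, the repair threshold, and the crew count are illustrative
# assumptions, not values from the actual simulation.

def dispatch_crews(wear_levels, num_crews=3, threshold=0.7):
    """Assign available crews to the most worn turbines above the threshold."""
    candidates = [(wear, idx) for idx, wear in enumerate(wear_levels)
                  if wear >= threshold]
    candidates.sort(reverse=True)  # repair the most worn turbines first
    return [idx for _, idx in candidates[:num_crews]]

wear = [0.2, 0.9, 0.75, 0.5, 0.95, 0.1]
print(dispatch_crews(wear))  # -> [4, 1, 2]
```

A reinforcement learning policy replaces the fixed threshold with a learned trade-off between repair frequency, travel cost, and the revenue lost to breakdowns.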
Warehouse Putaway & Picking
This model simulates a fulfillment center that uses reinforcement learning to direct the movement of boxes and maximize worker performance. It highlights the advantages of applying Pathmind's reinforcement learning within a simulation model to obtain better results in a dynamic, complex environment.
Reinforcement learning learns to strategically arrange pallets closest to their expected final destination.
This simulation models a factory where 6 products must reach specific work stations located on the same manufacturing line. The movements are handled by rolling conveyors and shuttles. The products' initial positions (up to 21 different locations) and targets (up to 3 different target stations) are randomly assigned at the beginning of each simulation. The RL policy evaluates the best route for each situation in order to avoid unnecessary movements, resolve blockages, and complete the manufacturing process in as few moves as possible.
Reinforcement learning is able to complete manufacturing in fewer movements than the heuristic.
Clustering of Physical Stores
Each store shifts its location within a city, relative to competitors as well as its customer base, in order to maximize its individual sales. The competing physical stores employ the same strategy, each using reinforcement learning to find its best location.
This competitive dynamic allows the reinforcement learning policy to learn that if all competitors cluster close to one another near the center of the city, sales will be maximized for each individual store.
Reinforcement learning discovers the Nash Equilibrium for several stores.
This model simulates product delivery in Europe, and its purpose is to minimize product wait times and maximize profit. The supply chain includes three manufacturing centers and fifteen distributors that order random amounts of the product every 1-2 days. A fleet of trucks serves each manufacturing facility. When a manufacturing facility receives an order from a distributor, it checks the number of products in storage. If the required amount is available, it sends a loaded truck to the distributor. Otherwise, the order waits until the factory produces sufficient inventory.
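The order-handling rule described above can be sketched directly: a manufacturing center ships an order only if enough inventory is on hand, otherwise the order waits until production catches up. Class and method names here are illustrative assumptions, not the model's actual API:

```python
# Sketch of the described fulfillment rule: dispatch a loaded truck if stock
# suffices, otherwise queue the order until production replenishes inventory.
# All names and quantities are illustrative assumptions.
from collections import deque

class ManufacturingCenter:
    def __init__(self, inventory=0):
        self.inventory = inventory
        self.backlog = deque()  # orders waiting for stock, FIFO

    def receive_order(self, distributor, amount):
        if self.inventory >= amount:
            self.inventory -= amount
            return f"truck dispatched to {distributor}"
        self.backlog.append((distributor, amount))
        return "order waiting for inventory"

    def produce(self, amount):
        self.inventory += amount
        # fulfill waiting orders in arrival order once stock suffices
        while self.backlog and self.inventory >= self.backlog[0][1]:
            _, needed = self.backlog.popleft()
            self.inventory -= needed

center = ManufacturingCenter(inventory=10)
print(center.receive_order("distributor_3", 8))  # truck dispatched
print(center.receive_order("distributor_7", 5))  # only 2 units left: waits
center.produce(6)                                # backlog order now fulfilled
print(center.inventory)                          # -> 3
```

The RL policy improves on this by also choosing *which* center fulfills each order, rather than always using the nearest one.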
Reinforcement learning outperforms the heuristic of fulfilling each order from the nearest manufacturing center by over 80%, maximizing profit and minimizing wait times.
Autonomous Moon Landing
A lunar module attempts to make a safe landing on the moon in this simulation. Several factors such as speed are monitored as the module approaches the designated landing area, and each factor must have values within a safe zone to avoid crashing or drifting into space.
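The "safe zone" criterion described above amounts to checking that every monitored factor stays within bounds. The factors and limits below are assumptions for the sketch, not the simulation's actual values:

```python
# Illustrative safe-zone check for the lunar landing. The monitored factors
# and their bounds are assumptions, not the simulation's actual limits.

SAFE_LIMITS = {
    "vertical_speed": (0.0, 2.0),    # m/s, descent rate
    "horizontal_speed": (0.0, 1.0),  # m/s, lateral drift
    "tilt_angle": (0.0, 5.0),        # degrees from vertical
}

def safe_to_land(state):
    """Return True only if every monitored factor is within its safe zone."""
    return all(lo <= state[name] <= hi for name, (lo, hi) in SAFE_LIMITS.items())

print(safe_to_land({"vertical_speed": 1.5, "horizontal_speed": 0.3,
                    "tilt_angle": 2.0}))   # -> True
print(safe_to_land({"vertical_speed": 3.2, "horizontal_speed": 0.3,
                    "tilt_angle": 2.0}))   # -> False (descending too fast)
```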
AI learns to land the module safely on the moon without human intervention, a feat that random actions cannot achieve.
Automated Guided Vehicles (AGVs)
A fleet of automated guided vehicles (AGVs) optimizes its dispatching routes to maximize product throughput in a manufacturing center. When components arrive to be processed, the AGVs carry them to the appropriate machines in the factory following a specific processing sequence.
Reinforcement learning outperforms the heuristic, increasing factory throughput by 50%.
AI Crane Warehouse
The model consists of a generic warehouse where product packages are stored stacked on static racks. Three processes interact with the warehouse:
1. Start Process (SP) – generates packages
2. Intermediate Process (IP) – transforms packages
3. Final Process (FP) – consumes packages
An AI-controlled overhead crane governs all the movements between the processes and the warehouse.
Interconnected Call Centers
Calls are made to each of five interconnected call centers simultaneously. Once a call is received, each call center decides to either accept the call or transfer it to another call center. A caller balks when their wait time exceeds a randomly initialized threshold (between 20 and 25 minutes). We compare the reinforcement learning policy with three call routing heuristics: no call transferring, shortest queue, and most efficient call center. The objective is to minimize both wait times and the number of balked callers.
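Two elements of the setup above can be sketched concretely: the per-caller balking threshold and the "shortest queue" baseline heuristic. Function names are illustrative assumptions, not the model's actual API:

```python
# Sketch of the balking threshold and the shortest-queue routing baseline.
# Names and the tie-breaking rule are illustrative assumptions.
import random

def balk_threshold(rng=random):
    """Each caller tolerates a wait drawn uniformly between 20 and 25 minutes."""
    return rng.uniform(20, 25)

def route_call(queue_lengths):
    """Shortest-queue heuristic: transfer the call to the least loaded center.

    Ties go to the lowest-indexed center (Python's min keeps the first minimum).
    """
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])

print(route_call([4, 2, 7, 2, 5]))  # -> 1 (first of the two shortest queues)
print(20 <= balk_threshold() <= 25) # -> True
```

An RL policy can beat this baseline because queue length alone ignores each center's service efficiency and how close waiting callers are to their balking thresholds.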
The reinforcement learning policy trained using Pathmind outperforms the heuristics by over 9.6%.