A wind farm with 20 turbines and 3 service crews is modeled here. The purpose of this simulation is to maximize profit (i.e., total revenue minus work cost). To achieve this, both the baseline heuristics and the reinforcement learning policy must proactively repair equipment before it breaks down, while minimizing the frequency of repairs, which drives up work and travel costs.
Compared to routine maintenance and predictive failure heuristics, Pathmind’s reinforcement learning achieves 58% more total profit.
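The proactive-repair trade-off described above can be sketched as a simple threshold heuristic. This is an illustrative toy, not the actual model: the wear rates, revenue, repair cost, and threshold below are all assumptions.

```python
# Toy threshold-based predictive-repair heuristic for a 20-turbine farm.
# All numeric parameters are illustrative assumptions.
import random

random.seed(0)

N_TURBINES = 20
REPAIR_THRESHOLD = 0.8   # repair proactively above this wear level (assumed)
REVENUE_PER_STEP = 10.0  # revenue from a healthy turbine per step (assumed)
REPAIR_COST = 50.0       # combined work and travel cost per repair (assumed)

def simulate(steps=100):
    wear = [0.0] * N_TURBINES
    profit = 0.0
    for _ in range(steps):
        for i in range(N_TURBINES):
            wear[i] += random.uniform(0.0, 0.05)  # stochastic degradation
            if wear[i] >= 1.0:
                continue  # a broken turbine earns nothing until repaired
            profit += REVENUE_PER_STEP
        # Proactive repair: fix heavily worn turbines before they fail.
        for i in range(N_TURBINES):
            if wear[i] >= REPAIR_THRESHOLD:
                wear[i] = 0.0
                profit -= REPAIR_COST
    return profit
```

Repairing too early wastes repair cost; repairing too late loses revenue from broken turbines. A reinforcement learning policy can, in effect, learn this threshold (and more nuanced timing) from the reward signal alone.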
Automated Guided Vehicles (AGVs)
A fleet of automated guided vehicles (AGVs) optimizes its dispatching routes to maximize product throughput in a manufacturing center. When components arrive to be processed, the AGVs carry them to the appropriate machines in the factory following a specific processing sequence.
Reinforcement learning outperforms the heuristic, increasing factory throughput by 50%.
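A common baseline for this kind of problem is a greedy nearest-AGV dispatch rule, sketched below. The one-dimensional layout and the function names are assumptions for illustration, not the actual model's API.

```python
# Toy greedy dispatch rule: assign the idle AGV closest to the pickup point.
# Positions are scalar coordinates on an assumed 1-D factory floor.
def dispatch(agv_positions, pickup_point):
    """Return the index of the AGV nearest to the component's pickup point."""
    return min(range(len(agv_positions)),
               key=lambda i: abs(agv_positions[i] - pickup_point))

# Each component must visit machines in a fixed processing sequence.
def route(component_sequence, machine_locations):
    """Translate a processing sequence into the machine coordinates to visit."""
    return [machine_locations[step] for step in component_sequence]
```

A greedy rule like this ignores downstream congestion and future arrivals, which is exactly where a learned dispatching policy can gain throughput.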
Autonomous Moon Landing
A lunar module attempts to make a safe landing on the moon in this simulation. Several factors, such as descent speed, are monitored as the module approaches the designated landing area, and each must stay within a safe range to avoid crashing or drifting into space.
The AI learns to land safely on the moon without human intervention, something random actions cannot achieve.
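The safe-zone condition described above amounts to a conjunction of range checks. A minimal sketch, with safe ranges that are illustrative assumptions rather than values from the actual simulation:

```python
# Each monitored factor must fall inside its safe range for a safe landing.
# The ranges below are illustrative assumptions.
SAFE_RANGES = {
    "vertical_speed": (-2.0, 0.0),    # m/s: descending, but slowly
    "horizontal_speed": (-0.5, 0.5),  # m/s: minimal lateral drift
    "tilt": (-5.0, 5.0),              # degrees from vertical
}

def safe_to_land(state):
    """True only if every monitored factor is within its safe range."""
    return all(lo <= state[k] <= hi for k, (lo, hi) in SAFE_RANGES.items())
```

Because every factor must be in range simultaneously, random actions almost never satisfy the condition, while a trained policy steers all of them into the safe zone together.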
Clustering of Physical Stores
Stores shift their locations to find the optimal spot in a city, relative to competitors as well as the customer base, in order to maximize their individual sales. The other physical stores employ the same strategy, also using reinforcement learning to find the best location.
This competitive dynamic allows the reinforcement learning policy to learn that sales are maximized for each individual store when all competitors cluster close to one another near the center of the city.
Reinforcement learning discovers the Nash Equilibrium for several stores.
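The clustering outcome can be reproduced with simple best-response dynamics in a Hotelling-style toy: stores on a one-dimensional "city" with uniformly distributed customers, each store repeatedly relocating to its sales-maximizing position given the others. The grid size and uniform customer distribution are assumptions for illustration.

```python
# Best-response dynamics for store placement on a 1-D city (Hotelling-style toy).
# Customers are uniform on positions 0..grid-1; ties split sales evenly.
def sales(positions, me, grid=101):
    """Sales of store `me`: customers for whom it is (jointly) nearest."""
    total = 0.0
    for c in range(grid):
        dists = [abs(p - c) for p in positions]
        if dists[me] == min(dists):
            total += 1 / dists.count(min(dists))
    return total

def best_response(positions, me, grid=101):
    """Relocate store `me` to its sales-maximizing spot, others fixed."""
    return max(range(grid),
               key=lambda x: sales(positions[:me] + [x] + positions[me + 1:],
                                   me, grid))

def iterate(positions, rounds=30, grid=101):
    positions = list(positions)
    for _ in range(rounds):
        for i in range(len(positions)):
            positions[i] = best_response(positions, i, grid)
    return positions
```

Starting two stores at opposite ends of the city, repeated best responses make them leapfrog toward each other until both sit at the center, where neither can gain by moving: the Nash equilibrium the policy discovers.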
This model simulates product delivery in Europe, and its purpose is to minimize product wait times and maximize profit. The supply chain includes three manufacturing centers and fifteen distributors that order random amounts of the product every 1-2 days. A fleet of trucks serves each manufacturing facility. When a manufacturing facility receives an order from a distributor, it checks the number of products in storage. If the required amount is available, it sends a loaded truck to the distributor. Otherwise, the order waits until the factory produces sufficient inventory.
Reinforcement learning outperforms the heuristic of routing each order to the nearest manufacturing center by over 80%, maximizing profit and minimizing wait times.
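The fulfillment rule described above (ship if stock suffices, otherwise queue the order until production catches up) can be sketched as follows. The class and method names are illustrative assumptions, not the actual model's API.

```python
# Toy order-fulfillment logic for one manufacturing center.
# Orders that cannot be filled from stock wait in a FIFO backlog.
from collections import deque

class ManufacturingCenter:
    def __init__(self, stock=0):
        self.stock = stock
        self.backlog = deque()  # (distributor, qty) orders awaiting inventory

    def receive_order(self, distributor, qty):
        """Ship immediately if stock suffices; otherwise queue the order."""
        if self.stock >= qty:
            self.stock -= qty
            return f"truck dispatched to {distributor} with {qty} units"
        self.backlog.append((distributor, qty))
        return "order queued"

    def produce(self, amount):
        """Add newly produced units, then fulfill waiting orders in order."""
        self.stock += amount
        while self.backlog and self.stock >= self.backlog[0][1]:
            _distributor, qty = self.backlog.popleft()
            self.stock -= qty
```

Under this rule, wait time is driven entirely by which center an order lands at, which is why a learned routing policy that anticipates stock levels across all three centers can beat the nearest-center heuristic.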