Reinforcement learning is a type of machine learning based on rewards and punishments. This article explains its definition, how it functions, and its primary applications.
Artificial intelligence (AI) programs often use machine learning to improve their performance over time. In reinforcement learning, the AI is rewarded for desired actions and punished for undesired ones.
Reinforcement learning typically takes place in a controlled environment. The programmer assigns positive and negative values (or "points") to certain behaviors, and the AI can freely explore the environment to seek rewards and avoid punishments.
Ideally, the AI will forgo short-term gains in favor of long-term gains, so if it must choose between earning one point in one minute or 10 points in two minutes, it will delay gratification and go for the higher value. At the same time, it will learn to avoid punitive actions that cause it to lose points.
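The trade-off between a quick reward and a larger delayed one is commonly expressed with a discount factor. The sketch below is illustrative only: the function name `discounted_value` and the discount factor of 0.9 per minute are assumptions, not values from this article.

```python
def discounted_value(reward: float, delay_steps: int, gamma: float = 0.9) -> float:
    """Present value of a reward received after `delay_steps` time steps.

    `gamma` is the discount factor (assumed here to be 0.9 per minute):
    values closer to 1 make the agent more patient.
    """
    return (gamma ** delay_steps) * reward


# One point after one minute versus 10 points after two minutes.
quick = discounted_value(reward=1, delay_steps=1)     # 0.9 * 1
patient = discounted_value(reward=10, delay_steps=2)  # 0.9**2 * 10

# The agent prefers whichever option has the higher discounted value.
best = "patient" if patient > quick else "quick"
print(best)
```

With this discount factor, the delayed 10 points are worth about 8.1 and the quick point about 0.9, so waiting wins. With a very impatient setting such as `gamma=0.05`, the same comparison would favor the quick reward (0.05 versus 0.025).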
Real-world applications of AI based on reinforcement learning are somewhat limited, but the method has shown promise in laboratory experiments.
For example, reinforcement learning has trained AI to play video games. The AI learns how to achieve the game's goals through trial and error. In a game like Super Mario Bros., the AI will determine the best way to reach the end of each level while avoiding enemies and obstacles. Dozens of AI programs have successfully beaten specific games, and the MuZero program has even mastered games without being given their rules in advance.
Reinforcement learning has been used to train enterprise resource management (ERM) software to allocate business resources to achieve the best long-term outcomes. Reinforcement learning algorithms have even been used to train robots to walk and perform other physical tasks. Reinforcement learning has also shown promise in statistics, simulation, engineering, manufacturing, and medical research.
The major limitation of reinforcement learning algorithms is their reliance on a closed environment. For example, a robot could use reinforcement learning to navigate a room where everything is stationary. However, reinforcement learning wouldn't help navigate a hallway full of moving people because the environment is constantly changing. The robot would just aimlessly bump into things without developing a clear picture of its surroundings.
Since this learning relies on trial and error, it can consume more time and computing resources than other approaches. On the plus side, reinforcement learning doesn't require much human supervision.
Due to its limitations, reinforcement learning is often combined with other types of machine learning. Self-driving vehicles, for example, use reinforcement learning algorithms in conjunction with other machine learning techniques, such as supervised learning, to navigate the roads without crashing.
Reinforcement learning algorithms can be separated into two broad categories: model-based or model-free. A model-based algorithm develops a model of its environment to predict the rewards of potential actions. In model-free reinforcement learning, the AI agent learns directly through trial and error.
Model-based algorithms are ideal for simulations and static environments, such as an assembly line, where the goal is to perform the same action over and over. Examples of model-based reinforcement learning algorithms include value iteration and policy iteration, in which the AI agent uses its model of the environment to work out a rule (or "policy") for choosing the best course of action.
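As a rough illustration of value iteration, the sketch below uses a made-up four-cell corridor in which the agent knows the model in advance: moving into the last cell earns 10 points and every other move costs 1 point. The environment, the reward values, the discount factor, and all function names are assumptions chosen for the example. Policy iteration is not shown.

```python
# Hypothetical 4-cell corridor; cell 3 is the goal (terminal state).
N_STATES = 4
GOAL = 3
ACTIONS = {"left": -1, "right": +1}
GAMMA = 0.9  # assumed discount factor


def step_model(state, action):
    """Known model of the environment: returns (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 10.0 if next_state == GOAL else -1.0
    return next_state, reward


def lookahead(state, action, values):
    """Reward for one step plus the discounted value of where it leads."""
    next_state, reward = step_model(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tolerance=1e-6):
    """Repeatedly apply the Bellman update until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        biggest_change = 0.0
        for state in range(N_STATES):
            if state == GOAL:
                continue  # terminal state keeps a value of 0
            best = max(lookahead(state, a, values) for a in ACTIONS)
            biggest_change = max(biggest_change, abs(best - values[state]))
            values[state] = best
        if biggest_change < tolerance:
            return values


def greedy_policy(values):
    """Pick, for each non-terminal state, the action with the best lookahead."""
    return {
        state: max(ACTIONS, key=lambda a: lookahead(state, a, values))
        for state in range(N_STATES)
        if state != GOAL
    }


values = value_iteration()
policy = greedy_policy(values)
print(values, policy)
```

In this toy corridor the values come out at roughly 6.2, 8.0, and 10.0 for cells 0 to 2, and the resulting policy is to move right everywhere. Note that the agent never has to act in the environment: everything is computed from `step_model`, which is what makes the approach model-based.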
Model-free algorithms are useful for more dynamic, real-world situations. An example of model-free reinforcement learning is the Deep Q-Network (DQN) algorithm, which uses a neural network to predict outcomes based on past actions and results. Applications of DQN range from predicting the stock market to regulating air quality in large buildings.
A variation called inverse reinforcement learning has the AI agent observe the actions of humans and infer the rewards that would explain that behavior, rather than being given the rewards by a programmer.
Q-learning is one of the most widely used model-free algorithms. This kind of reinforcement learning doesn't need a model of an environment to make predictions about it; instead, it learns through trial and error how valuable each action is in each state.
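A minimal sketch of tabular Q-learning is shown below, using a made-up four-cell corridor where reaching the last cell earns 10 points and every other move costs 1 point. The environment, the learning rate, the discount factor, the exploration rate, and the function names are all assumptions made for illustration. The agent only ever calls `env_step` and sees what happens; it never inspects the rules directly.

```python
import random

# Hypothetical 4-cell corridor; cell 3 is the goal and ends the episode.
N_STATES = 4
GOAL = 3
ACTIONS = [-1, +1]  # index 0 moves left, index 1 moves right


def env_step(state, action):
    """Environment the agent interacts with: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 10.0 if next_state == GOAL else -1.0
    return next_state, reward, next_state == GOAL


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from trial and error."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = env_step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train_q_table()
for state in range(GOAL):
    print(state, q_table[state])
```

After training, the value of moving right should exceed the value of moving left in every non-goal cell, so an agent that follows the highest value walks straight to the goal.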
A "policy" is the strategy that a reinforcement learning system uses to choose its actions. It defines which action the system takes in each situation, based on the information it has and the goal it's trying to achieve.
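In code, a policy can be as simple as a function from a state to an action. The sketch below assumes a hypothetical robot with two battery states and made-up value estimates for each action; the state names, action names, and numbers are invented for the example.

```python
import random

# Hypothetical estimates of long-term reward for each action in each state.
q_values = {
    "low_battery": {"recharge": 5.0, "explore": -2.0},
    "full_battery": {"recharge": 0.0, "explore": 3.0},
}


def greedy_policy(state):
    """Always choose the action with the highest estimated value."""
    actions = q_values[state]
    return max(actions, key=actions.get)


def epsilon_greedy_policy(state, epsilon=0.1, rng=random):
    """Mostly greedy, but pick a random action with probability `epsilon`."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values[state]))
    return greedy_policy(state)


print(greedy_policy("low_battery"))
print(greedy_policy("full_battery"))
```

The greedy version always exploits what the system already knows, while the epsilon-greedy version occasionally tries something else, which is one common way to keep exploring during training.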