What is reinforcement learning?

#Artificial intelligence
Mar 30, 2022


Reinforcement learning (RL) is a subfield of machine learning. The idea behind it is that an agent (an AI) learns from its environment by interacting with it through trial and error, receiving rewards (positive or negative) as feedback for the actions it performs. In this respect, the learning process in RL resembles the way humans and animals learn.

What are the essential components of RL?

The essential components of RL, and how they work together, are explained below using an example: a robot that should walk from a starting point A to a destination point B.

  • Agent: a program that is trained to solve a specific task. In our example, the agent is the program that controls the robot. It should learn to walk to the target point.
  • Environment: the world in which the agent performs its actions. In our example, the environment is the real world.
  • Action: a move performed by the agent that triggers a change in the environment. In our example, the robot has four actions: walk forward, walk backward, walk left, or walk right. After each of these actions, the state of the environment changes.
  • State: the condition of the environment at a specific time t. In our example, the state is the situation of the robot and its surroundings after the robot has performed one of the four actions.
  • Reward: the positive or negative evaluation of an action. In our example, the program is rewarded for approaching the target point. It is punished if it moves away from the target point, if the robot falls down, or if it wastes time, for example by running in circles. The important point here is that the agent (i.e., our program) does not maximize each individual reward, but rather the sum of all rewards over time. This allows the agent to trade off short-term against long-term rewards and punishments.
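
The components above can be sketched in a few lines of Python. This is a minimal illustration, not a real robot controller: the 4x4 grid that stands in for the real world, the coordinates chosen for A and B, the reward values of +1 and -1, and the function names `step` and `total_reward` are all assumptions made for this example.

```python
# Hypothetical 4x4 grid standing in for the real world (an assumption).
SIZE = 4
START = (0, 0)  # starting point A (assumed position)
GOAL = (3, 3)   # destination point B (assumed position)

# The four actions from the text, expressed as (dx, dy) moves on the grid.
ACTIONS = {
    "forward": (0, 1),
    "backward": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def distance(state):
    """Number of grid steps between a state (robot position) and the goal."""
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])


def step(state, action):
    """Environment: apply one action, return the new state and the reward."""
    dx, dy = ACTIONS[action]
    # The robot cannot leave the grid; walking into a wall leaves it in place.
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    next_state = (x, y)
    # Reward: +1 for getting closer to B, -1 otherwise (assumed values).
    reward = 1.0 if distance(next_state) < distance(state) else -1.0
    return next_state, reward


def total_reward(actions, state=START):
    """Run a fixed sequence of actions and sum up the rewards."""
    total = 0.0
    for action in actions:
        state, reward = step(state, action)
        total += reward
    return state, total
```

For instance, `total_reward(["forward"] * 3 + ["right"] * 3)` walks straight to B and collects a summed reward of 6.0, while the same route preceded by one useless `"backward"` step also reaches B but only collects 5.0. The summed reward is the quantity the agent tries to maximize.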

Because of the trial-and-error approach, the RL process resembles a loop (see figure): the agent observes the state, performs an action, receives a reward or punishment, and observes the new state. By repeating this cycle many times, the agent (the program) learns how to walk the robot more efficiently from starting point A to destination point B.

Reinforcement Learning: Trial-and-Error Process
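
The loop can be made concrete with a small training sketch. The article does not name a specific learning algorithm, so the example below uses tabular Q-learning, one common choice, purely as an illustration. The grid world, the reward values (-1 per step, +10 for reaching B), and all parameter values (`alpha`, `gamma`, `epsilon`, the number of episodes) are assumptions made for this example.

```python
import random

# Hypothetical 4x4 grid world; positions of A and B are assumptions.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # forward, backward, left, right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (x, y)
    done = next_state == GOAL
    # -1 per step punishes wasted time; +10 rewards reaching B (assumed values).
    reward = 10.0 if done else -1.0
    return next_state, reward, done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Trial-and-error loop: act, observe the reward, update the estimate."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    # Q-table: estimated long-term reward for each (state, action) pair.
    q = {((x, y), a): 0.0
         for x in range(SIZE) for y in range(SIZE) for a in range(n)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(n)
            else:
                a = max(range(n), key=lambda i: q[(state, i)])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, i)] for i in range(n))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned estimates from A without any exploration."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of positions the trained agent visits. The discount factor `gamma` is what lets the agent weigh short-term punishments (the -1 per step) against the long-term reward for reaching B.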

Probably the best-known application of reinforcement learning is DeepMind's AlphaGo program, which in 2015 became the first computer program to beat a professional Go player. Go is a complex strategy board game for two players that originated in China. AlphaZero, the improved version of AlphaGo, was not only able to defeat its predecessor but also generalized to other games: it beat Stockfish, one of the strongest chess engines at the time.

Sources (translated): Towards Data Science and Medium
