Reinforcement learning refers to a class of machine learning problems in which the goal is to learn, from successive experiences, which actions to take in order to reach the best solution.
In such a problem, an "agent" (the algorithm, that is, the code and the variables it maintains) interacts with an "environment" to find the optimal solution. This interactive, iterative aspect is what fundamentally distinguishes reinforcement learning from supervised and unsupervised learning. The agent tries several solutions (exploration), observes the environment's response, and adapts its behavior (its variables) to find the best strategy (by "exploiting" the results of its exploration). A key concept in this type of problem is the balance between the exploration and exploitation phases.
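As a minimal illustrative sketch (not taken from the text), the snippet below shows an epsilon-greedy agent facing a hypothetical three-armed bandit: with a small probability it explores a random arm, otherwise it exploits the arm with the best estimated reward. The arm names, reward means, and the parameter EPSILON are all illustrative assumptions.

```python
import random

# Hypothetical environment: three arms with hidden mean rewards (unknown to the agent).
TRUE_MEANS = [0.2, 0.5, 0.8]   # used only to simulate the environment's responses
EPSILON = 0.1                  # probability of exploring a random arm (assumed value)
N_STEPS = 1000

counts = [0, 0, 0]             # how many times each arm has been tried
estimates = [0.0, 0.0, 0.0]    # running estimate of each arm's mean reward

def pull(arm):
    """Environment: return a noisy reward for the chosen arm."""
    return random.gauss(TRUE_MEANS[arm], 0.1)

for _ in range(N_STEPS):
    if random.random() < EPSILON:
        arm = random.randrange(3)                        # exploration: try a random arm
    else:
        arm = max(range(3), key=lambda a: estimates[a])  # exploitation: use the best estimate
    reward = pull(arm)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update

print("Estimated means:", [round(e, 2) for e in estimates])
print("Best arm found:", estimates.index(max(estimates)))
```

With too little exploration the agent can lock onto a mediocre arm; with too much, it wastes steps on arms it already knows are poor, which is exactly the balance described above.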
This approach is particularly well suited to problems that require a trade-off between short-term and long-term rewards. Examples include teaching a robot to walk over rough terrain, to drive (autonomous cars), or to perform specific tasks. The main families of reinforcement learning problems are bandit algorithms, Markov decision processes (which may be only partially observable), and game trees.
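To make the short-term versus long-term trade-off concrete, here is a small sketch of value iteration on a hypothetical two-state Markov decision process (the states, actions, rewards, and discount factor are all assumed for illustration): one action pays a small reward immediately, the other pays nothing now but leads to a state with a larger recurring reward.

```python
# Hypothetical 2-state, 2-action MDP: P[s][a] = list of (probability, next_state, reward).
GAMMA = 0.9   # discount factor weighting long-term rewards (assumed value)

P = {
    0: {0: [(1.0, 0, 1.0)],   # stay in state 0: small immediate reward
        1: [(1.0, 1, 0.0)]},  # move to state 1: no immediate reward
    1: {0: [(1.0, 1, 2.0)],   # state 1 pays more once reached
        1: [(1.0, 0, 0.0)]},
}

# Bellman optimality update, iterated until the values (approximately) converge.
V = {0: 0.0, 1: 0.0}
for _ in range(100):
    V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])
                for a in P[s])
         for s in P}

# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print("Optimal values:", {s: round(v, 2) for s, v in V.items()})
print("Optimal policy:", policy)  # prefers forgoing the small immediate reward to reach state 1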