Fundamentals of Reinforcement Learning
AI, But Simple Issue #9
Reinforcement Learning (RL) is a subset of machine learning that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.
The fundamental idea behind RL is to allow computers to learn from their mistakes.
This is very similar to how humans learn: through trial and error.
Some common applications of RL include playing games (AlphaGo, chess), robotics (Boston Dynamics robots), autonomous vehicles (self-driving cars), finance-related activities (algorithmic trading), and other optimization problems.
In RL, the agent is the decision-maker who interacts with the environment to achieve a certain goal.
Unlike supervised learning, where the feedback provided to the agent (or model) consists of correct labels, reinforcement learning uses rewards and punishments as signals for positive and negative behavior.
In RL, there is no direct indication of the correct action; the agent only receives a positive or negative signal.
In reinforcement learning, the agent's objective is to learn a policy (which defines the agent's behavior) that maximizes the amount of reward it collects.
A policy can be thought of as a strategy that determines the action an agent will take in a given situation (state).
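To make the idea of a policy concrete, here is a minimal sketch in Python of a policy as a simple lookup table from states to actions. The state and action names are made up for illustration; in practice, policies are usually learned functions rather than hand-written tables.

```python
# A minimal sketch of a policy: a lookup table mapping each state to an action.
# The state and action names here are hypothetical, purely for illustration.
policy = {
    "ghost_nearby": "move_away",
    "pellet_ahead": "move_forward",
    "power_pellet_nearby": "move_toward_power_pellet",
}

def choose_action(state):
    # The policy tells the agent what to do in each state;
    # fall back to a default action for states it hasn't seen.
    return policy.get(state, "move_forward")

print(choose_action("ghost_nearby"))  # -> "move_away"
```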
Some key terms that describe the basic elements of an RL problem are listed below (the sketch after the list shows how they fit together):
Environment — Physical world in which the agent operates
State — Current situation of the agent
Reward — Feedback from the environment
Policy — Method to map agent’s state to actions
Value — Expected future reward an agent would receive by taking an action in a particular state
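The sketch below ties these elements together in the standard interaction loop: the agent observes a state, chooses an action with its policy, and the environment responds with the next state and a reward. The environment dynamics, reward values, and the random policy here are all made up for illustration.

```python
import random

# A minimal sketch of the agent-environment interaction loop.
# The environment, states, actions, and rewards are hypothetical;
# the point is just to show how the basic elements fit together.

ACTIONS = ["up", "down", "left", "right"]

def environment_step(state, action):
    """Return the next state and the reward for taking `action` in `state`."""
    next_state = (state + 1) % 10                    # toy state transition
    reward = 1.0 if action == "right" else -0.1      # toy reward signal
    return next_state, reward

def policy(state):
    """A random policy, used here only as a placeholder for a learned one."""
    return random.choice(ACTIONS)

state = 0
total_reward = 0.0
for t in range(20):                                   # one short episode
    action = policy(state)                            # agent acts according to its policy
    state, reward = environment_step(state, action)   # environment responds
    total_reward += reward                            # reward is the feedback signal
print("Total reward for the episode:", total_reward)
```

A learning algorithm would use the rewards collected in loops like this one to improve the policy over time, which is exactly what "maximizing reward" means in practice.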
Let's use a game to model an RL problem. Suppose we are playing Pac-Man, where the goal is to eat the pellets without losing all of your lives.
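As a rough illustration of how this game could be cast as an RL problem, the sketch below defines a tiny, made-up grid version of Pac-Man: the state is the agent's position, the actions are the four movement directions, eating a pellet gives a positive reward, and running into a ghost gives a large negative one. This is a simplification for illustration, not a faithful implementation of the game.

```python
# A toy, Pac-Man-inspired environment sketch (hypothetical and heavily simplified).
# State: the agent's (row, col) position on a small grid.
# Actions: the four movement directions.
# Reward: +1 for eating a pellet, -10 for running into a ghost, 0 otherwise.

GRID_SIZE = 5
PELLETS = {(0, 1), (2, 3), (4, 4)}   # positions of pellets
GHOSTS = {(2, 2)}                    # position of a (stationary) ghost

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(position, action, pellets):
    """Apply one action; return the new position, the reward, and remaining pellets."""
    dr, dc = MOVES[action]
    row = min(max(position[0] + dr, 0), GRID_SIZE - 1)   # stay inside the grid
    col = min(max(position[1] + dc, 0), GRID_SIZE - 1)
    new_position = (row, col)

    if new_position in GHOSTS:
        return new_position, -10.0, pellets                   # caught by a ghost
    if new_position in pellets:
        return new_position, 1.0, pellets - {new_position}    # ate a pellet
    return new_position, 0.0, pellets

# Example: starting at (0, 0), moving right eats the pellet at (0, 1).
position, reward, remaining = step((0, 0), "right", PELLETS)
print(position, reward, len(remaining))  # (0, 1) 1.0 2
```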