Reinforcement Learning
Reinforcement learning is a branch of machine learning concerned with actors, or agents, taking actions in an environment to maximize a reward that they collect along the way. It is a purposefully broad term, which is why reinforcement learning methods can be applied to a wide variety of real-world problems.
Imagine watching someone play a video game. The player is the agent and the game is the environment. The rewards that the player receives (for example, for defeating an opponent or completing a level) or fails to receive (for example, after stepping into a trap or losing a fight) teach them how to become a better player.
Reinforcement learning does not fit neatly into the supervised, unsupervised, or semi-supervised learning categories.
In supervised learning, for example, each decision made by the model is independent and has no bearing on what we see in the future. In reinforcement learning, by contrast, we are interested in our agent’s long-term strategy, which might involve sub-optimal decisions at intermediate steps and a trade-off between exploration (of unknown paths) and exploitation of what we already know about the environment.
Terminology
Let’s go over the fundamental concepts and terminology of Reinforcement Learning.
Agent
An entity that is embedded in an environment and takes actions to change the environment’s state. Mobile robots, software agents, and industrial controllers are some examples.
Environment
The environment is the system in which the agent perceives and acts. In RL, the environment is usually modeled as a Markov Decision Process (MDP). An MDP is a 5-tuple (S, A, P, R, γ), where:
- S denotes a finite set of states.
- A denotes a finite set of actions.
- P is a probability matrix for state transitions.
- R denotes a reward function.
- γ is a discount factor, with γ ∈ [0, 1].
Markov Decision Processes can model a wide range of real-world situations, from a basic chessboard to a much more complicated video game.
In a game, the rewards are determined by whether the player wins or loses, with winning actions yielding a higher return than losing actions.
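To make the (S, A, P, R, γ) components concrete, here is a minimal sketch of a tiny MDP in Python. The two states, the two actions, the transition probabilities, and the helper names `step` and `discounted_return` are all made up for illustration and do not come from any library.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
STATES = ["start", "goal"]   # S: finite set of states
ACTIONS = ["stay", "move"]   # A: finite set of actions
GAMMA = 0.9                  # discount factor (assumed value)

# P[state][action] -> list of (probability, next_state) pairs
P = {
    "start": {"stay": [(1.0, "start")], "move": [(0.8, "goal"), (0.2, "start")]},
    "goal": {"stay": [(1.0, "goal")], "move": [(1.0, "goal")]},
}


def reward(state, action, next_state):
    """R: pay 1.0 on the transition that first reaches the goal, else 0.0."""
    return 1.0 if state != "goal" and next_state == "goal" else 0.0


def step(state, action, rng):
    """Sample the next state from P and return (next_state, reward)."""
    probs, next_states = zip(*P[state][action])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, reward(state, action, next_state)


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


rng = random.Random(0)
state, rewards = "start", []
for _ in range(5):  # the agent always chooses "move" in this toy rollout
    state, r = step(state, "move", rng)
    rewards.append(r)
print(rewards, discounted_return(rewards))
```

The discount factor makes a reward received later count for less than the same reward received immediately, which is why reaching the goal quickly yields a higher return.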
Reward Function
The reward function maps states to their corresponding rewards. It is the signal that the agent uses to learn how to act in its environment.
A good deal of research goes into designing good reward functions and into dealing with sparse rewards, which occur when the environment rarely provides a reward and therefore gives the agent too little feedback to learn appropriately.
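The difference between a sparse and a denser reward signal can be sketched on a toy problem. The 10-cell corridor, the `GOAL` constant, and both reward functions below are hypothetical and only illustrate the idea.

```python
GOAL = 9  # hypothetical goal cell in a 10-cell corridor (cells 0..9)


def sparse_reward(position):
    """Reward only on reaching the goal; every other cell gives 0."""
    return 1.0 if position == GOAL else 0.0


def shaped_reward(position, next_position):
    """Sparse reward plus a small bonus for moving closer to the goal."""
    progress = abs(GOAL - position) - abs(GOAL - next_position)
    return sparse_reward(next_position) + 0.1 * progress


path = list(range(GOAL + 1))  # walk straight from cell 0 to the goal
sparse = [sparse_reward(p) for p in path[1:]]
shaped = [shaped_reward(a, b) for a, b in zip(path, path[1:])]
print(sparse)  # feedback arrives only at the very last step
print(shaped)  # every step toward the goal gives some feedback
```

With the sparse version, an agent that wanders randomly sees no feedback until it happens to reach the goal. The shaped version gives a signal at every step, though a carelessly designed bonus can change which behavior is optimal.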
Approaches
Policy-Based Approach
Policy-based approaches to RL aim to learn the best possible policy directly. A policy model outputs either the best action to take from the current state (a deterministic policy) or a probability distribution over the possible actions (a stochastic policy).
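As a rough illustration of the two kinds of policy output, the sketch below turns a set of action preferences into a probability distribution with a softmax, then either samples from it or picks the most likely action. The preference values are invented, and softmax is just one common way to parameterize a stochastic policy.

```python
import numpy as np


def softmax(preferences):
    """Turn action preferences into a probability distribution."""
    shifted = preferences - np.max(preferences)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical preferences for 3 actions in a single state
preferences = np.array([1.0, 2.0, 0.5])
probs = softmax(preferences)

rng = np.random.default_rng(0)
stochastic_action = int(rng.choice(len(probs), p=probs))  # sample from the distribution
deterministic_action = int(np.argmax(probs))              # always take the most likely action

print(probs, stochastic_action, deterministic_action)
```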
Value-Based Approach
In value-based methods, we want to find the optimal value function, which is the maximum value function over all policies. Based on the learned values, the agent can then choose which actions to take (i.e., which policy to follow).
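One way to compute an optimal value function when the MDP is small and fully known is value iteration. The article does not name a specific algorithm, so treat this as one possible sketch. The three-state chain, its rewards, and the discount factor are made up for illustration.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# Deterministic toy chain 0 -> 1 -> 2 (assumed for illustration).
# Action 0 = stay, action 1 = move forward; state 2 is an absorbing goal.
next_state = np.array([[0, 1], [1, 2], [2, 2]])          # next_state[s, a]
reward = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # reward[s, a]

V = np.zeros(n_states)
for _ in range(100):
    # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * V(s')
    Q = reward + gamma * V[next_state]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the values no longer change
        break
    V = V_new

# The policy is read off the values: pick the highest-valued action per state
greedy_policy = Q.argmax(axis=1)
print(V, greedy_policy)
```

The policy here is never learned directly. It falls out of the value estimates, which is the defining trait of a value-based method.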
Application
One of the most common introductory problems in RL is the multi-armed bandit. Each action selection is like pulling one of a slot machine’s levers, and the rewards are the payoffs for hitting the jackpot.
Python Walk-through
import numpy as np

# Number of bandits
k = 3

# Our action values
Q = [0 for _ in range(k)]

# This is to keep track of the number of times we take each action
N = [0 for _ in range(k)]

# Epsilon value for exploration
eps = 0.1

# True probability of winning for each bandit
p_bandits = [0.45, 0.40, 0.80]
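The setup above stops before any bandit is actually played. Below is a minimal sketch of how an epsilon-greedy loop might use those variables. The helper names `pull` and `choose_action`, the seed, and the number of steps are assumptions added for illustration, not part of the original walk-through.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen only for reproducibility

k = 3                            # number of bandits
Q = [0.0 for _ in range(k)]      # estimated value of each action
N = [0 for _ in range(k)]        # number of times each action was taken
eps = 0.1                        # exploration probability
p_bandits = [0.45, 0.40, 0.80]   # true win probability of each bandit


def pull(a):
    """Hypothetical helper: return 1 with probability p_bandits[a], else 0."""
    return 1 if rng.random() < p_bandits[a] else 0


def choose_action():
    """Explore with probability eps, otherwise exploit the best estimate."""
    if rng.random() < eps:
        return int(rng.integers(k))
    return int(np.argmax(Q))


for _ in range(5000):  # number of plays is an arbitrary choice
    a = choose_action()
    r = pull(a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]  # incremental sample-average update

print("Estimated values:", [round(q, 2) for q in Q])
print("Pull counts:", N)
```

With enough plays, the estimate for the third bandit should approach its true win probability of 0.80, and most pulls should go to it. The remaining pulls are the price of exploration.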
Reinforcement Learning is a developing area with a lot more to learn. Research into general-purpose algorithms and models is still ongoing. The most important step is to become acquainted with core concepts such as value functions, policies, and MDPs.