Introduction to Reinforcement Learning and OpenAI Gym



Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. RL differs from supervised learning in that labelled input/output pairs need not be presented, and sub-optimal actions need not be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
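The balance between trying new things and using what is already known can be illustrated with an epsilon-greedy rule on a multi-armed bandit. This is only a minimal sketch under assumed settings: the arm means, the Gaussian reward noise, and the names `epsilon_greedy` and `run_bandit` are hypothetical and do not come from any library.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an arm: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: random arm
    # exploit: arm with the highest current reward estimate
    return max(range(len(estimates)), key=lambda arm: estimates[arm])


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a bandit with Gaussian rewards (assumed noise std of 1.0)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running average reward per arm
    counts = [0] * len(true_means)       # how often each arm was pulled
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental mean update of the estimate for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.0, 0.5, 2.0])
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps its estimates of the other arms from going stale.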

The agent observes the environment, acts on it, and receives a positive or negative reward in return.
Image Courtesy: Berkeley’s CS 294: Deep Reinforcement Learning

In simple words, RL deals with an environment and active feedback, just like our real world. We are agents: we do things (actions) in real life, and if they have a positive effect we tend to do them again; otherwise we try to avoid them. That's how RL works.

How to implement an RL Algorithm?


There are three approaches to implementing a reinforcement learning algorithm.

Value-based: In a value-based reinforcement learning method, you try to maximize the value function V(s). Here V(s) is the long-term return the agent expects to collect starting from state s while following policy π.
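One common way to make "long-term return" concrete is the discounted sum of rewards, with V(s) estimated as the average return over episodes that start in s. The sketch below assumes a discount factor `gamma`, which the text above does not specify; the function names are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward t steps ahead is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return afterwards).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_value(episodes, gamma=0.9):
    """Monte Carlo estimate of V(s): the average return over sampled episodes.

    `episodes` is a list of reward sequences, each assumed to start in the
    same state s and to be generated by following the same policy.
    """
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print(discounted_return([1, 1, 1], gamma=0.5))
    print(estimate_value([[1, 1, 1], [0, 0, 0]], gamma=0.5))
```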

Policy-based: In a policy-based RL method, you try to come up with a policy such that the action taken in each state helps you gain the maximum reward in the future.

Two types of policy-based approaches are:
Deterministic: The policy π produces the same action for any given state.
Stochastic: Every action has a certain probability, given by π(a|s) = P[A_t = a | S_t = s].
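The difference between the two policy types can be sketched in a few lines. Everything here is illustrative: the states "low" and "high", the actions "left" and "right", and the probabilities are made-up values. A deterministic policy is a plain lookup from state to action, while a stochastic policy stores a probability for each action in each state and samples from it.

```python
import random

# Deterministic policy: each state maps to exactly one action.
DETERMINISTIC_POLICY = {"low": "left", "high": "right"}

# Stochastic policy: each state maps to a probability for every action.
STOCHASTIC_POLICY = {
    "low": {"left": 0.8, "right": 0.2},
    "high": {"left": 0.1, "right": 0.9},
}


def act_deterministic(policy, state):
    """Return the single action the policy prescribes for this state."""
    return policy[state]


def act_stochastic(policy, state, rng):
    """Sample an action according to the policy's probabilities for this state."""
    actions = list(policy[state])
    weights = [policy[state][action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(act_deterministic(DETERMINISTIC_POLICY, "low"))
    print([act_stochastic(STOCHASTIC_POLICY, "low", rng) for _ in range(5)])
```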

Model-based: In this method, you need to create a virtual model of each environment. The agent then learns to act in that specific environment.
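When a model of the environment is available, the agent can plan with it instead of learning purely by trial and error. The sketch below uses value iteration, one standard planning technique, on a tiny hand-written model; it is not presented as the only model-based method. The states, actions, rewards, and discount factor are all assumptions chosen for illustration.

```python
# Hypothetical model: (state, action) -> (next_state, reward).
# State 2 is terminal; reaching it from state 1 pays a reward of 1.
MODEL = {
    (0, "stay"): (0, 0.0),
    (0, "go"): (1, 0.0),
    (1, "stay"): (1, 0.0),
    (1, "go"): (2, 1.0),
}
STATES = [0, 1]          # non-terminal states
ACTIONS = ["stay", "go"]
TERMINAL_STATE = 2


def action_value(model, values, state, action, gamma):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = model[(state, action)]
    return reward + gamma * values[next_state]


def value_iteration(model, states, actions, terminal, gamma=0.9, sweeps=50):
    """Repeatedly back up V(s) = max over actions of the one-step lookahead."""
    values = {state: 0.0 for state in states}
    values[terminal] = 0.0  # nothing more can be earned after termination
    for _ in range(sweeps):
        for state in states:
            values[state] = max(
                action_value(model, values, state, action, gamma)
                for action in actions
            )
    return values


def greedy_policy(model, states, actions, values, gamma=0.9):
    """Pick, in each state, the action with the best one-step lookahead."""
    return {
        state: max(
            actions,
            key=lambda action: action_value(model, values, state, action, gamma),
        )
        for state in states
    }


if __name__ == "__main__":
    values = value_iteration(MODEL, STATES, ACTIONS, TERMINAL_STATE)
    print(values)
    print(greedy_policy(MODEL, STATES, ACTIONS, values))
```

The same value function V(s) that a value-based method tries to learn from experience is computed here directly from the model, and the greedy policy is read off from it.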

Types of reinforcement

There are two types of reinforcement:

Positive: Positive reinforcement is an event that occurs because of a specific behavior. It increases the strength and frequency of that behavior and has a positive effect on the action taken by the agent.
This type of reinforcement helps you maximize performance and sustain change over a longer period of time. Too much reinforcement, however, can lead to over-optimization of a state, which can affect the results.

Negative: Negative reinforcement is the strengthening of a behavior that occurs because a negative condition is stopped or avoided. It helps you define the minimum level of performance.

But where will we apply RL?


When we work with supervised and unsupervised learning, we usually have a dataset. But what about RL? Well, for RL we have environments. So it's more like a game or virtual environment than a boring raw dataset. Do we have to design those complex games/environments ourselves to get started in RL? Luckily, no.

The solution comes with OpenAI Gym. Gym is a toolkit for developing and comparing reinforcement learning algorithms. It makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano.
The gym library is a collection of test problems — environments — that you can use to work out your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms.
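To see why a shared interface matters, here is a sketch of a toy environment and a generic episode runner. `CoinMatchEnv` is a hypothetical class written for this post, not part of gym; it only imitates the classic Gym convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. Newer Gym releases use a slightly different return signature, so treat this as an illustration of the idea, not of the current API.

```python
import random


class CoinMatchEnv:
    """Toy environment: the observation is a coin (0 or 1); matching it pays 1."""

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps_left = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps_left = self.episode_length
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.steps_left -= 1
        done = self.steps_left == 0
        self.coin = self.rng.randint(0, 1)  # next observation
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode on any environment that exposes reset() and step()."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CoinMatchEnv(episode_length=10, seed=0)
    print("copy the observation:", run_episode(env, lambda obs: obs))
    print("random guessing:", run_episode(env, lambda obs: random.randint(0, 1)))
```

Because `run_episode` relies only on `reset()` and `step()`, the same loop works unchanged for any environment that follows this convention, which is what lets you write general algorithms.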

Installing gym is super easy. Just use pip.

pip install gym

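Sample code to get started:

import gym

env = gym.make('CartPole-v0')  # create the cart-pole environment
env.reset()                    # start a new episode
for _ in range(1000):
    env.render()               # draw the current frame
    env.step(env.action_space.sample())  # take a random action
env.close()                    # release the rendering window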

Applications of RL:


Applications of Deep RL. Courtesy: REINFORCEMENT LEARNING APPLICATIONS, Yuxi Li


Some practical applications of RL are:

  • Recommender Systems
      • Decision Service
      • Horizon: An Open Source Applied RL Platform
      • News Recommendation
      • Multiple Items Recommendation
  • Computer Systems
      • Neural Architecture Search
      • Device Placement
      • Data Augmentation
      • Cluster Scheduling
      • An Open Platform for Computer Systems
      • NP-Hard Problems
  • Energy
      • Data Center Cooling
      • Smart Grid
  • Finance
      • Option Pricing
      • Order Book Execution
  • Healthcare
      • Dynamic Treatment Strategies
      • Medical Image Report Generation
  • Robotics
      • Dexterous Robot
      • Legged Robot
  • Transportation
      • Ridesharing Order Dispatching