### Statistical Reinforcement Learning: Modern Machine Learning Approaches

**Masashi Sugiyama**

ISBN 9781439856895 – CAT# K12676

Series: Chapman & Hall/CRC Machine Learning & Pattern Recognition

### Features

- Provides an up-to-date and comprehensive introduction to reinforcement learning (RL)
- Presents various types of RL approaches, such as model-based and model-free approaches, policy iteration, and policy search methods
- Uses illustrative examples for readers to easily understand concepts
- Covers approaches recently introduced in the data mining and machine learning fields

### Summary

Reinforcement learning is a mathematical framework for developing computer agents that can learn optimal behavior by relating generic reward signals to their past actions. With numerous successful applications in business intelligence, plant control, and gaming, the RL framework is ideal for decision making in unknown environments with large amounts of data.

Supplying an up-to-date and accessible introduction to the field, *Statistical Reinforcement Learning: Modern Machine Learning Approaches* presents fundamental concepts and practical algorithms of statistical reinforcement learning from the modern machine learning viewpoint. It covers various types of RL approaches, including model-based and model-free approaches, policy iteration, and policy search methods.

- Covers the range of reinforcement learning algorithms from a modern perspective
- Lays out the associated optimization problems for each reinforcement learning scenario covered
- Provides thought-provoking statistical treatment of reinforcement learning algorithms

The book covers approaches recently introduced in the data mining and machine learning fields to provide a systematic bridge between the RL and data mining/machine learning research communities. It presents state-of-the-art results, including dimensionality reduction in RL and risk-sensitive RL. Numerous illustrative examples are included to help readers understand the intuition and usefulness of reinforcement learning techniques.

This book is an ideal resource for graduate-level students in computer science and applied statistics programs, as well as researchers and engineers in related fields.

### Table of Contents

Introduction to Reinforcement Learning

Reinforcement Learning

Mathematical Formulation

Structure of the Book

Model-Free Policy Iteration

Model-Free Policy Search

Model-Based Reinforcement Learning

MODEL-FREE POLICY ITERATION

Policy Iteration with Value Function Approximation

Value Functions

State Value Functions

State-Action Value Functions

Least-Squares Policy Iteration

Immediate-Reward Regression

Algorithm

Regularization

Model Selection

Remarks

Basis Design for Value Function Approximation

Gaussian Kernels on Graphs

MDP-Induced Graph

Ordinary Gaussian Kernels

Geodesic Gaussian Kernels

Extension to Continuous State Spaces

Illustration

Setup

Geodesic Gaussian Kernels

Ordinary Gaussian Kernels

Graph-Laplacian Eigenbases

Diffusion Wavelets

Numerical Examples

Robot-Arm Control

Robot-Agent Navigation

Remarks

Sample Reuse in Policy Iteration

Formulation

Off-Policy Value Function Approximation

Episodic Importance Weighting

Per-Decision Importance Weighting

Adaptive Per-Decision Importance Weighting

Illustration

Automatic Selection of Flattening Parameter

Importance-Weighted Cross-Validation

Illustration

Sample-Reuse Policy Iteration

Algorithm

Illustration

Numerical Examples

Inverted Pendulum

Mountain Car

Remarks

Active Learning in Policy Iteration

Efficient Exploration with Active Learning

Problem Setup

Decomposition of Generalization Error

Estimation of Generalization Error

Designing Sampling Policies

Illustration

Active Policy Iteration

Sample-Reuse Policy Iteration with Active Learning

Illustration

Numerical Examples

Remarks

Robust Policy Iteration

Robustness and Reliability in Policy Iteration

Robustness

Reliability

Least Absolute Policy Iteration

Algorithm

Illustration

Properties

Numerical Examples

Possible Extensions

Huber Loss

Pinball Loss

Deadzone-Linear Loss

Chebyshev Approximation

Conditional Value-At-Risk

Remarks

MODEL-FREE POLICY SEARCH

Direct Policy Search by Gradient Ascent

Formulation

Gradient Approach

Gradient Ascent

Baseline Subtraction for Variance Reduction

Variance Analysis of Gradient Estimators

Natural Gradient Approach

Natural Gradient Ascent

Illustration

Application in Computer Graphics: Artist Agent

Sumie Painting

Design of States, Actions, and Immediate Rewards

Experimental Results

Remarks

Direct Policy Search by Expectation-Maximization

Expectation-Maximization Approach

Sample Reuse

Episodic Importance Weighting

Per-Decision Importance Weight

Adaptive Per-Decision Importance Weighting

Automatic Selection of Flattening Parameter

Reward-Weighted Regression with Sample Reuse

Numerical Examples

Remarks

Policy-Prior Search

Formulation

Policy Gradients with Parameter-Based Exploration

Policy-Prior Gradient Ascent

Baseline Subtraction for Variance Reduction

Variance Analysis of Gradient Estimators

Numerical Examples

Sample Reuse in Policy-Prior Search

Importance Weighting

Variance Reduction by Baseline Subtraction

Numerical Examples

Remarks

MODEL-BASED REINFORCEMENT LEARNING

Transition Model Estimation

Conditional Density Estimation

Regression-Based Approach

ε-Neighbor Kernel Density Estimation

Least-Squares Conditional Density Estimation

Model-Based Reinforcement Learning

Numerical Examples

Continuous Chain Walk

Humanoid Robot Control

Remarks

Dimensionality Reduction for Transition Model Estimation

Sufficient Dimensionality Reduction

Squared-Loss Conditional Entropy

Conditional Independence

Dimensionality Reduction with SCE

Relation to Squared-Loss Mutual Information

Numerical Examples

Artificial and Benchmark Datasets

Humanoid Robot

Remarks

References

Index

### Author Bio

**Masashi Sugiyama** received his bachelor of engineering, master of engineering, and doctor of engineering degrees in computer science from the Tokyo Institute of Technology, Japan. In 2001, he was appointed assistant professor at the Tokyo Institute of Technology, and he was promoted to associate professor in 2003. He moved to the University of Tokyo as professor in 2014.

He received an Alexander von Humboldt Foundation Research Fellowship and conducted research at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and conducted research at the University of Edinburgh, Scotland. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity, the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011, and the Young Scientists’ Prize from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology for his contribution to the density-ratio paradigm of machine learning.

His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control. He has published *Density Ratio Estimation in Machine Learning* (Cambridge University Press, 2012) and *Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation* (MIT Press, 2012).