Statistical Reinforcement Learning: Modern Machine Learning Approaches
Masashi Sugiyama
ISBN 9781439856895 – CAT# K12676
Series: Chapman & Hall/CRC Machine Learning & Pattern Recognition
Features
- Provides an up-to-date and comprehensive introduction to RL
- Presents various types of RL approaches, such as model-based and model-free approaches, policy iteration, and policy search methods
- Uses illustrative examples for readers to easily understand concepts
- Covers approaches recently introduced in the data mining and machine learning fields
Summary
Reinforcement learning (RL) is a mathematical framework for developing computer agents that can learn an optimal behavior by relating generic reward signals to their past actions. With numerous successful applications in business intelligence, plant control, and gaming, the RL framework is ideal for decision making in unknown environments with large amounts of data.
Supplying an up-to-date and accessible introduction to the field, Statistical Reinforcement Learning: Modern Machine Learning Approaches presents fundamental concepts and practical algorithms of statistical reinforcement learning from the modern machine learning viewpoint. It covers various types of RL approaches, including model-based and model-free approaches, policy iteration, and policy search methods.
- Covers the range of reinforcement learning algorithms from a modern perspective
- Lays out the associated optimization problems for each reinforcement learning scenario covered
- Provides thought-provoking statistical treatment of reinforcement learning algorithms
The book covers approaches recently introduced in the data mining and machine learning fields to provide a systematic bridge between RL and data mining/machine learning researchers. It presents state-of-the-art results, including dimensionality reduction in RL and risk-sensitive RL. Numerous illustrative examples are included to help readers understand the intuition and usefulness of reinforcement learning techniques.
This book is an ideal resource for graduate-level students in computer science and applied statistics programs, as well as researchers and engineers in related fields.
Table of Contents
Introduction to Reinforcement Learning
Reinforcement Learning
Mathematical Formulation
Structure of the Book
Model-Free Policy Iteration
Model-Free Policy Search
Model-Based Reinforcement Learning
MODEL-FREE POLICY ITERATION
Policy Iteration with Value Function Approximation
Value Functions
State Value Functions
State-Action Value Functions
Least-Squares Policy Iteration
Immediate-Reward Regression
Algorithm
Regularization
Model Selection
Remarks
Basis Design for Value Function Approximation
Gaussian Kernels on Graphs
MDP-Induced Graph
Ordinary Gaussian Kernels
Geodesic Gaussian Kernels
Extension to Continuous State Spaces
Illustration
Setup
Geodesic Gaussian Kernels
Ordinary Gaussian Kernels
Graph-Laplacian Eigenbases
Diffusion Wavelets
Numerical Examples
Robot-Arm Control
Robot-Agent Navigation
Remarks
Sample Reuse in Policy Iteration
Formulation
Off-Policy Value Function Approximation
Episodic Importance Weighting
Per-Decision Importance Weighting
Adaptive Per-Decision Importance Weighting
Illustration
Automatic Selection of Flattening Parameter
Importance-Weighted Cross-Validation
Illustration
Sample-Reuse Policy Iteration
Algorithm
Illustration
Numerical Examples
Inverted Pendulum
Mountain Car
Remarks
Active Learning in Policy Iteration
Efficient Exploration with Active Learning
Problem Setup
Decomposition of Generalization Error
Estimation of Generalization Error
Designing Sampling Policies
Illustration
Active Policy Iteration
Sample-Reuse Policy Iteration with Active Learning
Illustration
Numerical Examples
Remarks
Robust Policy Iteration
Robustness and Reliability in Policy Iteration
Robustness
Reliability
Least Absolute Policy Iteration
Algorithm
Illustration
Properties
Numerical Examples
Possible Extensions
Huber Loss
Pinball Loss
Deadzone-Linear Loss
Chebyshev Approximation
Conditional Value-At-Risk
Remarks
MODEL-FREE POLICY SEARCH
Direct Policy Search by Gradient Ascent
Formulation
Gradient Approach
Gradient Ascent
Baseline Subtraction for Variance Reduction
Variance Analysis of Gradient Estimators
Natural Gradient Approach
Natural Gradient Ascent
Illustration
Application in Computer Graphics: Artist Agent
Sumie Painting
Design of States, Actions, and Immediate Rewards
Experimental Results
Remarks
Direct Policy Search by Expectation-Maximization
Expectation-Maximization Approach
Sample Reuse
Episodic Importance Weighting
Per-Decision Importance Weighting
Adaptive Per-Decision Importance Weighting
Automatic Selection of Flattening Parameter
Reward-Weighted Regression with Sample Reuse
Numerical Examples
Remarks
Policy-Prior Search
Formulation
Policy Gradients with Parameter-Based Exploration
Policy-Prior Gradient Ascent
Baseline Subtraction for Variance Reduction
Variance Analysis of Gradient Estimators
Numerical Examples
Sample Reuse in Policy-Prior Search
Importance Weighting
Variance Reduction by Baseline Subtraction
Numerical Examples
Remarks
MODEL-BASED REINFORCEMENT LEARNING
Transition Model Estimation
Conditional Density Estimation
Regression-Based Approach
ε-Neighbor Kernel Density Estimation
Least-Squares Conditional Density Estimation
Model-Based Reinforcement Learning
Numerical Examples
Continuous Chain Walk
Humanoid Robot Control
Remarks
Dimensionality Reduction for Transition Model Estimation
Sufficient Dimensionality Reduction
Squared-Loss Conditional Entropy
Conditional Independence
Dimensionality Reduction with SCE
Relation to Squared-Loss Mutual Information
Numerical Examples
Artificial and Benchmark Datasets
Humanoid Robot
Remarks
References
Index
Author Bio
Masashi Sugiyama received his bachelor, master, and doctor of engineering degrees in computer science from the Tokyo Institute of Technology, Japan. In 2001 he was appointed assistant professor at the Tokyo Institute of Technology and he was promoted to associate professor in 2003. He moved to the University of Tokyo as professor in 2014.
He received an Alexander von Humboldt Foundation Research Fellowship and researched at Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and researched at the University of Edinburgh, Scotland. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity, the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011, and the Young Scientists’ Prize from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology for his contribution to the density-ratio paradigm of machine learning.
His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control. He published Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012) and Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation (MIT Press, 2012).
