Written By: Qingyang Xu (et AI)
Date Created: December 11, 2025
Last Modified: January 11, 2026
CS 224R Deep Reinforcement Learning
This page summarizes the Sutton & Barto (2015) textbook on reinforcement learning (RL)
Reinforcement Learning (Sutton & Barto)
Key concepts
- Policy: $\pi(a\mid s)$
- State value: $V_\pi(s)=\mathbb E_\pi[G_t\mid S_t=s]$
- Action value: $Q_\pi(s,a)=\mathbb E_\pi[G_t\mid S_t=s,A_t=a]$
- Optimal action value: $Q_*(s,a)=\max_\pi Q_\pi(s,a)$
- Model: $p(s',r\mid s,a)$ or a sampler for it
- Prediction problem: given $\pi$, compute $V_\pi$ or $Q_\pi$
- Control problem: compute an optimal policy $\pi_*$ (equivalently $V_*, Q_*$)
- Learning updates from real experience (interacting with environment)
- Planning updates from model-generated simulations
- Episode: a sequence of agent-environment interactions from an initial state to a terminal state
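As a concrete instance of the prediction problem, iterative policy evaluation repeatedly applies the Bellman expectation backup $V(s) \leftarrow \sum_a \pi(a\mid s) \sum_{s',r} p(s',r\mid s,a)\,[r + \gamma V(s')]$ until convergence. The sketch below runs it on a hypothetical two-state MDP (the transition model `p` and policy `pi` are made-up illustrations, not from the textbook):

```python
# Hypothetical 2-state MDP for illustration only.
# p[s][a] is a list of (next_state, reward, probability) triples,
# i.e. a tabular form of the model p(s', r | s, a).
p = {
    0: {0: [(0, 0.0, 0.5), (1, 1.0, 0.5)], 1: [(1, 2.0, 1.0)]},
    1: {0: [(0, 0.0, 1.0)], 1: [(1, -1.0, 1.0)]},
}
# pi[s][a] = pi(a | s), a stochastic policy over the two actions.
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}
gamma = 0.9  # discount factor

def policy_evaluation(p, pi, gamma, tol=1e-10):
    """Apply the Bellman expectation backup until the value change is < tol."""
    V = {s: 0.0 for s in p}
    while True:
        delta = 0.0
        for s in p:
            # Expected one-step return under pi, bootstrapping from V.
            v = sum(
                pi[s][a] * sum(prob * (r + gamma * V[s2])
                               for s2, r, prob in outcomes)
                for a, outcomes in p[s].items()
            )
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

V = policy_evaluation(p, pi, gamma)
print(V)
```

Because this sweep is a $\gamma$-contraction, the iterates converge to the unique fixed point $V_\pi$ regardless of the initial values. Estimating $Q_\pi$ works the same way, backing up one value per state-action pair.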