K V Subrahmanyam

Text and reference books:

Reinforcement learning: Theory and algorithms by Alekh Agarwal, Nan Jiang and Sham Kakade

Regret analysis of stochastic and nonstochastic multi-armed bandit problems by Sébastien Bubeck and Nicolò Cesa-Bianchi

Lecture 2, MDP basics

Lecture 3, Policy iteration, Chernoff bounds, Q-value iteration

Lecture 4, Q-value iteration, Induced MDP

Lecture 5, Induced MDP, Implicit explore exploit

Lecture 6, Freedman's inequality

Lecture 7, Analysis of R-max

Lecture 9, Policy gradient, REINFORCE

Lecture 10, SARSA, Function approximators, TD

Lecture 11, TD, Monte Carlo function approximators, Batch reinforcement learning, DQN

Lecture 12, Multi-armed bandits (video recording)

Lecture 13, Stochastic bandits (video recording)

Lecture 14, UCB (video recording)

Lecture 15, Adversarial bandits (video recording)

Lecture 16, Exp3 analysis (video recording)

Lecture 17, Contextual bandits (video recording)

Lecture 18, Exp4 (video recording)