Text and reference books:
Reinforcement learning: Theory and algorithms by Alekh Agarwal, Nan Jiang and Sham Kakade
Regret analysis of stochastic and nonstochastic multi-armed bandit problems by Sébastien Bubeck and Nicolò Cesa-Bianchi
Lecture 2, MDP basics
Lecture 3, Policy iteration, Chernoff bounds, Q-value iteration
Lecture 4, Q-value iteration, Induced MDP
Lecture 5, Induced MDP, Implicit explore exploit
Lecture 6, Freedman's inequality
Lecture 7, Analysis of R-max
Lecture 9, Policy gradient, REINFORCE
Lecture 10, SARSA, Function approximators, TD
Lecture 11, TD, Monte Carlo function approximators, Batch reinforcement learning, DQN
Lecture 12, Multi-armed bandits (video recording)
Lecture 13, Stochastic bandits (video recording)
Lecture 14, UCB (video recording)
Lecture 15, Adversarial bandits (video recording)
Lecture 16, EXP3 analysis (video recording)
Lecture 17, Contextual bandits (video recording)
Lecture 18, EXP4 (video recording)