• Text and reference books:

  • Reinforcement learning: Theory and algorithms by Alekh Agarwal, Nan Jiang and Sham Kakade

  • Regret analysis of stochastic and nonstochastic multi-armed bandit problems by Sébastien Bubeck and Nicolò Cesa-Bianchi

    Lecture 2, MDP basics

    Lecture 3, Policy Iteration, Chernoff bounds, Q value iteration

    Lecture 4, Q value iteration, Induced MDP

    Lecture 5, Induced MDP, Implicit explore exploit

    Lecture 6, Freedman's inequality

    Lecture 7, Analysis of R-max

    Lecture 9, Policy gradient, REINFORCE

    Lecture 10, SARSA, Function approximators, TD

    Lecture 11, TD, Monte Carlo function approximators, Batch reinforcement learning, DQN

    Lecture 12, Multi-armed bandits (video recording)

    Lecture 13, Stochastic bandits (video recording)

    Lecture 14, UCB (video recording)

    Lecture 15, Adversarial bandits (video recording)

    Lecture 16, EXP3 analysis (video recording)

    Lecture 17, Contextual bandits (video recording)

    Lecture 18, EXP4 (video recording)