Madhavan Mukund



Data Mining and Machine Learning,
Aug–Dec 2020



Administrative details

  • Lectures and Tutorials: Video lectures will be uploaded each week. There will be one live online class discussion each week; details will be announced.

  • Online discussion forum: Please sign up for the course on Piazza.

  • Teaching assistants: Debjit Paria, Samarth Ramesh

  • Evaluation:

    • 15% for quizzes on Moodle

    • 35% for assignments

    • 50% for final exam

    • Copying is fatal

  • Text and reference books:

    • Web Data Mining by Bing Liu.

    • Foundations of Data Science by Avrim Blum, John Hopcroft and Ravi Kannan

    • Machine Learning by Tom Mitchell.

    • C4.5: Programs for Machine Learning by Ross Quinlan.

    • Artificial Intelligence: A Modern Approach (3rd ed) by Stuart J Russell and Peter Norvig.

    • Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow (2nd ed) by Aurélien Géron


Course plan

Here is a tentative list of topics.

  • Supervised learning: Frequent itemsets, association rules, regression, decision trees, naive Bayes, classifier evaluation, PAC learning, VC dimension, ensemble classifiers, expectation maximization, semi-supervised learning, linear classifiers, perceptrons, SVM, kernel methods, neural networks.

  • Unsupervised learning: Clustering, outlier detection, PCA, dimensionality reduction.

  • Other topics: Probabilistic graphical models, Bayesian networks, hidden Markov models, …



Lecture summary

Week 1, August 17-23, 2020

  • Lecture 1: Introduction (Slides), (Video)

  • Lecture 2: Market-Basket Analysis (Slides), (Video)

  • Lecture 3: The Apriori Algorithm (Slides), (Video)

  • Lecture 4: Association Rules (Slides), (Video)

    • Liu, Chapter 2.1, 2.2, 2.5

Week 2, August 24-30, 2020


Week 3, August 31-September 6, 2020

  • Lecture 10: Linear Regression (Slides), (Video)

  • Lecture 11: Regression, the non-linear case (Slides), (Video)

    • Géron, Chapter 4
  • Lecture 12: Regression for Classification (Slides), (Video)

  • Lecture 13: Regression using decision trees (Slides), (Video)

    • Géron, Chapter 6
  • Lecture 14: Handling overfitting in decision trees (Slides), (Video)

    • Liu, Chapter 3.2.4
    • Mitchell, Chapter 3.7.1
    • Quinlan, Chapter 4

Week 4, September 7-13, 2020

  • Live session: 12 September 2020 (Video)


Week 5, September 14-20, 2020

  • Lecture 15: Naïve Bayes classifiers (Slides), (Video)

    • Liu, Chapter 3.6
  • Lecture 16: Naïve Bayes text classification (Slides), (Video)

    • Liu, Chapter 3.7
  • Live session: 18 September 2020 (Video)


Week 6, September 21-27, 2020

  • Lecture 17: PAC Learning (Slides), (Video)

    • Blum, Hopcroft and Kannan, Chapter 5.4, up to 5.4.2
  • Live session: 25 September 2020 (Video)


Week 8, October 5-11, 2020

  • Lecture 18: VC-Dimension (Slides), (Video)

    • Blum, Hopcroft and Kannan, Chapter 5.5, up to 5.5.2
  • Lecture 19: Shatter functions (Slides), (Video)

    • Blum, Hopcroft and Kannan, Chapter 5.5.3–5.5.5, 5.6

Week 9, October 12-18, 2020


Week 10, October 19-25, 2020


Week 11, October 26-November 1, 2020


Week 12, November 2-8, 2020


Week 13, November 9-15, 2020


Week 14, November 16-22, 2020


Week 15, November 23-29, 2020


Week 16, November 30-December 6, 2020

  • Live class, November 30: Hidden Markov Models (Class notes), (Video)

    Hidden Markov Models: Filtering, prediction, smoothing, most likely explanation; Dynamic Bayesian Networks

    • Russell and Norvig, Chapter 15.1, 15.2, 15.5 (brief)