Madhavan Mukund



Data Mining and Machine Learning,
Aug-Nov 2017


Administrative details

  • Teaching assistants: Akash Kumar, Siddarth R, Nisarg Patel

  • Evaluation:

    • Assignments 40%, midsemester exam 20%, final exam 40%

    • Copying is fatal

  • Textbook and reading material

    • Main textbooks:

    • Supplementary material:

      • Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber.

      • Machine Learning by Tom Mitchell.

      • C4.5: Programs for Machine Learning by Ross Quinlan.

      • Artificial Intelligence: A Modern Approach by Stuart J Russell and Peter Norvig.


Course plan

Here is a tentative list of topics.

  • Supervised learning: Frequent itemsets, association rules, decision trees, naive Bayes, SVM, classifier evaluation, expectation maximization, ensemble classifiers.

  • Unsupervised learning: Clustering, outlier detection.

  • Text mining: Basic ideas from information retrieval, TF/IDF model, PageRank, HITS.

  • Other topics: Probabilistic graphical models, Bayesian networks, Markov models, neural networks, ranking and social choice, …



Lecture summary

  • Lecture 1, 08 Aug 2017:

    Frequent itemsets, a-priori algorithm

    • Liu, Chapter 2.1 and 2.2.1 (part)
  • Lecture 2, 10 Aug 2017:

    A-priori algorithm, association rule generation, tabular data, multiple minimum supports, class association rules

    • Liu, Chapter 2.2-2.5
  • Lecture 3, 17 Aug 2017:

    Decision trees

    • Liu, Chapter 3.2
    • Mitchell, Chapter 3
    • Quinlan, Chapters 1, 2
  • Lecture 4, 22 Aug 2017:

    Discretizing continuous attributes

    • Liu, Chapter 3.2.3
    • Mitchell, Chapter 3.7.2
    • Quinlan, Chapter 2.4

    Overfitting and tree pruning

    • Liu, Chapter 3.2.4
    • Mitchell, Chapter 3.7.1
    • Quinlan, Chapter 4

    Classifier evaluation

    • Liu, Chapter 3.3
    • Manning, Raghavan and Schütze, Chapter 8.3
  • Lecture 5, 24 Aug 2017:

    Naive Bayesian classifiers

    • Liu, Chapter 3.6

    Generative probabilistic models and parameter estimation, naive Bayes text classification

  • Lecture 6, 29 Aug 2017:

    Support vector machines (SVMs), the linearly separable case

  • Lecture 7, 31 Aug 2017:

    SVMs with soft margins, kernel functions

  • Lecture 8, 05 Sep 2017:

    A formal setting for machine learning, online learning, Perceptron algorithm, VC-dimension

    • Blum, Hopcroft and Kannan, Chapter 5.1, 5.5, 5.6, 5.9
  • Lecture 9, 07 Sep 2017:

    True error and sample error, sample size vs overfitting, VC-dimension, ensemble classifiers: bagging and boosting

    • Liu, Chapter 3.10
    • Blum, Hopcroft and Kannan, Chapter 5.1, 5.2, 5.5, 5.9, 5.10
  • Lecture 10, 12 Sep 2017:

    Clustering: K-Means, hierarchical

    • Liu, Chapter 4.1-4.4
  • Lecture 11, 19 Sep 2017:

    Density-based clustering

    Density-based local outlier detection

    Semi-supervised learning: Expectation-Maximization