Teaching assistants: Akash Kumar, Siddarth R, Nisarg Patel
Evaluation:
Assignments 40%, midsemester exam 20%, final exam 40%
Copying is fatal
Textbook and reading material
Main textbooks:
Web Data Mining by Bing Liu.
Introduction to Information Retrieval by Christopher D Manning, Prabhakar Raghavan and Hinrich Schütze.
Foundations of Data Science by Avrim Blum, John Hopcroft and Ravi Kannan.
Supplementary material:
Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber.
Machine Learning by Tom Mitchell.
C4.5: Programs for Machine Learning by Ross Quinlan.
Artificial Intelligence: A Modern Approach by Stuart J Russell and Peter Norvig.
Here is a tentative list of topics.
Supervised learning: Frequent itemsets, association rules, decision trees, naive Bayes, SVM, classifier evaluation, expectation maximization, ensemble classifiers.
Unsupervised learning: Clustering, outlier detection.
Text mining: Basic ideas from information retrieval, TF/IDF model, PageRank, HITS.
Other topics: Probabilistic graphical models, Bayesian networks, Markov models, neural networks, ranking and social choice, …
Lecture 1, 08 Aug 2017:
Frequent itemsets, a-priori algorithm
Lecture 2, 10 Aug 2017:
A-priori algorithm, association rule generation, tabular data, multiple minimum supports, class association rules
Lecture 3, 17 Aug 2017:
Decision trees
Lecture 4, 22 Aug 2017:
Discretizing continuous attributes
Overfitting and tree pruning
Classifier evaluation
Lecture 5, 24 Aug 2017:
Naive Bayesian Classifiers
Generative probabilistic models and parameter estimation, naive Bayes text classification
Lecture 6, 29 Aug 2017:
Support vector machines (SVMs), the linearly separable case
Lecture 7, 31 Aug 2017:
SVMs with soft margins, kernel functions
Lecture 8, 05 Sep 2017:
A formal setting for machine learning, online learning, Perceptron algorithm, VC-dimension
Lecture 9, 07 Sep 2017:
True error and sample error, sample size vs overfitting, VC-dimension, ensemble classifiers: bagging and boosting
Lecture 10, 12 Sep 2017:
Clustering: K-Means, hierarchical
Lecture 11, 19 Sep 2017:
Density-based clustering
Density-based local outlier detection
Semi-supervised learning: Expectation-Maximization
Lecture 12, 21 Sep 2017:
Convergence of EM algorithm
EM for text classification
Reading material for 03–12 Oct 2017:
Regression
Lecture 13, 17 Oct 2017:
Boolean information retrieval: documents, terms, postings
Tokenization, stop words, stemming and lemmatization, skip lists, positional postings and phrase queries
Parametric and zone indexes, weighted zone scoring, tf-idf, scoring in the vector space model.
Lecture 14, 19 Oct 2017:
Tf-idf and variants, PageRank
Lecture 15, 24 Oct 2017:
PageRank, HITS, Latent Semantic Indexing
Lecture 16, 31 Oct 2017:
Bayesian networks: basic definitions, semantics, exact inference
Lecture 17, 02 Nov 2017:
Bayesian networks: Conditional independence, D-separation
Lecture 18, 07 Nov 2017:
Bayesian networks: exact inference, approximate inference, sampling
Lecture 19, 09 Nov 2017:
Temporal models: inference (most likely explanation, Viterbi algorithm), Hidden Markov Models (HMMs)
Lecture 20, 21 Nov 2017:
Neural networks: Multilayer perceptrons, sigmoid neurons, network architecture, learning weights, cross entropy cost function, overfitting and regularization
Lecture 21, 23 Nov 2017:
Neural networks: Backpropagation, the unstable gradient problem, convolutional networks, deep learning