Predictive Analytics: Regression and Classification

Objective: This is a core course of the M.Sc. Data Science program. Students from other programs at CMI can take this course, provided they have already taken the course titled "Probability and Statistics with R". As you know, CMI's M.Sc. Data Science program is mainly oriented toward students who aspire to a corporate career. A predictive model is an important tool used daily in corporate practice. The course will provide an overview of the basic ideas in statistical predictive models. The objective is to understand how statistical models are used to handle prediction problems. The stress will be on understanding the construction of the models and their implementation.


Python or R: The course is agnostic between Python and R; you must be equally at ease with both languages.

Homework and Quizzes: I will post more details regarding homework and quizzes on Moodle.


Lectures
Lecture 1 - Part 1 : Introduction YouTube Link (Slides) Welcome to the Predictive Analytics: Regression & Classification course at CMI for the Aug-Nov 2020 semester. In this pandemic year, I am going to record my lectures and upload them to YouTube, so that you can watch them from the safety of your home. Stay safe and stay well. Each lecture will have 4 or 5 parts, each on average 10 to 15 minutes long, so each lecture will run about one hour and fifteen minutes. In Lecture 1 Part 1, I introduce the course and its objective. As you know, the objective of CMI's M.Sc. in Data Science is to prepare students for an industry career. In our experience, predictive analytics is extremely important in industry and business, and almost part of the daily life of a data scientist in the corporate sector. Hence we have this full course on "Predictive Analytics: Regression & Classification" as a core course. The other core course is "Advanced Machine Learning", which I will teach jointly with Prof Madhavan Mukund; in that course, we will focus on deep learning and its applications in supervised, unsupervised, and reinforcement learning. For most of the material in this course, I rely on Hastie and Tibshirani's two books, "Introduction to Statistical Learning" and "Elements of Statistical Learning". I like these two books because I learned this material from them and from the papers the authors wrote over the last 20 years.

Lecture 1 - Part 2 : Least Squares Method YouTube Link (Slides) In this part, I introduce the concepts of simple and multiple linear regression as a regression hyper-plane. Then I introduce the least squares method.

Lecture 1 - Part 3 : Least Squares Method YouTube Link (Slides) In part 3 of lecture 1, we discuss the result that the normal equations always have at least one solution. If the system is of full rank, then the least squares method gives an analytically solvable, unique solution. We also discuss why minimizing the mean absolute deviation does not have an analytical solution.
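As a quick illustration of the normal equations, here is a minimal sketch on simulated data (the data and coefficients are made up for illustration, not taken from the lecture). With a full-column-rank design matrix, solving (X'X)b = X'y gives the unique least squares solution.

```python
import numpy as np

# Simulated example: fit y = b0 + b1*x by least squares.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X'X) beta = X'y. With full column rank,
# X'X is invertible and the solution is unique.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # close to the true coefficients [2, 3]
```

Note that the mean absolute deviation criterion has no analogous closed-form solution: the absolute value is not differentiable at zero, so there is no linear system like the normal equations to solve.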

Lecture 1 - Part 4 : Gauss-Markov Theorem YouTube Link (Slides) In part 4 of lecture 1, we introduce the underlying assumptions of linear regression models. We discuss how, under the assumptions of homogeneity and independence of the residuals, the Gauss-Markov theorem is developed. We also discuss the concept of mean squared error (MSE) and how MSE relates to prediction accuracy.

Lecture 1 - Part 5 : Geometry of Regression Model and Feature Engineering YouTube Link (Slides) In part 5 (the last part) of lecture 1, we discuss some examples. We also discuss how what is known as basis expansion in mathematical statistics is known as feature engineering in ML. With feature engineering, we map the original data into a higher dimension, hoping to find a good linear hyper-plane fit in that higher dimension, which explains the non-linear relationship between the feature space and the target variable.
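A minimal sketch of this idea on simulated data (all numbers here are illustrative): a straight line underfits a curved relationship, but expanding the basis to [1, x, x²] lets a model that is still linear in its coefficients capture the curvature.

```python
import numpy as np

# Simulated non-linear relationship: y depends on x through a quadratic.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 1.0 * x**2 + rng.normal(scale=0.3, size=x.size)

# Linear fit in the original feature space.
X_lin = np.column_stack([np.ones_like(x), x])
b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# Linear fit in the expanded (quadratic) feature space: basis expansion,
# a.k.a. feature engineering.
X_quad = np.column_stack([np.ones_like(x), x, x**2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

def r2(X, b):
    # Coefficient of determination for a fitted design matrix X.
    resid = y - X @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

print(r2(X_lin, b_lin), r2(X_quad, b_quad))  # the quadratic fit is far better
```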

Lecture 2 - Part 1: Statistical Inference of Regression Coefficients YouTube Link (Slides) In this part of lecture 2, we discuss the sampling distribution of the regression coefficients and statistical inference for them. We discuss how a t-test can be conducted to test whether a predictor/feature has a statistically significant effect on the dependent (target) variable.

Lecture 2 - Part 2 : Checking Model Assumptions YouTube Link (Slides) In this part, we discuss how to check the model assumptions, namely (1) independence, (2) homogeneity, and (3) normality, using statistical tests. We discuss how the Bartels rank test can be used to check randomness in the residuals, the Breusch-Pagan test to check homogeneity, and the Kolmogorov-Smirnov test to check normality.

Lecture 2 - Part 3 : Model Comparison with R-squared, RMSE, AIC or BIC YouTube Link (Slides) In this part of Lecture 2, we discuss how to compare two or more models using selection criteria such as RMSE, R-squared, adjusted R-squared, AIC, or BIC, and how to select the best model among the set of possible models.

Lecture 2 - Part 4: Model Complexity and Bias-Variance Tradeoff YouTube Link (Slides) In this part of lecture 2, we try to understand the issue of model complexity. We discuss what we really mean by a complex model. Sometimes complex models help us achieve high prediction accuracy; however, we often achieve it at the cost of the interpretability of the model. We try to capture model complexity through the concept of the bias-variance tradeoff. Finally, we discuss how we can achieve a parsimonious model by minimizing the MSE.

Lecture 2 - Part 5 : Feature Selection and Dimension Reduction YouTube Link (Slides) In the final part of lecture 2, we discuss stepwise feature (or variable) selection algorithms. We discuss how feature selection techniques can be used to reduce the dimension of the problem and achieve an interpretable and parsimonious regression model.

Lecture 3 - Part 1 : Multicollinearity and Variance Inflation Factor YouTube Link (Slides) In this part of lecture 3, we discuss the problem of multicollinearity in regression. We discuss how a strong correlation between two or more features induces a strong correlation between the OLS estimators of the coefficients. This gives the sampling distribution a strongly elliptical shape. As a result, the standard errors increase significantly and the confidence intervals become very large. We discuss how the Variance Inflation Factor (VIF) can be used to measure the contribution of each feature towards the multicollinearity problem.

Lecture 3 - Part 2: Regularization with LASSO, Ridge and Elastic Net YouTube Link (Slides) In the second part of Lecture 3, we discuss what "ill-posed problems" are. We discuss how Tikhonov regularization recasts the unconstrained minimization of the OLS method as a constrained minimization. We discuss how the L2 penalty corresponds to the Ridge solution and the L1 penalty corresponds to the LASSO solution. We discuss why LASSO performs a continuous subset selection and why one should prefer LASSO feature selection over stepwise feature selection.
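A brief `sklearn` sketch of the L1-versus-L2 contrast on simulated sparse data (the penalty strengths are illustrative, not tuned): LASSO can set irrelevant coefficients exactly to zero, which is why it acts as a continuous subset-selection method, while Ridge only shrinks them toward zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Simulated sparse problem: only 2 of 10 features matter.
rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0], beta[1] = 3.0, -2.0
y = X @ beta + rng.normal(scale=0.5, size=n)

# L1 penalty (LASSO) versus L2 penalty (Ridge).
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(np.sum(lasso.coef_ == 0))  # LASSO zeroes out irrelevant coefficients
print(np.sum(ridge.coef_ == 0))  # Ridge coefficients shrink but stay non-zero
```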

Lecture 3 - Part 3: Regression Analysis with Python YouTube Link (Colab Notebook) (mtcars.csv data set) In the third part of Lecture 3, we present a demo of regression analysis using Python in a Google Colab notebook. We mainly use the 'sklearn' and 'statsmodels' packages for this exercise.

Lecture 3 - Part 4: Regression Analysis with R YouTube Link (R code) This is the last part of Lecture 3. In this part, we discuss how to implement a regression analysis project for the mtcars dataset using R. The R implementation uses only two features, weight and horsepower. We show that with only these two features we can achieve almost 85% accuracy in terms of R-squared. I recommend you try some more features and see whether you can increase the model accuracy further.

Lecture 4 - Part 1 : Capital Asset Pricing Model YouTube Link (Slides) In part 1 of lecture 4, I present a celebrated application of the statistical regression model in quantitative finance, known as the Capital Asset Pricing Model (CAPM). The CAPM is often used to evaluate whether an asset is overpriced, underpriced, or fairly priced.
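A toy sketch of the CAPM regression on simulated returns (all figures here are made up for illustration, not real market data): the asset's excess return is regressed on the market's excess return, the slope is the asset's beta, and a significantly non-zero intercept (alpha) suggests mispricing under the model.

```python
import numpy as np

# Simulated daily returns; the "true" beta is set to 1.3.
rng = np.random.default_rng(6)
n = 250                                  # roughly one year of daily returns
rf = 0.0001                              # hypothetical risk-free rate
market = rng.normal(0.0005, 0.01, n)     # market returns
asset = rf + 1.3 * (market - rf) + rng.normal(0, 0.005, n)

# CAPM regression: (asset - rf) = alpha + beta * (market - rf) + error.
X = np.column_stack([np.ones(n), market - rf])
alpha, beta = np.linalg.lstsq(X, asset - rf, rcond=None)[0]
print(alpha, beta)  # alpha near 0 (fairly priced), beta near 1.3
```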

Lecture 4 - Part 2 : Bootstrap Regression YouTube Link (Slides) In part 2, the last part of Lecture 4, we discuss the concept of bootstrap statistics and nonparametric bootstrap regression. We discuss two bootstrap regression algorithms: (1) residual bootstrap regression and (2) paired bootstrap regression.
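A minimal sketch of algorithm (1), residual bootstrap regression, on simulated data: fit OLS once, resample the residuals with replacement, rebuild bootstrap responses from the fitted values, and refit to approximate the sampling distribution of the coefficients.

```python
import numpy as np

# Simulated data for illustration.
rng = np.random.default_rng(7)
x = np.linspace(0, 5, 60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.4, size=x.size)
X = np.column_stack([np.ones_like(x), x])

# Step 1: ordinary least squares fit and residuals.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta_hat
resid = y - fitted

# Step 2: resample residuals, rebuild y*, and refit B times.
B = 1000
boot = np.empty((B, 2))
for b in range(B):
    y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
    boot[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

# Bootstrap standard errors and a 95% percentile interval for the slope.
se = boot.std(axis=0)
lo, hi = np.percentile(boot[:, 1], [2.5, 97.5])
print(se, (lo, hi))
```

The paired bootstrap differs only in step 2: instead of resampling residuals, one resamples whole (x, y) pairs with replacement.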

Lecture 5 - Part 1 : Time Series Forecasting with Regression Model YouTube Link (Slides) In lecture 5, we focus on how regression modelling techniques can be applied to long-term and short-term forecasting. We use the AirPassengers dataset for this lecture. We develop a long-term forecasting model by modelling the trend and seasonality as functions of time. We then develop a short-term forecasting model using an auto-regressive model.

Lecture 5 - Part 2 : Granger Causal Model YouTube Link (Slides) (R code) In this part of lecture 5, I introduce a nice application of the regression model, known as the Granger causal model. Correlation does not imply causation, and in practice establishing causation from data is very difficult. However, the Granger causal regression model tries to answer the question of causality, within its limited capability.

Lecture 6 - Part 1 : Ridge Regression with Python YouTube Link (Colab Notebook) In this video, we go through a hands-on session on handling multicollinearity and Ridge regression with the Boston house price dataset.

Lecture 7 - Part 1 : Logistic Regression YouTube Link (Slides) In this video, we introduce the concept of logistic regression for the binary classification problem.

Lecture 7 - Part 2 : MLE of Coefficients of Logistic Regression YouTube Link (Slides) In this part of Lecture 7, we discuss how to estimate the regression coefficients of logistic regression by maximum likelihood.

Lecture 7 - Part 3 : Fit Logistic Regression with the optim Function in R YouTube Link ( R Code ) In part 3 of lecture 7, we present a general recipe for fitting any statistical model by minimizing the negative log-likelihood function, using the optim function in R. Here we implement it for logistic regression, and then compare the result with R's built-in glm function.
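The lecture uses R's optim; the same recipe carries over to Python. Here is an analogous sketch (with simulated data and illustrative coefficients) that minimizes the negative log-likelihood of the logistic model with scipy.optimize.minimize.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated binary data with true coefficients (-0.5, 1.5).
rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
ybin = rng.binomial(1, p)
X = np.column_stack([np.ones(n), x])

def neg_log_lik(beta):
    # Bernoulli negative log-likelihood for the logistic model:
    # sum of log(1 + exp(eta)) - y * eta, written with logaddexp
    # for numerical stability.
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - ybin * eta)

# General recipe: hand the negative log-likelihood to a generic
# optimizer, exactly as one would hand it to optim in R.
res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(res.x)  # close to the true coefficients (-0.5, 1.5)
```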