Madhavan Mukund



Data Mining and Machine Learning,
Jan-Apr 2023

Assignment 1: Classification

29 Jan, 2023
Due 12 Feb, 2023



Task 1

The "Bank Marketing Data Set" from the UCI Machine Learning Repository relates to direct marketing campaigns (phone calls) of a Portuguese banking institution.

The classification goal is to predict whether the client will subscribe to a term deposit (variable y). You can find a description of the attributes at the original UCI URL, https://archive.ics.uci.edu/ml/datasets/Bank+Marketing.

The UCI page contains multiple versions of the data, so the version that you need to work with is here:

Your task is to build two classifiers for this data set: a decision tree and a naïve Bayes classifier. Use a suitable evaluation metric to compare the performance of the classifiers.
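As one way to make "a suitable evaluation metric" concrete, the sketch below computes accuracy, precision, recall and F1 from lists of true and predicted labels. It assumes the positive class of variable y is the label "yes". The function names and the toy labels are invented for illustration and do not come from the data set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(y_true, y_pred, positive="yes"):
    """Return accuracy, precision, recall and F1 as a dict (0.0 when undefined)."""
    c = confusion_counts(y_true, y_pred, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 8 negatives, 2 positives.
y_true = ["no"] * 8 + ["yes"] * 2
always_no = ["no"] * 10  # a degenerate classifier that never predicts "yes"
print(summarize(y_true, always_no))
```

On these toy labels, the classifier that always predicts "no" reaches 0.8 accuracy but an F1 of 0, which is why accuracy alone can mislead when one class dominates. The same function can be applied to the predictions of both classifiers to compare them on equal terms.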


Task 2

The following is a database of 1698 Hindi movies from 2005 to 2017: https://www.kaggle.com/datasets/rishidamarla/bollywood-movies-dataset

A movie is a hit if revenue > budget, and it is a flop otherwise. The goal is to predict whether a movie will be a hit or flop, given all the other attributes.
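The hit/flop rule above can be sketched as a small labelling step. The column names "revenue" and "budget" are assumptions here; the actual headers in the Kaggle file may differ. The sketch also assumes one reading of "all the other attributes", namely that both columns defining the label are removed from the features, and it returns no label when either value is missing or non-numeric.

```python
def label_movie(row, revenue_key="revenue", budget_key="budget"):
    """Return 'hit' if revenue > budget, 'flop' otherwise, None if unusable."""
    try:
        revenue = float(row[revenue_key])
        budget = float(row[budget_key])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric value: no label can be derived
    return "hit" if revenue > budget else "flop"


def split_features_and_label(row, drop=("revenue", "budget")):
    """Separate a row into (features, label), dropping the label-defining columns.

    Assumption: both columns in `drop` are excluded so the label cannot leak
    into the features. Adjust `drop` if only revenue should be removed.
    """
    label = label_movie(row, revenue_key=drop[0], budget_key=drop[1])
    features = {k: v for k, v in row.items() if k not in drop}
    return features, label


# Hypothetical rows, invented for illustration only.
rows = [
    {"title": "A", "genre": "drama", "revenue": "120", "budget": "100"},
    {"title": "B", "genre": "comedy", "revenue": "50", "budget": "50"},
    {"title": "C", "genre": "action", "revenue": "", "budget": "80"},
]
for row in rows:
    print(split_features_and_label(row))
```

Note that a movie whose revenue equals its budget is labelled a flop, since the rule uses a strict inequality. Rows with no derivable label would need to be dropped or handled separately before training.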

Once again, your task is to build two classifiers for this data set: a decision tree and a naïve Bayes classifier. Use a suitable evaluation metric to compare the performance of the classifiers.


Solving the Tasks

  • You can use any programming language, including Python and R. You can make use of standard packages for analytics and machine learning. Clearly document any external packages used by your code.

  • Submit the following via Moodle, as a Jupyter notebook if you are using Python and as a single archive (zip, tar.gz, …) otherwise:

    • The code you used to solve the assignment.

    • If you have voluminous output to report, save it somewhere on the cloud and provide a link.

    • A short write-up describing how your code ran on the data sets: the parameters used, time taken, space required, and anything else of interest. This should include a comparative evaluation of the two classifiers.

  • You may work alone or in groups of two. Each group makes a single submission to Moodle. Use either person's Moodle account to submit. The submission should mention the names of the two partners.

  • There will be a short oral presentation and question/answer session for each submission.
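For the write-up item on time taken and space required, one possible approach in Python is to wrap each training or prediction call in a small measuring helper. The sketch below uses only the standard library; the helper name `measure` is hypothetical. It reports peak memory as seen by `tracemalloc`, which tracks allocations made through Python's allocator, so memory used inside native libraries may not be fully counted.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn(*args, **kwargs); return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Record measurements and stop tracing even if fn raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload; in the assignment this would be a classifier's
# training or prediction call.
result, seconds, peak_bytes = measure(sorted, range(100_000, 0, -1))
print(f"time: {seconds:.4f} s, peak memory: {peak_bytes / 1024:.1f} KiB")
```

Running the same helper around both classifiers gives figures measured the same way, which makes the comparison in the write-up easier to defend.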