Madhavan Mukund

In Assignment 1, you created decision tree and naïve Bayes classifiers for the Diabetes 130-US hospitals for years 1999-2008 Data Set.

Create a random forest classifier for this dataset and compare its performance to your earlier decision tree classifier.
Use bagging to create an ensemble of naïve Bayes classifiers for this dataset and compare its performance to that of your earlier naïve Bayes classifier trained on the full training set.
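
As a rough illustration of the bagging step, the sketch below trains several naïve Bayes models on bootstrap resamples of the training data and combines their predictions by majority vote. It uses only the Python standard library. The function names (`train_naive_bayes`, `train_bagged`, `accuracy`), the defaults `n_models=25` and `seed=0`, and the toy rows are assumptions made for illustration; none of them come from the assignment or from the diabetes dataset. A random forest applies the same bootstrap idea to decision trees, additionally restricting each split to a random subset of features.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Fit a categorical naive Bayes model (add-one smoothing); return a predict function."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value -> count
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    total = len(labels)

    def predict(row):
        best_label, best_score = None, -math.inf
        for label, count in class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for j, value in enumerate(row):
                seen = value_counts[(j, label)][value]
                score += math.log((seen + 1) / (count + len(feature_values[j])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def train_bagged(rows, labels, n_models=25, seed=0):
    """Train n_models classifiers on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices uniformly at random, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_naive_bayes([rows[i] for i in idx],
                                        [labels[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


# Made-up toy data (NOT from the diabetes dataset): two categorical features.
toy_rows = [("high", "yes")] * 10 + [("low", "no")] * 10
toy_labels = [1] * 10 + [0] * 10

single = train_naive_bayes(toy_rows, toy_labels)
bagged = train_bagged(toy_rows, toy_labels)
print("single:", accuracy(single, toy_rows, toy_labels))
print("bagged:", accuracy(bagged, toy_rows, toy_labels))
```

For a meaningful comparison, both accuracies would be computed on held-out test data rather than on the training rows used here.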

You can use any programming language, including Python and R. You can make use of standard packages for analytics and machine learning. Clearly document any external packages used by your code.
Submit via Moodle a single archive (zip, tar.gz, …) containing:
- The code you used to solve the assignment.
- A link to the output produced by your code. Do not include the output itself in this submission; upload it to a cloud storage service and provide a link.
- A short write-up describing how your code ran on the dataset: the parameters used, time taken, space required, how you split the training and test data, and anything else of interest.
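
One possible way to collect the figures requested in the write-up (running time, memory use, and the train/test split) is sketched below using only the Python standard library. The helper names `train_test_split` and `measure`, the 25% test fraction, and the fixed seed are assumptions chosen for illustration, not requirements of the assignment. Note that `tracemalloc` reports memory allocated through Python's allocator, so memory used inside native libraries may not be fully counted.

```python
import random
import time
import tracemalloc


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def measure(func, *args, **kwargs):
    """Run func; return (result, elapsed seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Placeholder data and a placeholder workload (sorted) standing in for training.
rows = [[i, i % 3] for i in range(100)]
labels = [i % 2 for i in range(100)]
train_X, train_y, test_X, test_y = train_test_split(rows, labels)
_, seconds, peak_bytes = measure(sorted, train_X)
print(f"train={len(train_X)} test={len(test_X)} "
      f"time={seconds:.4f}s peak={peak_bytes} bytes")
```

Recording the seed and the test fraction alongside these numbers makes the split reproducible for anyone rerunning the code.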
You may work individually or in groups of two. If you are working in a group, the group makes a single submission to Moodle. Use either person's Moodle account to submit. The submission should mention the names of the two partners.
There will be a short oral presentation and question/answer session for each submission.
