The Diabetes 130-US hospitals for years 1999-2008 Data Set from the UCI Machine Learning Repository contains information about over 100,000 patients with diabetes treated across 130 US hospitals. The task is to predict whether, and how soon, a patient is readmitted. The target attribute readmitted takes three values: <30 (readmitted within 30 days), >30 (readmitted after more than 30 days) and NO (no record of readmission).
There are 50 attributes in total, including the target. Three of them, weight, payer_code and medical_specialty, have a large fraction of missing values and should be discarded. Four other attributes have a smaller fraction of missing values; you can drop the rows in which these missing values occur and work with the remaining data.
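The cleaning step above can be sketched with only the Python standard library. This is an illustration, not a required approach. It assumes missing values are written as "?" in the CSV file, which is how the UCI copy marks them. It drops the three sparse columns and then drops any row that still has a "?" in a remaining column, so the four partly missing attributes never need to be named. The function name load_clean_rows and the sample rows are hypothetical; the sample uses only a few of the real column names.

```python
import csv
import io

# Columns the assignment says to discard because most of their values are missing.
DROP_COLUMNS = {"weight", "payer_code", "medical_specialty"}
# Assumption: missing values are encoded as "?" in the CSV file.
MISSING = "?"


def load_clean_rows(text):
    """Parse CSV text, drop the sparse columns, then drop rows with any remaining "?"."""
    reader = csv.DictReader(io.StringIO(text))
    kept_columns = [c for c in reader.fieldnames if c not in DROP_COLUMNS]
    rows = []
    for row in reader:
        cleaned = {c: row[c] for c in kept_columns}
        if MISSING not in cleaned.values():
            rows.append(cleaned)
    return kept_columns, rows


# Made-up sample rows using a subset of the real column names, for illustration only.
SAMPLE = """race,gender,weight,payer_code,medical_specialty,diag_1,readmitted
Caucasian,Female,?,?,Pediatrics,250.83,NO
?,Male,?,MC,?,276,>30
AfricanAmerican,Female,[75-100),?,?,648,<30
"""

columns, rows = load_clean_rows(SAMPLE)
print(columns)    # ['race', 'gender', 'diag_1', 'readmitted']
print(len(rows))  # 2 (the row with race "?" is dropped)
```

For the real file, pass the contents of your downloaded CSV to load_clean_rows. With pandas, one equivalent route is `pd.read_csv(path, na_values="?", keep_default_na=False)`, followed by dropping the three columns and calling `dropna()`. Setting `keep_default_na=False` matters because recent pandas versions also treat the literal string None as missing. This data set is assumed to use None as an ordinary category in columns such as max_glu_serum and A1Cresult.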
Build two models to predict the value of the attribute readmitted.
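The assignment does not say which two models to use. One possible pair that needs no external packages is a majority-class baseline, which is useful as a reference point, and a categorical Naive Bayes classifier with add-alpha smoothing. The sketch below assumes every feature is categorical, so numeric columns would be treated as discrete values. The class names are hypothetical and the toy rows are made up.

```python
import math
from collections import Counter, defaultdict


class MajorityClass:
    """Baseline model: always predicts the most frequent label seen in training."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n_rows = len(y)
        n_features = len(X[0])
        # counts[j][c][v]: number of training rows of class c whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        # values[j]: distinct values of feature j seen in training
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[j][label][v] += 1
                self.values[j].add(v)
        return self

    def _log_score(self, row, c):
        """Unnormalised log posterior: log P(c) + sum over j of log P(x_j | c)."""
        n_c = self.class_counts[c]
        score = math.log(n_c / self.n_rows)
        for j, v in enumerate(row):
            k = len(self.values[j])
            score += math.log((self.counts[j][c][v] + self.alpha) / (n_c + self.alpha * k))
        return score

    def predict(self, X):
        return [max(self.class_counts, key=lambda c: self._log_score(row, c)) for row in X]


# Toy (gender, A1Cresult)-style rows, made up for illustration.
X_train = [("Female", "Norm"), ("Female", ">8"), ("Male", ">8"),
           ("Male", ">8"), ("Female", "Norm"), ("Female", "Norm")]
y_train = ["NO", ">30", "<30", "<30", "NO", "NO"]
X_test = [("Male", ">8"), ("Female", "Norm")]

for model in (MajorityClass(), CategoricalNaiveBayes(alpha=1.0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.predict(X_test))
# MajorityClass ['NO', 'NO']
# CategoricalNaiveBayes ['<30', 'NO']
```

On the full data set, a library implementation is likely to be faster and better tested. For example, scikit-learn provides `DummyClassifier` and `CategoricalNB` for these two ideas. Any such package should be documented as the assignment requires.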
You can use any programming language, including Python and R. You can make use of standard packages for analytics and machine learning. Clearly document any external packages used by your code.
Submit via Moodle a single archive (zip, tar.gz, …) containing:
The code you used to solve the assignment.
A link to the output produced by your code. Do not include the output itself in the archive; store it in the cloud and provide the link.
A short write-up describing how your code ran on the data set: the parameters used, the time taken, the space required, how you split the data into training and test sets, and anything else of interest.
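For the write-up, the split, time and memory figures can be recorded programmatically. The following is a standard-library sketch, not a required method. stratified_split performs a reproducible stratified split with a fixed seed, so the seed and test fraction can be reported as parameters. Stratifying keeps each class's share the same in both parts, which helps if the classes are unevenly distributed. measure wraps any call with `time.perf_counter` and `tracemalloc`. Both function names, the 20% test fraction and the seed are hypothetical choices, and the toy data is made up.

```python
import random
import time
import tracemalloc
from collections import defaultdict


def stratified_split(X, y, test_fraction=0.2, seed=42):
    """Split (X, y) into train and test parts, keeping each class's share roughly equal."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced and reported
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def take(idx):
        return [X[i] for i in idx], [y[i] for i in idx]

    return take(train_idx), take(test_idx)


def measure(fn, *args):
    """Call fn(*args); return (result, wall-clock seconds, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Toy data: 100 rows, two balanced classes (made up for illustration).
X = [(i % 3,) for i in range(100)]
y = ["NO" if i % 2 else "<30" for i in range(100)]

(train, test), seconds, peak_bytes = measure(stratified_split, X, y, 0.2, 42)
X_train, y_train = train
X_test, y_test = test
print(len(X_train), len(X_test), f"{seconds:.4f} s", f"{peak_bytes / 1024:.1f} KiB")
```

`tracemalloc` counts only memory allocated through Python's allocator and slows execution while tracing. Timings taken with it enabled are therefore somewhat inflated, and it may be worth timing the model fits in a separate run.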
You may work individually or in a group of two. A group makes a single submission to Moodle from either partner's account, and the submission should include the names of both partners.
There will be a short oral presentation and question/answer session for each submission.