Data Mining and Machine Learning

Aug-Nov, 2013

Assignment 1: Supervised Learning

7 September 2013, due 22 September 2013


The "Census Income" data set from the UCI Machine Learning Repository contains income information for over 48,000 individuals taken from the 1994 US census. The data set's original page in the UCI repository is http://archive.ics.uci.edu/ml/datasets/Census+Income.

The task is to predict whether a person makes over 50K a year. In this assignment you have to build two classifiers for this data set:

Use 10-fold cross-validation to evaluate each classifier, and submit your tabulated results along with your code.
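As a rough illustration of the evaluation procedure, here is a minimal sketch of 10-fold cross-validation in plain Python. The function names (k_fold_indices, cross_validate) and the majority-class placeholder classifier are hypothetical and not part of the assignment; the toy data is made up. Substitute the training and prediction functions of your own classifiers.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the accuracy measured on each of the k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held_out_set]
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Placeholder classifier (an assumption for illustration only):
# always predicts the most common label seen during training.
def train_majority(train_rows, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, row):
    return model

# Made-up toy data, only to show the calling convention.
toy_rows = [[i] for i in range(25)]
toy_labels = ["<=50K"] * 20 + [">50K"] * 5
scores = cross_validate(toy_rows, toy_labels, train_majority, predict_majority)
for fold, accuracy in enumerate(scores, start=1):
    print(f"fold {fold:2d}: accuracy {accuracy:.3f}")
print(f"mean accuracy: {sum(scores) / len(scores):.3f}")
```

The per-fold accuracies and their mean are the kind of figures that could go into the tabulated results. Fixing the seed keeps the folds identical across both classifiers, so their scores are comparable.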

More information about the dataset is available at http://www.cmi.ac.in/~madhavan/courses/datamining13-aug/assignment1/census-income-readme.txt. The actual dataset is available as a CSV (comma-separated values) file at http://www.cmi.ac.in/~madhavan/courses/datamining13-aug/assignment1/census-income.csv. The first line of the file lists the attribute names.
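Since the first line of the file holds the attribute names, the file can be read with Python's standard csv module. The sketch below uses a small made-up sample in place of the real file; the column names and values shown are assumptions for illustration, and the actual ones should be taken from the file's header line and the readme. The helper name load_rows is hypothetical.

```python
import csv
import io

# Made-up stand-in for the start of census-income.csv. The real attribute
# names and values may differ; check the header line and the readme.
SAMPLE = """age,workclass,fnlwgt,education,hours-per-week,income
31,Private,100000,Bachelors,40,<=50K
47,Self-employed,200000,Masters,50,>50K
"""

def load_rows(handle):
    """Read a CSV whose first line holds the attribute names.

    Returns the list of attribute names and one dict per data row.
    All values are returned as strings.
    """
    # skipinitialspace drops blanks that follow a comma, in case the
    # file separates fields with ", " rather than ",".
    reader = csv.DictReader(handle, skipinitialspace=True)
    rows = [dict(row) for row in reader]
    return reader.fieldnames, rows

names, rows = load_rows(io.StringIO(SAMPLE))
print(names)
print(rows[0])

# For the downloaded file, the call would look something like:
# with open("census-income.csv", newline="") as handle:
#     names, rows = load_rows(handle)
```

Every value comes back as a string, so numeric attributes need an explicit conversion (for example with int) before they are used in a classifier.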

Note: You should ignore the attribute fnlwgt when building the classifier. This attribute describes the sampling weight of each entry and is only useful if you are trying to extrapolate aggregate statistics from this dataset.
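One way to follow this note is to remove the fnlwgt column right after loading, before any training happens. The sketch below assumes each row is a dictionary keyed by attribute name; the helper names and the label column name "income" are assumptions for illustration, and the rows are made up.

```python
def drop_attribute(rows, name="fnlwgt"):
    """Return copies of the rows with one attribute removed."""
    return [{key: value for key, value in row.items() if key != name}
            for row in rows]

def split_features_and_label(rows, label_name):
    """Separate the target column from the remaining attributes."""
    features = [{key: value for key, value in row.items() if key != label_name}
                for row in rows]
    labels = [row[label_name] for row in rows]
    return features, labels

# Made-up rows; the label column name "income" is an assumption.
rows = [
    {"age": "31", "fnlwgt": "100000", "education": "Bachelors", "income": "<=50K"},
    {"age": "47", "fnlwgt": "200000", "education": "Masters", "income": ">50K"},
]

cleaned = drop_attribute(rows)
features, labels = split_features_and_label(cleaned, "income")
print(features[0])
print(labels)
```

Dropping the column once, up front, means neither classifier can accidentally use the sampling weight as a predictor.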


Last updated Sat 7 Sep, 2013