Madhavan Mukund



Data Mining and Machine Learning,
Jan-Apr 2023

Assignment 3: Semi-Supervised Learning

28 Mar, 2023
Due 18 Apr, 2023 (extended from 15 Apr)



The Task

In Lecture 16 on 9 March, 2023, we saw an example of using clustering for semi-supervised learning on the MNIST dataset, where we used K-Means clustering to identify a small subset of images to label and use as seeds for the classification process.
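The following is a minimal sketch of the clustering step, assuming scikit-learn. It uses the small bundled `load_digits` dataset (8×8 digit images) as a stand-in for MNIST, so nothing has to be downloaded. The helper `select_representatives` is hypothetical: it runs K-Means, takes the training image nearest to each centroid, and reveals only those K labels to a classifier. Reading the true labels stands in for labelling the images by hand.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def select_representatives(X, k, seed=42):
    """Cluster X into k groups (hypothetical helper) and return the index of
    the sample closest to each centroid, plus the fitted KMeans model."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    # fit_transform returns each sample's distance to every centroid: shape (n, k)
    distances = kmeans.fit_transform(X)
    # For each centroid (column), pick the row with the smallest distance
    rep_idx = np.argmin(distances, axis=0)
    return rep_idx, kmeans


# Assumption: sklearn's digits dataset stands in for MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

k = 50  # the MNIST example started with 50 clusters
rep_idx, kmeans = select_representatives(X_train, k)

# Only these k images are "labelled"; true labels stand in for manual labelling
X_seed, y_seed = X_train[rep_idx], y_train[rep_idx]

clf = LogisticRegression(max_iter=5000)
clf.fit(X_seed, y_seed)
print(f"k={k}: test accuracy {clf.score(X_test, y_test):.3f}")
```

The classifier here is plain logistic regression so that the effect of the seed selection is easy to see. For the assignment, the MLP models for the two datasets would take its place.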

Your task is to conduct a similar experiment with the following two datasets.

  1. The Fashion MNIST dataset for which we discussed how to build a neural network (multi-layer perceptron, MLP) model in Lecture 20 on 28 March, 2023.

  2. The Overhead MNIST dataset for which you can find a standard neural network (multi-layer perceptron, MLP) model here.
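Both datasets are classified with an MLP. The sketch below assumes scikit-learn's `MLPClassifier` as a stand-in for the neural network models referred to above. The hidden-layer sizes `(100, 50)` are arbitrary illustration choices, and the digits data again replaces the real datasets. It trains the same MLP on 50 randomly chosen labelled images and on all training labels. These two numbers are useful reference points (a lower baseline and an upper bound) for judging how much clustering-based seeding helps.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_mlp(seed=42):
    """Hypothetical helper: standardise pixels, then an MLP with two hidden
    ReLU layers. Layer sizes are illustrative, not the lecture's model."""
    return make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000,
                      random_state=seed),
    )


# Assumption: sklearn's digits dataset stands in for Fashion/Overhead MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

rng = np.random.default_rng(0)
n_labelled = 50
# Baseline: label n_labelled images chosen uniformly at random
rand_idx = rng.choice(len(X_train), size=n_labelled, replace=False)
baseline = make_mlp().fit(X_train[rand_idx], y_train[rand_idx])

# Upper bound: the same MLP trained with every training label available
full = make_mlp().fit(X_train, y_train)

print(f"random {n_labelled} labels: {baseline.score(X_test, y_test):.3f}")
print(f"all {len(X_train)} labels: {full.score(X_test, y_test):.3f}")
```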

The MNIST example started with 50 clusters. For each of these two datasets, experiment with several different (relatively small) values of K, the number of clusters.
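One way to organise the experiment is to sweep over K and record accuracy for each value. This is a minimal sketch, again assuming scikit-learn with the digits dataset as a stand-in. The helper `evaluate_k` and the chosen K values are hypothetical. Besides training on the K representatives alone, it tries label propagation, a common extension in which every image inherits the label of its cluster's representative. This gives the classifier far more (noisier) training labels at no extra labelling cost.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: sklearn's digits dataset stands in for the assignment's datasets
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


def evaluate_k(k, seed=42):
    """Hypothetical helper returning (accuracy with k representatives only,
    accuracy after propagating their labels to every cluster member)."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    distances = kmeans.fit_transform(X_train)
    rep_idx = np.argmin(distances, axis=0)
    # Stand-in for manual labelling: read the true labels of the k representatives
    rep_labels = y_train[rep_idx]

    clf = LogisticRegression(max_iter=5000)
    clf.fit(X_train[rep_idx], rep_labels)
    acc_reps = clf.score(X_test, y_test)

    # Label propagation: each image gets its cluster representative's label
    y_propagated = rep_labels[kmeans.labels_]
    clf_prop = LogisticRegression(max_iter=5000)
    clf_prop.fit(X_train, y_propagated)
    acc_prop = clf_prop.score(X_test, y_test)
    return acc_reps, acc_prop


results = {k: evaluate_k(k) for k in (10, 20, 30, 50)}
for k, (acc_reps, acc_prop) in results.items():
    print(f"K={k:>3}: representatives only {acc_reps:.3f}, "
          f"propagated {acc_prop:.3f}")
```

Reporting both columns for each K makes it easy to see how far accuracy drops as the labelling budget shrinks, which is the main point of the experiment.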


Solving the Task

  • You can use any programming language, including Python and R. You can make use of standard packages for analytics and machine learning. Clearly document any external packages used by your code.

  • Submit the following via Moodle, as a Jupyter notebook if you are using Python and as a single archive (zip, tar.gz, …) otherwise:

    • The code you used to solve the assignment.

    • If you have voluminous output to report, save it to cloud storage and provide a link.

    • A short write up describing your experiments.

  • You may work alone or in groups of two. Each group makes a single submission to Moodle. Use either person's Moodle account to submit. The submission should mention the names of the two partners.

  • There will be a short oral presentation and question/answer session for each submission.