What is Topological Data Analysis?

The application of topological techniques to traditional data analysis, which until now has developed mostly in a statistical setting, has opened up new opportunities. There is growing interest in exploring this field further and in finding new applications. Some notable successes in recent years (such as the identification of a new type of breast cancer, or the discovery of new basketball playing positions) have earned praise from industry.
Indeed, with the explosion in the amount and variety of available data, identifying, extracting and exploiting their underlying structure has become a problem of fundamental importance. Many such data come in the form of point clouds, sitting in potentially high-dimensional spaces, yet concentrated around low-dimensional geometric structures that need to be uncovered. The non-trivial topology of these structures is challenging for classical exploration techniques such as dimensionality reduction. The goal of TDA is therefore to develop novel methods that can reliably capture geometric or topological information (connectivity, loops, holes, curvature, etc.) from the data without the need for an explicit mapping to a lower-dimensional space.
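As a small illustration of reading connectivity directly off a point cloud, the sketch below counts the connected components obtained by joining any two points that lie within a chosen distance eps of each other. The function name count_components, the toy point cloud and the values of eps are all assumptions made for this example; it is a minimal sketch using only the Python standard library, not the method of any particular TDA package.

```python
import math
from itertools import combinations

def count_components(points, eps):
    """Count connected components of the graph that joins every pair of
    points at Euclidean distance at most eps (hypothetical helper)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})

# Toy data (an assumption): two well-separated clusters in the plane.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

print(count_components(cloud, eps=0.5))   # 2: the two clusters stay apart
print(count_components(cloud, eps=10.0))  # 1: everything merges at a large scale
```

The answer depends on the scale eps, which is why persistent homology tracks such features across all scales rather than at a single one.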
Watch the following videos to learn more about TDA and its applications.

Learning goals of this course

This course is not about algebraic topology. We will not spend time, for example, proving various properties of simplicial homology. The aim of this course is to introduce TDA as a tool for data analysts and to teach it using a hands-on approach. We will explore many real-life examples using various software packages. In fact, the Moodle course page contains around 30 articles about applications of TDA to fields such as developmental economics, image segmentation, cosmology, medical imaging, protein flexibility, weather forecasting and robotics. Some of these will be discussed in class; some will be assigned as course projects. By the end of the course we expect participants to have gained insight into how persistent homology works, to have used all the important packages and to have seen enough data analysis examples. This should enable them to use TDA in new areas in the future.

Course Information

Lecture schedule: Mondays and Wednesdays at 10:30 am
Venue: LH 801
Instructors: Sourish Das and Priyavrat Deshpande
Office hours: TBA
Prerequisites: A first course in topology/metric spaces. Proficiency in Python, R and Matlab.
Text:
  1. Computational Topology: An Introduction by H. Edelsbrunner, J. Harer, AMS, 2013.
  2. Topology and Data by Gunnar Carlsson, Bull. Amer. Math. Soc. 46 (2009).
Evaluation: Assignments (30%), project report (40%), project presentation (40%).
Resource material: The detailed syllabus, class notes, extra notes, assignments, project reading, etc. are uploaded on Moodle.

Syllabus

The broad strokes, as well as the topics covered in each week, span three areas: Mathematics, Statistics and Software tools.
  • Basics of topology.
  • Simplicial complexes, homology and Betti numbers.
  • Persistent homology and barcodes.
  • Morse theory and Mapper.
  • Dimensionality reduction and PCA.
  • Clustering algorithms.
  • Inference techniques.
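
To make the terms "simplicial complex" and "Betti number" in the list above concrete, here is a minimal sketch that computes Betti numbers from the ranks of boundary matrices. It works with coefficients in GF(2), which is an assumption chosen to keep the arithmetic simple, and the function names rank_gf2 and betti_numbers are hypothetical, not taken from any package used in the course. The input must list every face of every simplex.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        # Clear the pivot column from every remaining row.
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """Betti numbers over GF(2) of a simplicial complex given as vertex
    tuples. The list must contain all faces of each simplex."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    # ranks[k] is the rank of the boundary map from k-chains to (k-1)-chains.
    ranks = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        index = {f: i for i, f in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the (k-1)-faces have k vertices
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # beta_k = dim C_k - rank d_k - rank d_{k+1}
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]

# Hollow triangle: three vertices and three edges, forming one loop.
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
# Filled triangle: the same complex plus the 2-simplex that fills the loop.
filled = hollow + [(0, 1, 2)]

print(betti_numbers(hollow))  # [1, 1]: one component, one loop
print(betti_numbers(filled))  # [1, 0, 0]: the loop has been filled in
```

Adding the single 2-simplex removes the loop, which is the kind of birth and death of features that persistent homology records in a barcode.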

Assignments

Homework will be assigned bi-weekly and uploaded on Moodle. It is your duty to submit the solutions on time. There are two types of homework: math problems and programming. Solutions to the math problems are to be submitted in written form. Copying and/or plagiarism will not be tolerated. Here are a few writing guidelines you might want to follow.
  1. Feel free to work together, but you should submit your own work.
  2. Your questions/comments/suggestions are most welcome. I will also be fairly generous with the hints. However, do not expect any kind of help, including extensions, on the day a homework is due.
  3. Please turn in a neat stapled stack of papers. As far as possible, use ruled paper rather than blank printing paper.
  4. Your final version should be as polished as you can make it. This probably means that you cannot submit sketchy solutions or sloppily written first versions. Please expect to do a fair amount of rewriting. Do not hand in work with parts crossed out; either use a pencil and erase or rewrite.
  5. Please write complete sentences that form paragraphs and so forth. It might be a good idea to use short simple sentences; avoid long complicated sentences.
  6. Do use commonly accepted notation (e.g., for functions, sets, etc.) and never invent new notation when there is already some available.
  7. For the programming assignments you may upload your source code and a screenshot of a successful run on Moodle. However, do hand in a short write-up about your program and a few sample outputs.

Projects

The project is an important part of this course. A list of possible projects and the corresponding reference material is located on the Moodle course page. Everybody is expected to have selected a project topic before the end of August. A typical project involves reading and understanding a paper that deals with a real-life application of TDA, reproducing its results and finally trying its algorithm on a different data set. A project report will consist of a summary of the paper you read and a discussion of the new analysis you might have performed.