Chennai Mathematical Institute


Data Science Colloquium Series
2:00 PM - 3:00 PM, NKN Hall, CMI New Building
Data Points in High Dimension: A Problem That Still Haunts the Practitioners

Sawata Sahoo


Data heterogeneity and a high-dimensional feature space are two of the most commonly occurring challenges in statistical modeling. When the two appear together, clustering the data points into homogeneous segments is a scientifically difficult problem in its own right. The scope for separating data points into such homogeneous segments rests very delicately on a few interesting conditions, largely related to feature dimension, data spread and feature dependency. The situation is a honey trap that lures us toward dimension reduction techniques within the class of singular value decomposition. Unfortunately, this is far from getting us to the solution. There are two interesting and contradictory possibilities: there might be true clusters in the current feature space that are blurred by the large number of features. Alternatively, we might see the separation of points in a different feature space. Are the two possibilities talking about the same problem? The talk is about unfolding this never-ending mystery.

Short Biography:

Dr. Sahoo has been working as the Director of Advanced Analytics for the Algorithms and Data Insights team at Gartner. Previously, he worked as a Senior Data Scientist at Walmart, Swiss Re, and Samsung. His primary interest has been statistical methods and machine learning techniques for high-dimensional data, particularly feature transformation, feature selection and regularization.

He received his Ph.D. in Statistics from North Carolina State University, Raleigh, after completing his Master's and Bachelor's degrees in Statistics at the University of Calcutta and Ramakrishna Mission Residential College, respectively.

He loves to spend a good amount of his leisure time travelling, watching movies and experimenting with cooking.