Chennai Mathematical Institute

Seminars




Data Science Seminar
Date: Friday, 9 August 2024
Time: 2:00 PM
Venue: NKN hall
Importance of Statistics in the Era of Data Science

Rajeeva Karandikar
Chennai Mathematical Institute.
09-08-24


Abstract

Over the last two decades, Data Science has emerged as a major theme, and people have been asking about the relevance of Statistics in this new age of Data Science, AI/ML, Analytics, etc. In this talk, we will see through examples that just having a large chunk of data is not enough to make the correct decision. It is well understood that a data-based decision depends strongly on the validity of the data, a principle known as GIGO (Garbage In, Garbage Out). But apart from the validity of the data, there are several other issues that one must take into account. For example: Is the data representative of the population that one wishes to draw conclusions about? Do we have data on all variables associated with the phenomenon under consideration? The importance of these questions is well known to the statistical community.

Also, one must take into account the consequences of making an incorrect decision. One should take all that one knows about the phenomenon under study and translate it into a mathematical or statistical model. After all, it is the model that provides the correct frame of reference between the data on the one hand and the desired conclusion on the other. Thus GMGO (Garbage Model, Garbage Output) is as important as GIGO. Indeed, my hypothesis is that with decisions based on data on the rise (given the explosion in available computing power and communication speed), there is an urgent need for increased statistical literacy.