The future of statistics in the Big Data ecosystem
Recently, several people have asked me about the challenges and the future of statistics in the Big Data ecosystem. Today I came across a very interesting YouTube talk by Professor Michael I. Jordan of the University of California, Berkeley, who is in my opinion the most accomplished researcher in machine learning and applied statistics. The title of the talk summarises it all: On the Computational and Statistical Interface and "Big Data".
Abstract:
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the statistical and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results or concentration theorems can be invoked. We present several research vignettes on topics at the computation/statistics interface, an interface that we aim to characterize in terms of theoretical tradeoffs between statistical risk, amount of data and "externalities" such as computation, communication and privacy.
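The abstract's point that more data is a source of statistical "simplicity" can be illustrated with a quick simulation: the sample mean of i.i.d. draws concentrates around the true mean as n grows, at roughly the 1/sqrt(n) rate that concentration inequalities such as Hoeffding's formalize. The sketch below (the function name and parameters are my own, not from the talk) estimates the spread of the sample mean for increasing n:

```python
import random
import statistics

def sample_mean_spread(n, trials=2000, seed=0):
    """Empirical standard deviation of the sample mean of n Uniform(0,1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# The spread shrinks roughly by a factor of sqrt(10) for each 10x more data,
# so inference gets easier, not harder, as n grows.
for n in (10, 100, 1000):
    print(n, round(sample_mean_spread(n), 4))
```

Of course, the talk's deeper question is what happens when that statistical gain collides with the computational cost of processing all those points.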
Slides are available here: http://www.stat.harvard.edu/NRC2014/MichaelJordan.pdf
Link to the talk: https://www.youtube.com/watch?v=zdavG9xbVp0