The future of statistics in the Big Data ecosystem
Recently, several people have asked me about the challenges and the future of statistics in the Big Data ecosystem. Today I came across a very interesting YouTube talk by Professor Michael I. Jordan of the University of California, Berkeley, who is in my opinion the most accomplished researcher in machine learning and applied statistics. The title of the talk summarises it all: On the Computational and Statistical Interface and "Big Data".
Abstract:
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the statistical and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results or concentration theorems can be invoked. We present several research vignettes on topics at the computation/statistics interface, an interface that we aim to characterize in terms of theoretical tradeoffs between statistical risk, amount of data and "externalities" such as computation, communication and privacy.
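The abstract's point that more data is a source of statistical "simplicity" can be illustrated with a quick simulation: the sample mean of i.i.d. draws concentrates around the true mean as n grows, at roughly the 1/sqrt(n) rate that concentration inequalities such as Hoeffding's formalize. The sketch below (the function name and parameters are my own, not from the talk) estimates the spread of the sample mean for increasing n:

```python
import random
import statistics

def sample_mean_spread(n, trials=2000, seed=0):
    """Empirical standard deviation of the sample mean of n Uniform(0,1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# The spread shrinks roughly by a factor of sqrt(10) for each 10x more data,
# so inference gets easier, not harder, as n grows.
for n in (10, 100, 1000):
    print(n, round(sample_mean_spread(n), 4))
```

Of course, the talk's deeper question is what happens when that statistical gain collides with the computational cost of processing all those points.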
Slides are available here: http://www.stat.harvard.edu/NRC2014/MichaelJordan.pdf
Link to the talk: https://www.youtube.com/watch?v=zdavG9xbVp0