EE380 Computer Systems Colloquium

Big Data is (at least) Four Different Problems
Wednesday, June 1, 2016 - 4:30pm to 5:30pm
Gates B03
Michael Stonebraker (MIT)
Abstract / Description: 

"Big Data" means different things to different people. To me, it means one of four totally different problems:

Big volumes of data, but "small" analytics. The traditional data warehouse vendors support SQL analytics on very large volumes of data. In this talk, I will make a few comments on where I see this market going, and on possible technical disruptions ahead.

Big analytics on big volumes of data. By big analytics, I mean data clustering, regressions, machine learning, and other much more complex analytics on very large amounts of data. I will explain the various approaches to integrating complex analytics into DBMSs, and discuss which ones seem most promising.

Big velocity. By this I mean being able to absorb and process a firehose of incoming data for applications like electronic trading. In this market, the traditional SQL vendors are non-starters. I will discuss alternatives, including complex event processing (CEP), NoSQL, and NewSQL systems. I will also make a few comments about the "Internet of Things".

Big diversity. Many enterprises are faced with integrating a larger and larger number of data sources with diverse data (spreadsheets, web sources, XML, traditional DBMSs). The traditional ETL products do not appear to be up to the challenges of this new world. I will talk about alternative ways to go, and conclude that this is the "800-pound gorilla in the corner".