ISL Colloquium

Large scale, real-time stream processing using Spark Streaming
Friday, May 1, 2015 - 7:05am to 8:05am
Mechanical Engineering, Rm 530-127
Tathagata Das (Databricks)
Abstract / Description: 

Spark Streaming is an extension to the Spark cluster computing framework that enables high-speed, fault-tolerant stream processing. It provides a new programming model called "discretized streams", which allows one to express complex distributed stream processing algorithms using simple, functional, batch-like operators. It makes it easy to apply windows on streams, join streams with static datasets, and join streams with other streams. Furthermore, since it is built on the Spark processing engine, it allows developers to seamlessly integrate other computation models: streaming machine learning algorithms, streaming combined with data frames, streaming combined with ad hoc SQL queries, etc. We will also discuss how Spark Streaming can be combined with data ingestion tools like Kafka and Kinesis to build a scalable, distributed stream processing pipeline.
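
As a rough illustration of the "discretized streams" idea described above, the following sketch cuts a stream of timestamped events into fixed-interval micro-batches and then applies an ordinary batch-style count over a sliding window of those batches. It is plain Python and does not use the Spark Streaming API; the function names (discretize, windowed_counts), the one-second batch interval, and the sample events are all assumptions made up for this example.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List, Tuple


def discretize(events: Iterable[Tuple[float, str]],
               batch_interval: float) -> List[List[str]]:
    """Split timestamped events into consecutive micro-batches.

    Hypothetical helper: assumes non-negative timestamps in seconds.
    """
    batches: List[List[str]] = []
    for timestamp, value in events:
        index = int(timestamp // batch_interval)
        # Pad with empty batches so quiet intervals are still represented.
        while len(batches) <= index:
            batches.append([])
        batches[index].append(value)
    return batches


def windowed_counts(batches: Iterable[List[str]],
                    window_length: int) -> Iterator[Counter]:
    """Yield value counts over the last `window_length` micro-batches."""
    window: deque = deque(maxlen=window_length)
    for batch in batches:
        # Each micro-batch is processed with an ordinary batch operation.
        window.append(Counter(batch))
        # Merge the per-batch counts currently inside the window.
        yield sum(window, Counter())


# Illustrative input: (timestamp in seconds, word) pairs.
events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "c"), (2.9, "a")]
batches = discretize(events, batch_interval=1.0)
results = list(windowed_counts(batches, window_length=2))

for i, counts in enumerate(results):
    print(f"window ending at batch {i}: {dict(counts)}")
```

The point of the sketch is only that, once a stream is treated as a sequence of small batches, windowing reduces to combining the results of a few adjacent batch computations. A real deployment would rely on Spark's own distributed, fault-tolerant implementation.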