Erasure coding for big-data systems: Theory and Practice
Friday, October 7, 2016 - 1:15pm to 2:15pm
Packard 202
Rashmi K. Vinayak (Berkeley)
Abstract / Description: 

Erasure codes are increasingly being deployed as an alternative to data replication in large-scale distributed storage systems to achieve fault tolerance in a storage-efficient manner. This paradigm shift has opened up exciting challenges and opportunities on both the theoretical and the system-design fronts. Specifically, while traditional codes are optimal in utilizing storage space, they significantly increase the usage of other important cluster resources such as network and device I/O. Furthermore, the usage of codes has primarily been limited to achieving space-efficient fault tolerance for storing "cold" (less frequently accessed) data, beyond which the potential of codes in big-data systems is largely unexplored.

We present new code constructions, and design and build erasure-coded storage systems, that provably reduce the usage of network and device I/O by a significant amount without compromising storage efficiency. Our codes have been evaluated on Facebook's data warehouse cluster in production, and will be a part of the next release of Apache Hadoop. Furthermore, we explore new avenues for coding in big-data systems, in particular for "hot" (frequently accessed) data, by showing how codes can be employed to achieve significant benefits in load balancing and reducing latency in data-intensive cluster caches.


The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum that focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, and control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:00 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.


Rashmi K. Vinayak is a postdoctoral researcher at UC Berkeley working with Prof. Ion Stoica (AMPLab) and Prof. Kannan Ramchandran (BLISS). She received her PhD from UC Berkeley in September 2016. She is a recipient of the Eli Jury Award 2016 from EECS, UC Berkeley, for outstanding achievement in the area of systems, communications, control, or signal processing; the Google Anita Borg Memorial Scholarship 2015-16; the Microsoft Research PhD Fellowship 2013-15; the Facebook Fellowship 2012-13; and the IEEE Data Storage Best Paper and Best Student Paper Awards for 2011 and 2012. Her research interests lie in the theoretical and system challenges that arise in storage and analysis of big data, with a current focus on erasure coding for big-data systems.