Ana Klimovic & Team Win Award For Work on Next-Generation Memories

Ana Klimovic, EE PhD '19
April 2018

Congratulations to Ana Klimovic (PhD candidate '19), Professor Christos Kozyrakis, and postdoc Heiner Litz. They won the 2018 Memorable Paper Award for System Architecture and Applications at the 9th Annual Non-Volatile Memories Workshop (NVMW), hosted by the University of California, San Diego. Their paper, "ReFlex: Remote Flash == Local Flash," was one of six finalists for the award, selected from over 80 papers submitted to the workshop.

About The Memorable Paper Award

The Memorable Paper Award recognizes the best recent research on non-volatile memories published throughout the world. It is given annually to outstanding research that is expected to have substantial impact on the study of non-volatile memories. To be eligible, the paper must have been published in a peer-reviewed venue in the last two years, and the lead researcher must have been a student at the time.

About the Non-Volatile Memories Workshop

The Non-Volatile Memories Workshop is the world's premier venue for research into how to use non-volatile memory technology to improve the performance, reliability, and efficiency of computing systems. It was founded in 2010 by Dr. Paul Siegel and Dr. Steven Swanson of the University of California, San Diego's Jacobs School of Engineering. The workshop is a co-production of the Center for Magnetic Recording Research and the Non-Volatile Systems Laboratory at UC San Diego. More information, including a detailed program, is available at nvmw.ucsd.edu.

Please join us in congratulating Ana, Christos, and Heiner on their award! 


Award winner Ana Klimovic (center) with the general chairs of NVMW'18, Professor Steven Swanson (left) and Professor Paul Siegel (right), both of UCSD.

Paper Summary:

Internet companies such as Facebook and Google host trillions of messages, photos, and videos for their users. Hence, they need storage systems that are massive in scale, fast to access, and cost effective. Scale is achieved by hosting internet services in datacenters with thousands of machines, each contributing its local storage to the global data pool. Speed is achieved by selectively replacing slow hard disks in machines with Flash storage devices that can serve data accesses with 100x lower latency and 10,000x higher throughput.

However, Flash makes it difficult to build a cost-effective storage system. Flash devices are typically underutilized in terms of capacity and throughput due to the imbalance in the compute and storage requirements of the internet services running on each machine. In the past, datacenter operators dealt with the same challenge for disks by allowing services running on each machine to allocate storage over the network on any disk with spare capacity and bandwidth in the datacenter. Remote (over the network) access to disks makes it possible to use all available capacity and throughput. Past efforts to implement similar remote access systems for Flash devices have run into significant challenges. Network protocol processing at the throughput of Flash devices requires a large number of processor cores and adds overheads that cancel out the latency advantages of using Flash. Moreover, when two remote machines access the same Flash device, interference between the two access streams can lead to unpredictable performance degradation.

To address these challenges, researchers Ana Klimovic, Heiner Litz, and Christos Kozyrakis developed a software system called ReFlex. ReFlex enables high-performance access to remote Flash storage with minimal compute resources and provides predictable performance for multiple services sharing a Flash device over the network. Using a single processing core, the system can process up to 850,000 requests per second, which is 11x more than a traditional Linux network storage system. ReFlex makes remote Flash look like local Flash to applications, making it easy for a service running on a particular machine to use spare Flash capacity and bandwidth on other machines in the datacenter. To provide predictable performance when multiple remote machines access the same Flash device, ReFlex uses a novel scheduler to process incoming requests in an interference-aware manner.

ReFlex is having an increasing impact in industry and, in collaboration with IBM Research, has been integrated into the Apache Crail distributed storage system. This integration allows popular data analytics frameworks to leverage ReFlex to improve their resource efficiency while maintaining high, predictable performance. ReFlex is also being ported to a system on chip (SoC) platform by Broadcom Limited. ReFlex is open-source software and available at: https://www.github.com/stanford-mast/reflex.

 

Excerpted from the full NVMW'18 press release.