EE380 Computer Systems Colloquium

Topic: 
Flash Reliability in Production: The Expected and the Unexpected
Wednesday, February 24, 2016 - 4:30pm to 5:30pm
Venue: 
Gates B03
Speaker: 
Bianca Schroeder (University of Toronto)
Abstract / Description: 

As solid state drives based on flash technology are becoming a staple for persistent data storage in data centers, it is important to understand their reliability characteristics. While there is a large body of work based on experiments with individual flash chips in a controlled lab environment under synthetic workloads, there is a dearth of information on their behavior in the field.

This talk presents a large-scale field study covering many millions of drive days, ten different drive models, and different flash technologies (MLC, eMLC, SLC) over 6 years of use in a data center production environment. We study a wide range of reliability characteristics and come to a number of unexpected conclusions. For example, raw bit error rates (RBER) grow at a much slower rate with wear-out than the exponential rate commonly assumed and, more importantly, they are not predictive of uncorrectable errors or other error modes. The widely used metric UBER (uncorrectable bit error rate) is not a meaningful metric, since we see no correlation between the number of reads and the number of uncorrectable errors. We see no evidence that higher-end SLC drives are more reliable than MLC drives. Compared with traditional hard disk drives, flash drives have a significantly lower replacement rate in the field; however, they have a higher rate of uncorrectable errors.

The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room B03 in the basement of the Gates Computer Science Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.

Bio:

Bianca is an associate professor and Canada Research Chair in the Computer Science Department at the University of Toronto and a member of the computer systems and networks group. Before joining UofT, she spent 2 years as a post-doc at Carnegie Mellon University working with Garth Gibson. She received her doctorate from the Computer Science Department at Carnegie Mellon University under the direction of Mor Harchol-Balter. She is an Alfred P. Sloan Research Fellow, a two-time winner of the IBM PhD fellowship, and the recipient of the Outstanding Young Canadian Computer Science Prize of the Canadian Association for Computer Science, an Ontario Early Researcher Award, and an NSERC Accelerator Award. Her work has won four best paper awards and one best presentation award. She has served on numerous program committees and has co-chaired the TPCs of Usenix FAST'14, ACM Sigmetrics'14 and IEEE NAS'11. She is also an associate editor for IEEE TDSC and a member of the steering committee of Usenix FAST. Her work on hard drive reliability and her work on DRAM reliability have been featured in articles at a number of news sites, including Computerworld, Wired, Slashdot, PCWorld, StorageMojo and eWEEK.