Technology scaling of DRAM cells has enabled higher-capacity memory over the last few decades. Unfortunately, DRAM cells become more vulnerable to failure as they scale down to smaller sizes. Enabling high-performance, energy-efficient, scalable memory systems without sacrificing reliability is a major research challenge. My work focuses on designing a scalable memory system by rethinking the traditional assumptions in abstraction and separation of responsibilities across system layers.
In this talk, I will discuss three fundamental ways to enable memory scaling. First, we can enable scaling by letting manufacturers build smaller cells without providing strict reliability guarantees. I envision manufacturers shipping DRAM chips without fully ensuring correct operation, with the system responsible for detecting and mitigating DRAM failures while operating in the field. However, designing such a system is difficult due to intermittent DRAM failures. I will discuss a system design capable of providing reliability guarantees even in the presence of intermittent failures. Second, tolerating failures in the application can improve DRAM scalability. The fundamental challenge for such a system is how to assure, verify, and quantify the quality of the results. I envision a system that limits the impact of memory failures such that it is possible to statically determine the worst-case results from the maximum possible error in the input. Third, we can enable high-capacity memory by leveraging emerging non-volatile memory technologies that are predicted to be more scalable. I will present my vision for redefining the hardware and operating system interface to unify the memory and storage systems with non-volatile memory, and discuss the opportunities and challenges of such a system.