Mingyu Gao (PhD '18) and co-authors received the acknowledgement at ISCA 2016. Their paper is titled "DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric".
IEEE Micro will publish the complete list of 2016's significant papers in its annual "Micro's Top Picks from the Computer Architecture Conferences", which appears in the May/June 2017 issue. The issue collects some of the year's most significant research papers in computer architecture, selected for novelty and potential for long-term impact. Any computer architecture paper (not a combination of papers) published in the top conferences of 2016 (including MICRO-49) is eligible. The Top Picks committee recognizes significant and insightful papers that have the potential to influence the work of computer architects for years to come.
FPGAs are a popular target for application-specific accelerators because they lead to a good balance between flexibility and energy efficiency. However, FPGA lookup tables introduce significant area and power overheads, making it difficult to use FPGA devices in environments with tight cost and power constraints. This is the case for datacenter servers, where a modestly sized FPGA cannot accommodate the large number of diverse accelerators that datacenter applications need.
This paper introduces DRAF, an architecture for bit-level reconfigurable logic that uses DRAM subarrays to implement dense lookup tables. DRAF overlaps DRAM operations like bitline precharge and charge restoration with routing within the reconfigurable routing fabric to minimize the impact of DRAM latency. It also supports multiple configuration contexts that can be used to quickly switch between different accelerators with minimal latency. Overall, DRAF trades off some of the performance of FPGAs for significant gains in area and power. DRAF improves area density by 10x over FPGAs and power consumption by more than 3x, enabling DRAF to satisfy demanding applications within strict power and cost constraints. While accelerators mapped to DRAF are 2-3x slower than those in FPGAs, they still deliver a 13x speedup and an 11x reduction in power consumption over a Xeon core for a wide range of datacenter tasks, including analytics and interactive services like speech recognition.
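For readers unfamiliar with the term, a lookup table in reconfigurable logic is a small memory: a k-input table stores 2^k output bits, and the input signals form the address that selects one of them. The Python sketch below illustrates only that general idea, plus a simplified notion of switching between stored configurations. It is not DRAF's design and models no DRAM behavior or timing; the names `LookupTable` and `MultiContextTable` are hypothetical and chosen for illustration.

```python
from itertools import product


class LookupTable:
    """Generic k-input lookup table: a 2**k-entry, 1-bit-wide memory.

    Illustrative only; this is not how DRAF or any specific FPGA is built.
    """

    def __init__(self, num_inputs, function):
        self.num_inputs = num_inputs
        # "Configuration": store the function's output for every input
        # combination, in address order (first input is the most significant bit).
        self.bits = [
            int(bool(function(*combo)))
            for combo in product((0, 1), repeat=num_inputs)
        ]

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # The inputs form the memory address; evaluation is a single read.
        address = 0
        for bit in inputs:
            address = (address << 1) | (1 if bit else 0)
        return self.bits[address]


class MultiContextTable:
    """Hypothetical holder for several stored configurations.

    Switching the active context changes which function is computed
    without rebuilding any table.
    """

    def __init__(self, tables):
        self.tables = list(tables)
        self.active = 0

    def switch(self, context):
        if not 0 <= context < len(self.tables):
            raise IndexError("no such context")
        self.active = context

    def evaluate(self, *inputs):
        return self.tables[self.active].evaluate(*inputs)


# Two example 3-input functions: majority vote and odd parity.
majority = LookupTable(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
parity = LookupTable(3, lambda a, b, c: a ^ b ^ c)

fabric = MultiContextTable([majority, parity])
majority_result = fabric.evaluate(1, 1, 0)  # context 0: majority
fabric.switch(1)
parity_result = fabric.evaluate(1, 1, 0)    # context 1: parity
```

Because the table size doubles with every added input, the density of the underlying memory largely determines how much logic fits in a given area, which is the motivation the abstract gives for building the tables from DRAM subarrays.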
Congratulations to Mingyu and co-authors. Mingyu's research advisor is Christos Kozyrakis.