With the end of Moore's Law, many computer architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. The Tensor Processing Unit (TPU), deployed in Google datacenters since 2015, is a custom chip that accelerates deep neural networks (DNNs). We compare the TPU to contemporary server-class CPUs and GPUs deployed in the same datacenters. Our benchmark workload, written using the high-level TensorFlow framework, uses production DNN applications that represent 95% of our datacenters' DNN demand. On average, the TPU is about 15X–30X faster than its contemporary GPU or CPU, with performance per watt 30X–80X higher.
After 40 years as a UC Berkeley professor, David Patterson retired in 2016 and joined Google as a distinguished engineer. He has been Chair of Berkeley's CS Division, Chair of the Computing Research Association, and President of the Association for Computing Machinery. His most successful research projects have been Reduced Instruction Set Computers (RISC), Redundant Arrays of Inexpensive Disks (RAID), and Network of Workstations, all of which helped lead to multibillion-dollar industries. This research led to many papers, six books, and about 40 honors, including election to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame, and being named a Fellow of the Computer History Museum. He shared the IEEE von Neumann Medal and the NEC C&C Prize with John Hennessy, past president of Stanford University and co-author of two of his books.