The RC64 many-core architecture combines many small cores, many shared memory banks, a hardware scheduler, and two custom active networks-on-chip: one connecting cores to memories and one connecting cores to the scheduler. A shared-memory, de-synchronized, PRAM-like, task-based, non-locking programming model promotes simplicity and ease of programming. A theoretical model (almost) justifies increasing the number of cores while making them smaller and slower, maximizing the performance-to-power ratio. Several benchmark simulations are presented, showing close-to-linear speedup and a high performance-to-power ratio in signal processing, linear algebra, and machine learning applications. A software ecosystem for RC64 is also discussed.
Prof. Ran Ginosar received his BSc from the Technion and his PhD from Princeton University. He has conducted research at Bell Laboratories, the University of Utah, and Intel Research Laboratories in Oregon, USA. He has co-founded several start-up companies in the areas of VLSI and parallel processing. His research interests focus on VLSI, asynchronous logic, and parallel processing architectures.