Reducing the numerical precision of neural networks is one of the simplest, most effective and most common methods to improve resource efficiency (e.g. by reducing the memory and power requirements). Much research has been invested in finding how to quantize neural nets without significantly degrading performance. I will describe the main bottlenecks and solutions in various settings:
1) 1bit inference (1bit weights and activations), NIPS 2016 - link
2) 8bit training (8bit weights, activations, gradients, and batch-norm), NeurIPS 2018 - link
3) 4bit inference when quantization is done only post-training, Arxiv 2019 - link
4) Calculating the maximum trainable depth as a function of the numerical precision, Arxiv 2019 - link
Since October 2017, Daniel Soudry is an assistant professor (Taub Fellow) in the Department of Electrical Engineering at the Technion, working in the areas of machine learning and theoretical neuroscience. Before that, he did his post-doc (as a Gruss Lipper fellow) working with Prof. Liam Paninski in the Department of Statistics, and the Center for Theoretical Neuroscience at Columbia University. He did his Ph.D. in the Department of Electrical Engineering at the Technion, Israel Institute of Technology, under the guidance of Prof. Ron Meir. He received his B.Sc. degree in Electrical Engineering and Physics from the Technion.