In many applications, including natural language processing, sensor networks, collaborative filtering, and federated learning, data are collected in batches, some potentially corrupt, biased, or even adversarial. Learning algorithms for this setting have therefore garnered considerable recent attention. We develop a general framework for robust learning from batches, and determine the least number of samples required for robust density estimation and classification over both discrete and continuous domains. Perhaps surprisingly, we show that robust learning can be achieved with essentially the same number of samples as required for genuine data. For the important problems of learning discrete and piecewise-polynomial densities, and of interval-based classification, we achieve these limits with polynomial-time algorithms.
Based on joint work with Ayush Jain.
Bio: Alon Orlitsky received B.Sc. degrees in Mathematics and Electrical Engineering from Ben Gurion University, and M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University. After a decade with the Communications Analysis Research Department of Bell Laboratories and a year as a quantitative analyst at D.E. Shaw and Company, he joined the University of California San Diego, where he is currently a professor of Electrical and Computer Engineering and of Computer Science and Engineering and holds the Qualcomm Chair for Information Theory and its Applications. His research concerns information theory, statistical modeling, and machine learning, focusing on fundamental limits and practical algorithms for extracting knowledge from data. Among other distinctions, Alon is a recipient of the 2021 Information Theory Society Shannon Award and a co-recipient of the 2017 ICML Best Paper Honorable Mention Award, the 2015 NeurIPS Paper Award, and the 2006 Information Theory Society Paper Award.