Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience previously employed in the special case of mean estimation with outliers. We justify the generalized resilience property by showing that it holds under moment or hypercontractive conditions. Even in the total variation case, these subsume conditions in the literature for mean estimation, regression, and covariance estimation; the resulting analysis simplifies and sometimes improves these known results in both the population limit and the finite-sample rate. Our robust estimators are based on minimum distance (MD) functionals (Donoho and Liu, 1988), which project onto a set of distributions under a discrepancy related to the perturbation. We present two approaches for designing MD estimators with good finite-sample rates: weakening the discrepancy and expanding the set of distributions. We also present connections to the recent analysis by Gao et al. (2019) of generative adversarial networks for robust estimation.
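As a rough sketch of the two notions named in the abstract, restricted to the total variation, mean-estimation special case: the symbols below ($p$ for the true distribution, $\tilde p$ for the corrupted one, $\mathcal{G}$ for the set projected onto, $D$ for the discrepancy, and the parameters $\rho$, $\epsilon$) are notation assumed here for illustration and may differ from what the talk uses.

```latex
% Resilience (mean estimation with outliers): conditioning on any
% event of probability at least 1 - \epsilon moves the mean by at most \rho.
p \text{ is } (\rho, \epsilon)\text{-resilient}
\iff
\bigl\| \mathbb{E}_p[X \mid E] - \mathbb{E}_p[X] \bigr\| \le \rho
\quad \text{for every event } E \text{ with } p(E) \ge 1 - \epsilon .

% Minimum distance (MD) functional: project the corrupted distribution
% onto the set \mathcal{G} under the discrepancy D.
\hat q \in \operatorname*{arg\,min}_{q \in \mathcal{G}} D(q, \tilde p) .

% Population-limit step when D = \mathrm{TV}, p \in \mathcal{G}, and
% \mathrm{TV}(p, \tilde p) \le \epsilon: since p is feasible,
% \mathrm{TV}(\hat q, \tilde p) \le \epsilon, and by the triangle inequality
\mathrm{TV}(\hat q, p)
\le \mathrm{TV}(\hat q, \tilde p) + \mathrm{TV}(\tilde p, p)
\le 2\epsilon .
```

Under these assumptions the projection $\hat q$ stays within $2\epsilon$ of $p$ in total variation, so the estimate is accurate whenever the statistic of interest changes little between members of $\mathcal{G}$ that are this close, which is the role a resilience-type condition plays.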
Joint work with Banghua Zhu and Jacob Steinhardt.
Jiantao Jiao is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. He received his B.Eng. degree in Electronic Engineering from Tsinghua University, Beijing, China in 2012, and his M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 2014 and 2018, respectively. He is a recipient of the Presidential Award of Tsinghua University and the Stanford Graduate Fellowship. He was a semi-plenary speaker at ISIT 2015 and a co-recipient of the ISITA 2016 Student Paper Award and the MobiHoc 2019 Best Paper Award. His research interests are in statistical machine learning, high-dimensional and nonparametric statistics, mathematical programming, applied probability, information theory, and their applications.