Mean estimation in Mahalanobis distance is a fundamental problem in statistics: given i.i.d. samples from a high-dimensional distribution with unknown mean and covariance, the goal is to find an estimator with small Mahalanobis distance from the true mean. To protect the privacy of the individuals represented in the dataset, we study statistical estimators which satisfy differential privacy, a condition that has become a standard criterion for individual privacy in statistics and machine learning.
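As a minimal sketch of the loss measure the abstract refers to (not of the estimators themselves), the Mahalanobis distance rescales estimation error by the distribution's covariance, so error is measured in "units of standard deviation" along each direction. The function name and the example covariance below are illustrative choices, not from the talk.

```python
import numpy as np

def mahalanobis(mu_hat, mu, sigma):
    """Mahalanobis distance sqrt((mu_hat - mu)^T Sigma^{-1} (mu_hat - mu)).

    mu_hat, mu: length-d vectors; sigma: d x d positive-definite covariance.
    """
    diff = np.asarray(mu_hat, dtype=float) - np.asarray(mu, dtype=float)
    # Solve Sigma x = diff instead of explicitly inverting Sigma.
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))

# Illustrative covariance: variance 4 in the first coordinate, 1 in the second.
sigma = np.diag([4.0, 1.0])
# An error of 2 along the high-variance axis and an error of 1 along the
# low-variance axis are equally bad in Mahalanobis distance (both equal 1).
d1 = mahalanobis([2.0, 0.0], [0.0, 0.0], sigma)  # -> 1.0
d2 = mahalanobis([0.0, 1.0], [0.0, 0.0], sigma)  # -> 1.0
```

This covariance-relative scaling is why the unknown-covariance setting is delicate: the loss itself depends on the matrix the estimator does not know.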
We present two differentially private mean estimators for multivariate (sub)Gaussian distributions with unknown covariance. All previous estimators with the same accuracy guarantee in Mahalanobis loss either require strong a priori bounds on the covariance matrix or require that the number of samples grows superlinearly with the dimension of the data, which is suboptimal. Our algorithms achieve nearly optimal sample complexity (matching that of the known-covariance case) by adapting the noise added due to privacy to the distribution's covariance matrix, without explicitly estimating it.
Joint work with Gavin Brown, Marco Gaboardi, Adam Smith, and Jonathan Ullman.
Bio: Lydia Zakynthinou is a PhD student in the Khoury College of Computer Sciences at Northeastern University. She is interested in the theoretical foundations of machine learning and data privacy and their connections to statistics and information theory. She earned her ECE diploma from the National Technical University of Athens in 2015 and her MSc in Logic, Algorithms, and Theory of Computation from the University of Athens in 2017. Since Fall 2020, her research has been supported by a Facebook Fellowship.