Modern large-scale statistical models require estimating thousands to millions of parameters. Understanding the tradeoff between statistical optimality and computational tractability in such models remains an outstanding challenge. Under a random design assumption, we establish lower bounds on the statistical estimation error achievable by two popular classes of tractable estimators in several canonical statistical models. First, in high-dimensional linear models we show that a large gap often exists between the optimal statistical error and the error achieved by least squares with a convex penalty; examples of such estimators include the Lasso, ridge regression, and MAP estimation with log-concave priors under Gaussian noise. Second, in generalized linear models and low-rank matrix estimation problems, we introduce the class of 'general first order methods', which includes gradient descent, projected gradient descent, and their accelerated variants. We derive lower bounds for general first order methods that are tight up to asymptotically negligible terms. Our results demonstrate a gap to statistical optimality for general first order methods in both sparse phase retrieval and sparse PCA.
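As a concrete illustration touching both estimator classes discussed above, the sketch below (a hypothetical example, not code from the work itself) solves a penalized least squares problem, the Lasso, using proximal gradient descent, which is an instance of a first order method. The problem sizes, regularization level, and noise level are illustrative assumptions chosen for a synthetic Gaussian random design.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of theta -> t * ||theta||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=500):
    """Proximal gradient descent (ISTA) for the Lasso objective:
    minimize_theta  (1 / (2n)) * ||y - X theta||^2 + lam * ||theta||_1
    """
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient
    step = n / (np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n          # gradient of the quadratic loss
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Synthetic sparse linear model under a Gaussian random design (illustrative sizes)
rng = np.random.default_rng(0)
n, p, k = 200, 50, 5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:k] = 1.0                              # k-sparse signal
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = lasso_ista(X, y, lam=0.05)
```

Each iteration is a gradient step on the quadratic loss followed by coordinate-wise soft thresholding, so the iterate depends on the data only through gradient evaluations, which is the structural property that the lower bounds for first order methods exploit.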
This is joint work with Andrea Montanari and Yuchen Wu.