The Euclidean k-means problem is a classical problem that has been extensively studied in several communities. In this problem, we are given a set of n points in Euclidean space R^d , and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+eps)-approximation which runs in time poly(n).exp(k/eps). At the other extreme, the only known computational complexity result for this problem is NP-hardness. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d.

In this talk, I will present a new SDP relaxation for k-means. The study of integrality gaps for this relaxation led to the first hardness of approximation result for the Euclidean k-means problem. We show that there exists a constant delta > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+delta). I will outline some of the ideas in the hardness of approximation proof, and discuss other research directions that came from the study of the k-means SDP.

Joint work with Pranjal Awasthi, Ravishankar Krishnaswamy and Ali Sinop.