A wide range of problems in statistics, machine learning, and signal processing can be cast as fitting a structured low-rank matrix to noisy data. A popular approach to the resulting rank-constrained optimization problem is convex SDP relaxation, which runs in polynomial time in principle and enjoys desirable statistical guarantees. For large-scale problems, however, the (super-)quadratic time complexity of SDPs is often prohibitive. An attractive alternative is to run projected gradient descent directly over the space of low-rank matrices. This approach is highly scalable, but its convergence and statistical accuracy have remained unclear due to non-convexity. Here we develop a unified framework characterizing the convergence of this non-convex method as well as the statistical properties of the resulting solutions. Our results provide insight into why non-convex methods should be expected to work in general, and yield global guarantees for a large number of concrete problems, including matrix completion with real-valued and one-bit observations, matrix decomposition, robust and sparse PCA, and graph clustering. Our framework is powerful enough to accommodate nonlinear measurements, matrices of arbitrary rank, and the noisy, constrained setting with additional structure in the noise and target matrices, and moreover does not require resampling.
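To make the projected gradient descent approach concrete, the following is a minimal sketch for one of the problems mentioned above, noisy matrix completion. It iterates a gradient step on the squared loss over observed entries with a projection onto the rank-r matrices via truncated SVD. The function names, step size, and iteration count are illustrative assumptions, not the paper's specific algorithm or constants.

```python
import numpy as np

def project_rank(X, r):
    """Project onto the set of rank-<=r matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def pgd_matrix_completion(M_obs, mask, r, step=0.5, iters=200):
    """Projected gradient descent for noisy matrix completion.

    Minimizes 0.5 * ||mask * (X - M_obs)||_F^2 subject to rank(X) <= r,
    where mask is a 0/1 matrix of observed entries.
    """
    X = project_rank(mask * M_obs, r)       # spectral-style initialization
    for _ in range(iters):
        grad = mask * (X - M_obs)           # gradient of the squared loss
        X = project_rank(X - step * grad, r)  # gradient step + rank projection
    return X

# Demo: recover a rank-2 50x50 matrix from ~60% noisy observations.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
mask = (rng.random((50, 50)) < 0.6).astype(float)
M_obs = M + 0.01 * rng.standard_normal(M.shape)
X_hat = pgd_matrix_completion(M_obs, mask, r=2)
err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)
```

Despite the non-convex rank constraint, each iteration costs only a gradient evaluation plus a truncated SVD, which is what makes the method scalable relative to solving a full SDP.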