Augmented reality (AR) on handheld, monocular, "through-the-camera" platforms such as mobile phones is challenging. While traditional, geometry-based approaches provide useful data in certain scenarios, truly immersive experiences require the prior knowledge encapsulated in learned CNNs. In this talk I will discuss the capabilities and limitations of traditional methods, the need for CNN-based solutions, and the challenges of training accurate and efficient CNNs for this task. I will describe our recent work on implicit 3D representations for AR, with applications in novel view synthesis, scene reconstruction, and arbitrary object manipulation. Finally, I will present a project opportunity: learning such representations from a dataset of single images.