Unsupervised learning with generative models has the potential to discover rich representations of 3D scenes. Such Neural Scene Representations may subsequently support a wide variety of downstream tasks, ranging from robotics to computer graphics to medical imaging. However, existing methods ignore one of the most fundamental properties of scenes: their three-dimensional structure. In this talk, I will make the case for equipping Neural Scene Representations with an inductive bias for 3D structure, enabling self-supervised discovery of shape and appearance from few observations. By embedding an implicit scene representation in a neural rendering framework and learning a prior over these representations, I will show how we can enable 3D reconstruction from only a single posed 2D image. I will further show that the features learned in this process are already useful for the downstream task of semantic segmentation. Finally, I will show how gradient-based meta-learning enables fast inference of implicit representations.
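To make the idea of inferring an implicit representation by optimization concrete, here is a minimal toy sketch, not the talk's actual method: a frozen linear "decoder" maps a 3D coordinate and a per-scene latent code to a color, and inference fits the latent code to a handful of observations by gradient descent (in the spirit of auto-decoder or meta-learning inner-loop inference). All names, dimensions, and the linear decoder are illustrative assumptions.

```python
import numpy as np

# Toy sketch: a "scene" is a latent code z; a fixed decoder maps a 3D
# coordinate x and z to an RGB color. Inference fits z to a few
# observations by gradient descent, mimicking gradient-based inference
# of an implicit scene representation. Everything here is illustrative.

rng = np.random.default_rng(0)
D_Z, D_X = 8, 3
W = rng.normal(0, 0.3, size=(3, D_X + D_Z))  # frozen "decoder" weights

def decode(x, z):
    """Map a 3D point x and a scene code z to an RGB color (linear toy decoder)."""
    return W @ np.concatenate([x, z])

# A few observations of a ground-truth scene with code z_true.
z_true = rng.normal(size=D_Z)
xs = rng.normal(size=(20, D_X))
cs = np.array([decode(x, z_true) for x in xs])

# Gradient-based inference: optimize z so the decoder reproduces the colors.
z = np.zeros(D_Z)
lr = 0.1
for _ in range(1000):
    grad = np.zeros(D_Z)
    for x, c in zip(xs, cs):
        err = decode(x, z) - c                     # residual at this point
        grad += 2 * W[:, D_X:].T @ err / len(xs)   # d(MSE)/dz for this sample
    z -= lr * grad

# After optimization, z reproduces the observed colors.
mse = np.mean((np.array([decode(x, z) for x in xs]) - cs) ** 2)
```

In the real setting, the linear decoder would be a coordinate-based neural network rendered through a differentiable renderer, and meta-learning would shape the initialization so that only a few such gradient steps are needed per scene.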