A broad range of applications in visual eﬀects, computer animation, autonomous driving, and man-machine interaction heavily depend on robust and fast algorithms to obtain high-quality reconstructions of our physical world in terms of geometry, motion, reflectance, and illumination. Especially, with the increasing popularity of virtual, augmented and mixed reality devices, there comes a rising demand for real-time and low-latency solutions.
This talk covers data-parallel optimization and state-of-the-art machine learning techniques to tackle the underlying 3D and 4D reconstruction problems based on novel mathematical models and fast algorithms. The particular focus of this talk is on self-supervised face reconstruction from a collection of unlabeled in-the-wild images. The proposed approach can be trained end-to-end without dense annotations by fusing a convolutional encoder with a differentiable expert-designed renderer and a self-supervised training loss.
The resulting reconstructions are the foundation for advanced video editing effects, such as photo-realistic re-animation of portrait videos. The core of the proposed approach is a generative rendering-to-video translation network that takes computer graphics renderings as input and generates photo-realistic modified target videos that mimic the source content. With the ability to freely control the underlying parametric face model, we are able to demonstrate a large variety of video rewrite applications. For instance, we can reenact the full head using interactive user-controlled editing and realize high-fidelity visual dubbing.