3D Computer Vision (3D Vision) techniques have been the key solutions to various scene perception problems such as depth estimation from images, camera/object pose estimation, localization, and 3D reconstruction of a scene. These solutions form a major part of many AI applications, including AR/VR, autonomous driving, and robotics. In this talk, I will first review several categories of 3D Vision problems and their challenges. For the category of static scene perception, I will introduce several learning-based depth estimation methods such as PlaneRCNN and Neural RGBD, camera pose estimation methods including MapNet, and a few registration algorithms deployed in NVIDIA's products. I will then introduce more challenging real-world scenarios where scenes contain non-stationary rigid changes, non-rigid motions, or varying appearance due to reflectance and lighting changes, whose view-dependent properties can cause scene reconstruction to fail. I will discuss several solutions to these problems and conclude by summarizing the future directions for 3D Vision research being pursued by NVIDIA's Learning and Perception Research (LPR) team.
Kihwan Kim is a senior research scientist in the Learning and Perception Research group at NVIDIA Research. He received his Ph.D. in Computer Science from the Georgia Institute of Technology in 2011 and his B.S. from Yonsei University in 2001. Prior to joining Georgia Tech, he spent five years as an R&D engineer at Samsung and also worked at Disney Research Pittsburgh as a visiting research associate. His research interests span computer vision, graphics, machine learning, and multimedia. A common thread in his research is understanding scenes from images and estimating the motion and structure of geometric information extracted from the scene. He led NVIDIA's SLAM project (NVSLAM) and currently leads various 3D Vision projects at NVIDIA. More information: https://www.kihwan23.com