High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations: Volumetric representation, where the 3D object is discretized spatially as binary voxels - 1 if the voxel is occupied and 0 otherwise. Pixel representation where the 3D object is represented as a set of projected 2D pixel images. At the time of submission, we obtained leading results on the Princeton ModelNet challenge. Some of the best deep learning architectures for classifying 3D CAD models use Convolutional Neural Networks (CNNs) on pixel representation, as seen on the ModelNet leaderboard. Diverging from this trend, we combine both the above representations and exploit them to learn new features. This approach yields a significantly better classifier than using either of the representations in isolation. To do this, we introduce new Volumetric CNN (V-CNN) architectures.
Reza Zadeh is CEO at Matroid and Adjunct Professor at Stanford University. His work focuses on machine learning, distributed computing, and discrete applied mathematics. He has served on the Technical Advisory Board of Microsoft and Databricks.