Towards accurate 3D human body reconstruction from silhouettes
We propose a novel computer vision system for reconstructing 3D body shapes from 2D images, with the goal of producing highly accurate anthropometric measurements from a pair of images. We adopt a supervised learning approach that maps silhouette images to 3D body shapes via a convolutional neural network (CNN). We propose three key improvements over previous approaches: (1) large-scale realistic synthetic data generation, including more realistic variations in segmentation noise and camera viewpoints; (2) a multi-task learning (MTL) approach that predicts multiple outputs such as shape, 3D joint locations, pose angles, and body volume; and (3) a new network architecture that additionally takes known body measurements (e.g., height) and per-pixel segmentation confidence as input. Ablation studies quantify the accuracy gains contributed by each component of our system. Our system achieves state-of-the-art body circumference errors. We also analyze its repeatability in the presence of realistic camera, background, and pose variations, achieving a vertex standard deviation of ~3 mm on the CAESAR dataset.
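To make the multi-task idea concrete, the sketch below shows a toy network with a shared encoder and separate heads for shape, joints, pose, and volume, with a known measurement (height) appended to the shared feature. This is a minimal illustration only: the layer sizes, head dimensions, and the plain-NumPy linear encoder are assumptions for exposition, not the paper's actual CNN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class MultiTaskBodyNet:
    """Toy multi-task regressor: one shared encoder, four task heads.

    All dimensions below are illustrative assumptions, not the
    paper's architecture (which uses a CNN over silhouette images).
    """

    def __init__(self, sil_pixels=64 * 64, feat=128,
                 n_shape=10, n_joints=17, n_pose=24):
        # Shared encoder: a single random linear layer stands in
        # for the convolutional feature extractor.
        self.W_enc = rng.normal(0.0, 0.01, (sil_pixels, feat))
        # Task-specific heads; the +1 slot holds the known
        # measurement (height) injected into the shared feature.
        self.heads = {
            "shape":  rng.normal(0.0, 0.01, (feat + 1, n_shape)),
            "joints": rng.normal(0.0, 0.01, (feat + 1, n_joints * 3)),
            "pose":   rng.normal(0.0, 0.01, (feat + 1, n_pose)),
            "volume": rng.normal(0.0, 0.01, (feat + 1, 1)),
        }

    def forward(self, silhouette, height_m):
        f = relu(silhouette.reshape(-1) @ self.W_enc)
        f = np.concatenate([f, [height_m]])  # append known height
        return {name: f @ W for name, W in self.heads.items()}

net = MultiTaskBodyNet()
out = net.forward(rng.random((64, 64)), 1.75)
print({k: v.shape for k, v in out.items()})
```

In an MTL setup like this, the per-task losses (e.g., vertex, joint, and volume errors) would be summed with weights during training, so the shared encoder is regularized by all tasks at once.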