Image 2
Singapore University of Technology and Design

Caption: The structure of the three-stage encoder for 3D body key-point detection. The encoder consists of a human bounding-box detection model, a top-down human pose estimation model, and a 2D-to-3D key-point lifting model. It takes a video as input and outputs 3D pose key-points.

Credit: SUTD

Usage restrictions: Image should be used with appropriate caption and credit.

License: Original content
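The data flow described in the caption can be sketched as three functions chained per video frame. This is only an illustrative outline, not the SUTD implementation: every function name, the 17-key-point count, and the frame-by-frame processing are assumptions, and each stage is a trivial placeholder standing in for a trained model.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]   # (x, y, width, height)
Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Frame = List[List[float]]                  # hypothetical grayscale frame: rows of pixels

NUM_KEYPOINTS = 17  # assumption: the caption does not state a key-point count


def detect_person(frame: Frame) -> BBox:
    """Stage 1 placeholder: a real detector would localise the person.
    Here the box simply covers the whole frame."""
    height, width = len(frame), len(frame[0])
    return (0.0, 0.0, float(width), float(height))


def estimate_2d_pose(frame: Frame, box: BBox) -> List[Point2D]:
    """Stage 2 placeholder: a top-down estimator works inside the detected box.
    Here every key-point is placed at the box centre."""
    x, y, w, h = box
    return [(x + w / 2, y + h / 2)] * NUM_KEYPOINTS


def lift_to_3d(pose_2d: List[Point2D]) -> List[Point3D]:
    """Stage 3 placeholder: a real lifting model would predict depth.
    Here depth is fixed at zero."""
    return [(x, y, 0.0) for x, y in pose_2d]


def encode_video(frames: List[Frame]) -> List[List[Point3D]]:
    """Run the three stages in order on each frame: video in, 3D key-points out."""
    poses = []
    for frame in frames:
        box = detect_person(frame)
        pose_2d = estimate_2d_pose(frame, box)
        poses.append(lift_to_3d(pose_2d))
    return poses


if __name__ == "__main__":
    # Three dummy frames, each 4 rows by 6 columns.
    video = [[[0.0] * 6 for _ in range(4)] for _ in range(3)]
    poses = encode_video(video)
    print(len(poses), len(poses[0]))
```

The point of the sketch is the ordering: the pose estimator is "top-down" because it receives the box from the detector, and the lifting stage sees only 2D key-points, never the pixels. Real lifting models often consume a window of consecutive frames rather than one frame at a time; the caption does not say which approach is used here.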