Abstract
We present a method for reconstructing the global position of a motion capture subject when position sensing is poor or unavailable. Capture systems such as IMU suits provide excellent pose and orientation data, but typically require post-processing to estimate global position. We propose a solution that trains a neural network to predict, in real time, the height and body displacement of the subject given a short window of pose and orientation data. Our training dataset contains pre-recorded data with global positions from many different capture subjects performing a wide variety of activities, in order to train a network that generalizes to both similar and unseen activities. We compare two network architectures, a U-Net and a traditional convolutional neural network (CNN), and observe better error properties for the U-Net in our results. We also evaluate our method on different classes of motion. We observe high-quality results for motions well represented in specialized datasets, while a more broadly sampled dataset performs better when input motions are far from the training examples.
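The core idea above, mapping a short window of per-frame pose and orientation features to per-frame height and displacement with a U-Net-style encoder-decoder, can be sketched as follows. This is a minimal illustrative sketch with untrained random weights; the window length, channel counts, and feature layout are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

# Assumed shapes (illustrative): a 64-frame window of 21 pose/orientation
# channels in, and per-frame (height, dx, dz) out.
WIN, F_IN, F_OUT = 64, 21, 3

def conv1d(x, w, act=True):
    """'Same'-padded 1D convolution. x: (C_in, T), w: (C_out, C_in, K), K odd."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for t in range(x.shape[1]):
        # Contract the (C_in, K) patch against each output filter.
        out[:, t] = np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0) if act else out  # ReLU on hidden layers only

def unet_1d(x, rng):
    """Tiny encoder-decoder with one skip connection (U-Net-style)."""
    w = lambda co, ci: rng.standard_normal((co, ci, 3)) * 0.1
    e = conv1d(x, w(16, F_IN))            # encoder features: (16, WIN)
    d = e[:, ::2]                         # downsample by 2:   (16, WIN // 2)
    b = conv1d(d, w(16, 16))              # bottleneck
    u = np.repeat(b, 2, axis=1)           # upsample back to WIN frames
    cat = np.concatenate([u, e], axis=0)  # skip connection: (32, WIN)
    return conv1d(cat, w(F_OUT, 32), act=False)  # per-frame (height, dx, dz)

rng = np.random.default_rng(0)
window = rng.standard_normal((F_IN, WIN))  # stand-in for pose/orientation input
pred = unet_1d(window, rng)
print(pred.shape)  # (3, 64): one height + displacement estimate per frame
```

The skip connection is the U-Net ingredient: it lets the per-frame output head see both the coarse, downsampled context and the full-rate input features, which is plausibly why the paper reports better error properties for the U-Net than for a plain CNN.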
Supplemental Material
Supplemental movie, appendix, image, and software files for "Global Position Prediction for Interactive Motion Capture" are available for download.