Abstract
In a conventional optical motion capture (MoCap) workflow, two processes are needed to turn captured raw marker sequences into correct skeletal animation sequences. First, the various tracking errors present in the markers must be fixed (cleaning or refining). Second, an agent skeletal mesh must be prepared for the actor/actress and used to determine skeleton information from the markers (re-targeting or solving). The whole process, normally referred to as solving MoCap data, is extremely time-consuming and labor-intensive, and is usually the most costly part of animation production. Hence, there is great demand for automated tools in industry. In this work, we present MoCap-Solver, a production-ready neural solver for optical MoCap data. It directly produces skeleton sequences and clean marker sequences from raw MoCap markers, without any tedious manual operations. To achieve this, our key idea is to use neural encoders for three key intrinsic components (the template skeleton, the marker configuration, and the motion) and to learn to predict these latent vectors from imperfect marker sequences containing noise and errors. By decoding these latent vectors, sequences of clean markers and skeletons can be directly recovered. Moreover, we provide a novel normalization strategy based on learning a pose-dependent marker reliability function, which greatly improves system robustness. Experimental results demonstrate that our algorithm consistently outperforms the state of the art on both synthetic and real-world datasets.
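The encode/decode pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, latent dimensions, and the single random linear layer standing in for each trained network are all illustrative assumptions; it only shows the data flow from a raw marker sequence, through three latent codes (template skeleton, marker configuration, motion), to a recovered clean marker sequence and per-frame skeleton.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: T frames, M markers (3D each), J joints.
T, M, J = 64, 56, 24
D_SKEL, D_CONF, D_MOTION = 16, 16, 32  # assumed latent dimensions

def net(in_dim, out_dim):
    """A single random linear layer standing in for a trained encoder/decoder."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.01
    return lambda x: np.tanh(x @ W)

# Raw (noisy) marker sequence, flattened to (T, M*3).
raw = rng.standard_normal((T, M * 3))

# Three encoders, one per intrinsic component.
enc_skel, enc_conf, enc_motion = net(M * 3, D_SKEL), net(M * 3, D_CONF), net(M * 3, D_MOTION)
z_skel = enc_skel(raw).mean(axis=0)    # sequence-level skeleton code
z_conf = enc_conf(raw).mean(axis=0)    # sequence-level marker-configuration code
z_motion = enc_motion(raw)             # per-frame motion codes

# Decoders recover clean markers and per-joint rotations (e.g. quaternions).
dec_markers = net(D_SKEL + D_CONF + D_MOTION, M * 3)
dec_skeleton = net(D_SKEL + D_MOTION, J * 4)

ctx = np.concatenate([np.tile(z_skel, (T, 1)), np.tile(z_conf, (T, 1)), z_motion], axis=1)
clean_markers = dec_markers(ctx).reshape(T, M, 3)
skeleton_seq = dec_skeleton(
    np.concatenate([np.tile(z_skel, (T, 1)), z_motion], axis=1)
).reshape(T, J, 4)
```

In the actual system the encoders and decoders are trained networks and the predicted marker reliabilities additionally weight the inputs; this sketch only conveys the factorization into three components and the direct decoding of clean markers and skeletons.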