
Cross Refinement Techniques for Markerless Human Motion Capture

Published: 04 March 2020

Abstract

This article presents a global 3D human pose estimation method for markerless motion capture. Given two calibrated images of a person, it first obtains the 2D joint locations in each image using a pre-trained 2D pose CNN, then reconstructs the 3D pose by stereo triangulation. To improve the accuracy and stability of the system, we propose two efficient optimization techniques for the joints. The first, called cross-view refinement, optimizes the joints based on epipolar geometry. The second, called cross-joint refinement, optimizes the joints using bone-length constraints. Our method automatically detects and corrects unreliable joints and is consequently robust against heavy occlusion, symmetry ambiguity, motion blur, and highly distorted poses. We evaluate our method on several benchmark datasets covering indoor and outdoor scenes; the results show that it performs better than or on par with state-of-the-art methods. As an application, we create a 3D human pose dataset using the proposed motion capture system, which contains about 480K images of both indoor and outdoor scenes, and demonstrate the usefulness of the dataset for human pose estimation.
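The abstract names three geometric ingredients: stereo triangulation, an epipolar constraint between the two views, and a bone-length constraint between joints. The sketch below illustrates the standard quantities behind each, using NumPy. It is not the authors' implementation, and the abstract does not give their optimization formulation. All function names, the camera parameters, and the sample joint are hypothetical and chosen only for illustration.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two calibrated views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def fundamental_from_calibrated(K, R, t):
    """F such that x2^T F x1 = 0, for cameras K[I|0] and K[R|t]."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    K_inv = np.linalg.inv(K)
    return K_inv.T @ tx @ R @ K_inv

def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 to the epipolar line of x1 in the second image."""
    line = F @ np.append(x1, 1.0)
    return abs(line @ np.append(x2, 1.0)) / np.hypot(line[0], line[1])

def bone_length_error(Xa, Xb, expected_length):
    """Absolute deviation of a reconstructed bone from its expected length."""
    return abs(np.linalg.norm(Xa - Xb) - expected_length)

# Hypothetical stereo rig: identical intrinsics, pure horizontal baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.5, 0.0, 0.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])
F = fundamental_from_calibrated(K, R, t)

# A hypothetical joint position in the first camera's frame (metres).
X_true = np.array([0.2, -0.1, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)
```

In a sketch like this, a consistent pair of 2D detections has an epipolar distance near zero, while a detection that drifts off the epipolar line (for example under occlusion) yields a larger distance and could be flagged as unreliable. A bone-length check plays the same role across adjacent joints in 3D.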

