research-article

Real-time pose and shape reconstruction of two interacting hands with a single depth camera

Published: 12 July 2019

Abstract

We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach is the first two-hand tracking solution that combines an extensive list of favorable properties, namely it is marker-less, uses a single consumer-level depth camera, runs in real time, handles inter- and intra-hand collisions, and automatically adjusts to the user's hand shape. In order to achieve this, we embed a recent parametric hand pose and shape model and a dense correspondence predictor based on a deep neural network into a suitable energy minimization framework. For training the correspondence prediction network, we synthesize a two-hand dataset based on physical simulations that includes both hand pose and shape annotations while at the same time avoiding inter-hand penetrations. To achieve real-time rates, we phrase the model fitting in terms of a nonlinear least-squares problem so that the energy can be optimized based on a highly efficient GPU-based Gauss-Newton optimizer. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work, including tight two-hand grasps, significant inter-hand occlusions, and gesture interaction.
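
To illustrate what phrasing model fitting as a nonlinear least-squares problem and solving it with Gauss-Newton can look like, the following is a minimal CPU sketch on a toy problem. It is not the paper's energy, hand model, or GPU solver. Everything in it is an assumption made for illustration: a planar two-segment "finger" with hypothetical unit segment lengths `LEN1` and `LEN2`, noise-free observations of the middle joint and the tip, and the helper names `forward`, `jacobian`, and `gauss_newton`.

```python
import math

# Hypothetical segment lengths of a planar two-segment "finger" (illustration only).
LEN1, LEN2 = 1.0, 1.0


def forward(theta):
    """Return [joint_x, joint_y, tip_x, tip_y] for joint angles theta = (t1, t2)."""
    t1, t2 = theta
    jx, jy = LEN1 * math.cos(t1), LEN1 * math.sin(t1)
    tx = jx + LEN2 * math.cos(t1 + t2)
    ty = jy + LEN2 * math.sin(t1 + t2)
    return [jx, jy, tx, ty]


def jacobian(theta):
    """Analytic 4x2 Jacobian of forward() with respect to (t1, t2)."""
    t1, t2 = theta
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [
        [-LEN1 * s1, 0.0],
        [LEN1 * c1, 0.0],
        [-LEN1 * s1 - LEN2 * s12, -LEN2 * s12],
        [LEN1 * c1 + LEN2 * c12, LEN2 * c12],
    ]


def gauss_newton(observed, theta0, iters=20, tol=1e-24):
    """Minimize sum(r_i^2) with r = forward(theta) - observed via Gauss-Newton."""
    theta = list(theta0)
    for _ in range(iters):
        r = [p - o for p, o in zip(forward(theta), observed)]
        J = jacobian(theta)
        # Normal equations (J^T J) delta = -J^T r, written out for two parameters.
        a11 = sum(row[0] * row[0] for row in J)
        a12 = sum(row[0] * row[1] for row in J)
        a22 = sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * ri for row, ri in zip(J, r))
        g2 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a11 * a22 - a12 * a12
        d1 = (-g1 * a22 + a12 * g2) / det
        d2 = (-a11 * g2 + a12 * g1) / det
        theta[0] += d1
        theta[1] += d2
        if d1 * d1 + d2 * d2 < tol:  # stop once the squared step length is negligible
            break
    return theta


# Synthesize observations from known angles, then recover them from a nearby guess.
observed = forward((0.5, 0.8))
estimate = gauss_newton(observed, (0.3, 0.5))
print(estimate)
```

Started from a nearby guess, the estimate should approach the angles used to synthesize the observations. The sketch only shows the structure of one Gauss-Newton loop: stack residuals, linearize, solve the normal equations, and update. A real tracker of the kind described in the abstract would have many more parameters and residual terms.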

      • Published in

        ACM Transactions on Graphics  Volume 38, Issue 4
        August 2019
        1480 pages
        ISSN:0730-0301
        EISSN:1557-7368
        DOI:10.1145/3306346
        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States