Improving Multiperson Pose Estimation by Mask-aware Deep Reinforcement Learning

Published: 05 July 2020

Abstract

Research on single-person pose estimation based on deep neural networks has recently progressed in both accuracy and execution efficiency. However, multiperson pose estimation remains challenging, partly because object regions are selected greedily from proposals via class-agnostic nonmaximum suppression (NMS), and misalignment in the redundant detections yields inaccurate human poses. We therefore consider how to obtain the optimal input for human pose estimation when intermediate label information is not available. Because supervised learning–based alignment does not generalize well to unseen samples in the human pose space, this article presents a mask-aware deep reinforcement learning approach that modifies the detection result. We use mask information to remove the adverse effects of cluttered background and to select the optimal action according to the revised reward function. We also propose a new regularization term that penalizes joints lying outside the silhouette region in the human pose estimation stage. We evaluate our approach on the MPII Multiperson dataset and the MS-COCO Keypoints Challenge. The results show that our approach yields competitive results compared with other state-of-the-art approaches.
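
For readers unfamiliar with the greedy, class-agnostic NMS step that the abstract identifies as a source of misalignment, the following is a minimal sketch of the standard procedure. It is not the article's implementation: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2), with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    Class-agnostic: every box competes with every other box,
    regardless of any class label.
    """
    # Visit proposals from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(greedy_nms(proposals, confidences))  # [0, 2]
```

The selection depends only on score and overlap, so the surviving box is the most confident one, not necessarily the one best aligned with the person. That is the kind of misalignment the abstract proposes to correct.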

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 3
August 2020
364 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3409646

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 5 July 2020
• Online AM: 7 May 2020
• Accepted: 1 April 2020
• Revised: 1 March 2020
• Received: 1 July 2019

Qualifiers

• research-article
• Research
• Refereed
