
FasterPose: A Faster Simple Baseline for Human Pose Estimation

Published: 4 March 2022

Abstract

The performance of human pose estimation depends on the spatial accuracy of keypoint localization. Most existing methods pursue spatial accuracy by learning a high-resolution (HR) representation from the input images. Through experimental analysis, we find that the HR representation leads to a sharp increase in computational cost, while the accuracy improvement remains marginal compared with the low-resolution (LR) representation. In this article, we propose FasterPose, a design paradigm for a cost-effective network that uses the LR representation for efficient pose estimation. Whereas the LR design largely shrinks the model complexity, effectively training the network with respect to spatial accuracy is a concomitant challenge. We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function that accelerates convergence and promotes accuracy. The RCE loss generalizes the ordinary cross-entropy loss from binary supervision to a continuous range, so the training of the pose-estimation network can benefit from the sigmoid function. As a result, the output heatmap can be inferred from the LR features without loss of spatial accuracy, while the computational cost and model size are significantly reduced. Compared with the previously dominant pose-estimation network, our method reduces FLOPs by 58% while simultaneously improving accuracy by 1.3%. Extensive experiments show that FasterPose yields promising results on the common benchmarks, i.e., COCO and MPII, consistently validating its effectiveness and efficiency for practical use, especially low-latency and low-energy-budget applications in non-GPU scenarios.
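To make the core idea concrete, the following is a minimal sketch of a cross-entropy loss generalized from binary labels to continuous heatmap targets in [0, 1], with a sigmoid applied to the predicted logits. This is an illustrative assumption of the standard soft-target form, not a reproduction of the paper's exact RCE formulation; the function name `rce_like_loss` is hypothetical.

```python
import numpy as np

def rce_like_loss(logits, targets, eps=1e-12):
    """Cross-entropy extended to continuous targets in [0, 1].

    Hedged illustration of the idea in the abstract: the predicted
    heatmap passes through a sigmoid, and supervision uses continuous
    (e.g., Gaussian-rendered) target values rather than binary labels.
    NOT the paper's exact RCE definition.
    """
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid activation of heatmap logits
    p = np.clip(p, eps, 1.0 - eps)      # numerical stability near 0 and 1
    # Ordinary cross-entropy, with soft targets t in [0, 1] instead of {0, 1}
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

# Example: predictions aligned with a continuous Gaussian-style target
# yield a lower loss than predictions that contradict it.
t = np.array([1.0, 0.5, 0.0])
good = rce_like_loss(np.array([10.0, 0.0, -10.0]), t)
bad = rce_like_loss(np.array([-10.0, 0.0, 10.0]), t)
```

Because the targets are continuous, the loss remains informative everywhere on the heatmap, which is consistent with the abstract's claim that training benefits from the sigmoid output range.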



Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 4 (November 2022), 497 pages. ISSN: 1551-6857. EISSN: 1551-6865. DOI: 10.1145/3514185. Editor: Abdulmotaleb El Saddik.


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 4 March 2022
• Accepted: 1 November 2021
• Revised: 1 October 2021
• Received: 1 June 2021

          Qualifiers

          • research-article
          • Refereed
