research-article

Mask-Guided Deformation Adaptive Network for Human Parsing

Published: 14 March 2022

Abstract

Due to densely packed body parts, nonrigid clothing items, and severe overlap in crowded scenes, human parsing needs to rely more on multilevel feature representations than general scene parsing does. Based on this observation, we propose introducing the auxiliary tasks of human mask detection and human edge detection to facilitate human parsing. Unlike human parsing, which exploits the discriminative features of each category, human mask and edge detection emphasize the boundaries of semantic parsing regions and the difference between foreground humans and background clutter, which benefits parsing predictions for crowded scenes and small human parts. Specifically, we extract human mask and edge labels from the human parsing annotations and train a shared encoder with three independent decoders for these three mutually beneficial tasks. Furthermore, the decoder feature maps of the human mask prediction branch are exploited as attention maps that indicate human regions, facilitating the decoding process of human parsing and human edge detection. In addition to these auxiliary tasks, we alleviate the problem of deformed clothing items under various human poses by tracking the deformation patterns with deformable convolution. Extensive experiments show that the proposed method achieves superior performance compared with state-of-the-art methods on both single and multiple human parsing datasets. Code and trained models are available at https://github.com/ViktorLiang/MGDAN.
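The abstract states that human mask and edge labels are extracted from the parsing annotations. The NumPy sketch below shows one plausible way to do this, under two assumptions that are not taken from the paper: label 0 denotes background (as in LIP-style annotations), and an edge pixel is any pixel whose 4-neighbour carries a different part label. The function name `extract_mask_and_edge` is hypothetical, and the authors' repository may use a different edge definition or edge width.

```python
import numpy as np

def extract_mask_and_edge(parsing: np.ndarray, background: int = 0):
    """Hypothetical helper: derive binary mask and edge labels from a parsing map.

    parsing: (H, W) integer array of part labels.
    background: label id of non-human pixels (ASSUMED to be 0 here).
    """
    # Human mask: every pixel that carries any non-background part label.
    mask = (parsing != background).astype(np.uint8)

    # Edge (ASSUMED definition): a pixel whose up/down/left/right neighbour
    # has a different label. Edge-replication padding keeps the image border
    # itself from being flagged as a boundary.
    padded = np.pad(parsing, 1, mode="edge")
    h, w = padded.shape
    center = padded[1:-1, 1:-1]
    edge = np.zeros(parsing.shape, dtype=bool)
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        neighbour = padded[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        edge |= neighbour != center
    return mask, edge.astype(np.uint8)
```

Note that this definition marks pixels on both sides of every label boundary, including boundaries between two body parts, not only the human silhouette.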

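The mask-guided attention described in the abstract can be illustrated with a small sketch. This is not the MGDAN implementation; it assumes that the mask-decoder features are projected to a single channel by a hypothetical 1x1 weight vector `w`, squashed with a sigmoid, and applied to the parsing or edge decoder features in a residual form `feat * (1 + attn)`, so that features outside predicted human regions are kept rather than zeroed out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_guided_attention(task_feat, mask_feat, w, b=0.0):
    """Hypothetical sketch of mask-guided attention.

    task_feat: (C, H, W) features from the parsing or edge decoder.
    mask_feat: (Cm, H, W) features from the human-mask decoder.
    w: (Cm,) weights of an ASSUMED 1x1 projection to one attention channel.
    """
    # 1x1 projection: weighted sum over mask-feature channels -> (H, W).
    logits = np.tensordot(w, mask_feat, axes=(0, 0)) + b
    attn = sigmoid(logits)  # values in (0, 1), high where a human is likely
    # ASSUMED residual form: broadcast the map over all C task channels,
    # amplifying human regions while leaving background features intact.
    return task_feat * (1.0 + attn[None]), attn
```

In a three-branch design, the same `attn` map would be computed once from the mask branch and reused by both the parsing and the edge decoders.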

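Deformable convolution, which the abstract uses to track the deformation patterns of clothing, adds a learned offset to each sampling position of a regular kernel grid and reads the input by bilinear interpolation, so the receptive field can follow deformed shapes. The single-channel NumPy sketch below shows only the sampling rule y(p0) = Σ_k w_k · x(p0 + p_k + Δp_k) for a 3x3 kernel with stride 1 and zero padding. As simplifying assumptions, the offsets are passed in directly instead of being predicted by an offset-generating convolution, and the function names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample a 2-D array at fractional (y, x); out-of-range pixels count as 0."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                # Bilinear weight shrinks linearly with distance to the pixel.
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def deform_conv2d_single(img, kernel, offsets):
    """Hypothetical single-channel 3x3 deformable convolution (stride 1).

    img: (H, W); kernel: (3, 3); offsets: (H, W, 9, 2) with a (dy, dx)
    offset for each of the 9 kernel taps at every output location.
    """
    H, W = img.shape
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # regular 3x3 grid
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for k, (dy, dx) in enumerate(grid):
                oy, ox = offsets[y, x, k]
                # Shift each tap by its learned offset, then interpolate.
                acc += kernel[dy + 1, dx + 1] * bilinear_sample(img, y + dy + oy, x + dx + ox)
            out[y, x] = acc
    return out
```

With all offsets set to zero, the function reduces to an ordinary zero-padded 3x3 cross-correlation, which is a convenient sanity check for any deformable implementation.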
      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 1
        January 2022
        517 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3505205

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 March 2022
        • Accepted: 1 May 2021
        • Revised: 1 March 2021
        • Received: 1 August 2020
        Published in TOMM Volume 18, Issue 1


        Qualifiers

        • research-article
        • Refereed