research-article
Open Access

Multi-human Parsing with a Graph-based Generative Adversarial Model

Published: 16 April 2021

Abstract

Human parsing is an important task in human-centric image understanding in computer vision and multimedia systems. However, most existing work on human parsing tackles the single-person scenario, which deviates from real-world applications where multiple persons appear simultaneously, interacting with and occluding one another. To address this challenging multi-human parsing problem, we introduce a novel model named MH-Parser, which uses a graph-based generative adversarial model to handle close-person interaction and occlusion. To validate the effectiveness of the new model, we collect a new dataset named Multi-Human Parsing (MHP), which contains multiple persons with intensive interaction and entanglement. Experiments on the new MHP dataset and existing datasets demonstrate that the proposed method is effective in addressing the multi-human parsing problem compared with existing solutions in the literature.
