research-article

Pose- and Attribute-consistent Person Image Synthesis

Published: 17 February 2023

Abstract

Person image synthesis aims to transfer the appearance of a source person image to a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion caused by the entanglement of pose and appearance information across different body components, and (2) failure to preserve the original semantics (e.g., the same outfit). In this article, we explicitly address these two problems by proposing a Pose- and Attribute-consistent Person Image Synthesis Network (PAC-GAN). First, to reduce pose and appearance matching ambiguity, we propose a two-stage component-wise transfer model. The first stage focuses only on synthesizing the target pose, while the second stage renders the target appearance by explicitly transferring appearance information from the source image to the target image component by component. Because pose and appearance synthesis are disentangled per component, source-target matching ambiguity is eliminated. Second, to maintain attribute consistency, we represent the input image as an attribute vector and use this vector to impose a high-level semantic constraint that regularizes the target synthesis. Extensive experiments on the DeepFashion dataset demonstrate the superiority of our method over state-of-the-art methods, especially in maintaining pose and attribute consistency under large pose variations.
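
The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' PAC-GAN implementation. It assumes that the first stage has already produced a target parsing map (one body-component label per pixel), replaces learned per-pixel features with scalars, and uses simple per-component average pooling for the appearance transfer. The attribute constraint is sketched as a binary cross-entropy between attribute probabilities predicted for the source image and for the synthesized image. The attribute classifier that would produce those probabilities, and every function name here (`component_codes`, `transfer_components`, `attribute_consistency_loss`), are hypothetical.

```python
import math

# Toy setting (assumption): a "feature map" is an H x W grid of scalars and a
# parsing map is an H x W grid of integer body-component labels,
# e.g. 0 = background, 1 = upper clothes, 2 = pants.


def component_codes(features, parsing, num_components):
    """Average the source features inside each source component region."""
    sums = [0.0] * num_components
    counts = [0] * num_components
    for feat_row, label_row in zip(features, parsing):
        for value, label in zip(feat_row, label_row):
            sums[label] += value
            counts[label] += 1
    # Components missing from the source image get a zero code.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def transfer_components(src_features, src_parsing, tgt_parsing, num_components):
    """Second-stage sketch: fill each target component with the code of the
    same component in the source, so appearance never crosses components.

    tgt_parsing stands in for the (assumed) output of the pose stage.
    """
    codes = component_codes(src_features, src_parsing, num_components)
    return [[codes[label] for label in row] for row in tgt_parsing]


def attribute_consistency_loss(src_attr_probs, gen_attr_probs, eps=1e-7):
    """Binary cross-entropy pushing the attribute probabilities of the
    synthesized image toward those of the source image (hypothetical form)."""
    total = 0.0
    for p, q in zip(src_attr_probs, gen_attr_probs):
        q = min(max(q, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return total / len(src_attr_probs)


if __name__ == "__main__":
    src_features = [[0.0, 0.9, 0.9], [0.0, 0.2, 0.2]]
    src_parsing = [[0, 1, 1], [0, 2, 2]]
    tgt_parsing = [[1, 1, 0], [2, 2, 0]]  # same components, shifted "pose"
    print(transfer_components(src_features, src_parsing, tgt_parsing, 3))
    # [[0.9, 0.9, 0.0], [0.2, 0.2, 0.0]]
    print(attribute_consistency_loss([1.0, 0.0], [0.9, 0.1]))
```

Because each target pixel reads only the code of its own component, appearance from the upper-clothes region cannot leak into the pants region, which is the kind of cross-component entanglement the abstract identifies as a source of distortion. In a real model the codes would be feature vectors and the attribute term would typically be combined with other training losses; the abstract does not specify those details.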

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
        April 2023
        545 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3572861
        Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 February 2023
        • Online AM: 4 August 2022
        • Accepted: 19 July 2022
        • Revised: 8 July 2022
        • Received: 9 November 2021
        Published in TOMM Volume 19, Issue 2s
