Abstract
Person image synthesis aims to transfer the appearance of a source person image to a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion caused by the entanglement of pose and appearance information among different body components, and (2) failure to preserve the original semantics (e.g., the same outfit). In this article, we explicitly address these two problems by proposing a Pose- and Attribute-consistent Person Image Synthesis Network (PAC-GAN). First, to reduce pose and appearance matching ambiguity, we propose a component-wise transferring model consisting of two stages. The first stage focuses only on synthesizing the target pose, while the second renders the target appearance by explicitly transferring appearance information from the source image to the target image in a component-wise manner. In this way, source-target matching ambiguity is eliminated through the component-wise disentanglement of pose and appearance synthesis. Second, to maintain attribute consistency, we represent the input image as an attribute vector and use this vector to impose a high-level semantic constraint that regularizes the target synthesis. Extensive experimental results on the DeepFashion dataset demonstrate the superiority of our method over the state of the art, especially in maintaining pose and attribute consistency under large pose variations.
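The attribute constraint described above can be illustrated with a minimal sketch. The abstract does not specify how the constraint is computed, so everything here is an assumption for illustration only: the function name `attribute_consistency_loss`, the choice of a mean squared distance, and the example attribute values are all hypothetical and are not taken from PAC-GAN itself.

```python
from typing import Sequence


def attribute_consistency_loss(
    source_attrs: Sequence[float],
    generated_attrs: Sequence[float],
) -> float:
    """Mean squared difference between two attribute vectors.

    Hypothetical stand-in for a high-level semantic constraint: the
    distance used by the actual method is not stated in the abstract.
    """
    if len(source_attrs) != len(generated_attrs):
        raise ValueError("attribute vectors must have the same length")
    if not source_attrs:
        raise ValueError("attribute vectors must be non-empty")
    squared_diffs = (
        (s - g) ** 2 for s, g in zip(source_attrs, generated_attrs)
    )
    return sum(squared_diffs) / len(source_attrs)


# Hypothetical attribute probabilities, e.g. [long sleeves, skirt, hat].
source = [0.9, 0.1, 0.0]
consistent = [0.9, 0.1, 0.0]  # generated image keeps the same outfit
drifted = [0.1, 0.9, 0.0]     # generated image changes the outfit

loss_consistent = attribute_consistency_loss(source, consistent)
loss_drifted = attribute_consistency_loss(source, drifted)
```

Under these assumptions, a synthesized image whose predicted attributes match the source incurs no penalty, while one whose outfit attributes drift is penalized, which is the kind of regularization the attribute vector is meant to provide.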
Index Terms
Pose- and Attribute-consistent Person Image Synthesis