
A Multi-Level Consistency Network for High-Fidelity Virtual Try-On

Published: 16 March 2023

Abstract

The 2D virtual try-on task aims to transfer a target clothing image onto the corresponding region of a person image. Although the task has attracted extensive research owing to its broad applications, it remains challenging to handle complicated issues such as non-rigid shapes, large occlusions, and arbitrary poses. To this end, we propose a novel network with a structural and textural consistency-preserving mechanism for producing high-fidelity try-on images. Specifically, we first generate the semantic layout of a clothing-agnostic person representation to obtain a segmentation map, which serves as the transformation condition for the target clothes. Based on a recurrent network structure, a transform lookup is performed to iteratively update a dense flow. We then adopt a thin-plate-spline-based warping method to estimate a coarse offset flow at all key-point positions. Guided by this sparse flow, a multi-scale deformable convolution module iteratively predicts fine offsets at densely sampled positions, by which the clothing item and the person shape can be accurately aligned. Finally, we develop a refinement module that effectively fuses global and local features, rendering accurate geometric structures of the body parts while maintaining the texture sharpness of the clothes. Extensive experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art methods in both quantitative and qualitative try-on results. The code is available at https://github.com/TJU-WEIHAO/MLCN.
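The abstract's coarse warping step fits a thin-plate spline to matched key points and reads off an offset flow. The paper's full pipeline is not reproduced here, but that TPS step can be sketched in numpy as below; the function names (`fit_tps`, `tps_offsets`) and the toy key-point correspondences are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def fit_tps(src, dst):
    """Solve for thin-plate-spline parameters mapping src key points to dst.

    src, dst: (N, 2) arrays of matched key-point coordinates.
    Returns an (N + 3, 2) parameter matrix (radial weights + affine part).
    """
    n = src.shape[0]
    d = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    K = np.zeros_like(d)
    mask = d > 0
    K[mask] = d[mask] ** 2 * np.log(d[mask])  # TPS kernel U(r) = r^2 log r
    P = np.hstack([np.ones((n, 1)), src])     # affine terms [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)

def tps_offsets(params, src, pts):
    """Evaluate the fitted TPS at query points and return the offset flow."""
    n = src.shape[0]
    d = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    U = np.zeros_like(d)
    mask = d > 0
    U[mask] = d[mask] ** 2 * np.log(d[mask])
    Q = np.hstack([np.ones((pts.shape[0], 1)), pts])
    warped = U @ params[:n] + Q @ params[n:]
    return warped - pts  # coarse offset flow at the query positions

# Toy correspondence: key points on the flat clothing item and their
# matches on the person, here related by a pure translation.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src + np.array([0.5, -0.25])
params = fit_tps(src, dst)
flow = tps_offsets(params, src, np.array([[0.3, 0.7]]))  # -> [[0.5, -0.25]]
```

Because a TPS reproduces affine maps exactly, the recovered flow at any query point equals the translation; in the full method this sparse flow only initializes the deformable-convolution stage, which refines offsets densely.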



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 5 (September 2023), 262 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3585398
  Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 16 March 2023
      • Online AM: 19 January 2023
      • Accepted: 12 January 2023
      • Revised: 8 December 2022
      • Received: 9 July 2022
Published in TOMM Volume 19, Issue 5


      Qualifiers

      • research-article
