Distribution Aligned Multimodal and Multi-domain Image Stylization

Published: 22 July 2021

Abstract

Multimodal and multi-domain stylization are two important problems in the field of image style transfer. Currently, few methods can perform multimodal and multi-domain stylization simultaneously. In this study, we propose a unified framework for multimodal and multi-domain style transfer that supports both exemplar-based references and randomly sampled guidance. The key component of our method is a novel style distribution alignment module that eliminates the explicit distribution gaps between various style domains and reduces the risk of mode collapse. Multimodal diversity is ensured by guidance from multiple images or from random style codes, while multi-domain controllability is achieved directly through a domain label. We validate the proposed framework on painting style transfer across various artistic styles and genres. Qualitative and quantitative comparisons with state-of-the-art methods demonstrate that our method generates high-quality results in multiple domain styles and multimodal instances, from either reference style guidance or a randomly sampled style.
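To make the abstract's core idea concrete, the following is a toy, framework-free sketch (not the paper's actual architecture; all function names and shapes are illustrative). It shows the two mechanisms the abstract describes: a style code is first moment-matched into a shared, domain-agnostic space (removing per-domain distribution gaps), then domain-specific statistics are re-injected via a scale/shift chosen by the domain label, in the spirit of adaptive instance normalization. The style code may come from an exemplar encoder or be sampled from a prior; here it is simply drawn at random.

```python
import math
import random

def align_style_code(code):
    # Moment-match a raw style code to zero mean / unit variance,
    # mimicking alignment of per-domain style statistics to a
    # shared (e.g. Gaussian) style space.
    mean = sum(code) / len(code)
    var = sum((c - mean) ** 2 for c in code) / len(code)
    std = math.sqrt(var + 1e-8)
    return [(c - mean) / std for c in code]

def stylize(aligned_code, domain_gamma, domain_beta):
    # Re-inject domain-specific statistics via a learned scale/shift
    # selected by the domain label (AdaIN-style conditioning).
    return [g * c + b for c, g, b in zip(aligned_code, domain_gamma, domain_beta)]

# Randomly sampled guidance: draw a style code from a wide prior;
# an exemplar-guided variant would instead encode a reference image.
random.seed(0)
sampled = [random.gauss(0.0, 3.0) for _ in range(8)]
aligned = align_style_code(sampled)
out = stylize(aligned, domain_gamma=[1.0] * 8, domain_beta=[0.0] * 8)
```

With identity scale/shift the aligned code passes through unchanged; a trained model would learn a distinct `(gamma, beta)` pair per style domain, so the same aligned code yields different stylizations under different domain labels.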


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
August 2021, 443 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3476118

Copyright © 2021 Association for Computing Machinery.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 September 2020
• Revised: 1 January 2021
• Accepted: 1 February 2021
• Published: 22 July 2021

        Qualifiers

        • research-article
        • Refereed
