Abstract
Multimodal and multi-domain stylization are two important problems in image style transfer, yet few existing methods address both simultaneously. In this study, we propose a unified framework for multimodal and multi-domain style transfer that supports both exemplar-based references and randomly sampled guidance. The key component of our method is a novel style distribution alignment module that eliminates explicit distribution gaps between style domains and reduces the risk of mode collapse. Multimodal diversity is ensured by guidance from multiple images or from random style codes, while multi-domain controllability is achieved directly through a domain label. We validate the proposed framework on painting style transfer across various artistic styles and genres. Qualitative and quantitative comparisons with state-of-the-art methods demonstrate that our method generates high-quality results across multiple style domains and diverse multimodal instances, driven by either reference style guidance or a randomly sampled style.
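The abstract's notion of aligning style distributions builds on the statistic-matching idea popularized by adaptive instance normalization (AdaIN): content features are renormalized so their per-channel statistics match those of a target style. The sketch below is a minimal numpy illustration of that mechanism, not the paper's actual alignment module; the function name, tensor shapes, and the use of a randomly sampled style code are illustrative assumptions.

```python
import numpy as np

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Align per-channel statistics of content features (C, H, W)
    to a target style distribution given by per-channel mean/std."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * style_std[:, None, None] + style_mean[:, None, None]

# Multimodal use: instead of encoding a reference image, sample a random
# style code (here: per-channel mean/std) to get a new stylization mode.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))           # hypothetical content features
style_mean = rng.normal(size=8)               # sampled style code (means)
style_std = np.abs(rng.normal(size=8)) + 0.1  # sampled style code (stds)
out = adain(feat, style_mean, style_std)
```

After alignment, each channel of `out` carries the sampled style's statistics; swapping in statistics encoded from an exemplar image instead of random samples gives the reference-guided mode described in the abstract.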
Distribution Aligned Multimodal and Multi-domain Image Stylization