DOI: 10.1145/3240508.3240661

User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks

Published: 15 October 2018

Abstract

Scribble-based line art colorization is a challenging computer vision problem: line art contains neither greyscale values nor semantic information, and the scarcity of authentic illustration/line-art training pairs further hinders model generalization. Recently, several methods based on Generative Adversarial Networks (GANs) have achieved notable success, generating colorized illustrations conditioned on a given line art and color hints. However, these methods fail to capture the authentic illustration distribution and are hence perceptually unsatisfying, often lacking accurate shading. To address these challenges, we propose a novel deep conditional adversarial architecture for scribble-based anime line art colorization. Specifically, we integrate the conditional framework with the WGAN-GP criterion as well as a perceptual loss, which lets us robustly train a deep network that synthesizes more natural and realistic images. We also introduce a local features network that is independent of synthetic data; conditioning the GAN on features from this network notably increases generalization to "in the wild" line arts. Furthermore, we collect two datasets that provide high-quality colorful illustrations and authentic line arts for training and benchmarking. With the proposed model trained on our illustration dataset, we demonstrate that the synthesized images are considerably more realistic and precise than those of alternative approaches.
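The full paper is not reproduced on this page, but the training objective the abstract describes, a conditional WGAN-GP adversarial criterion combined with a perceptual loss, can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' released code: the critic, feat_net, and cond interfaces and the weights lam_gp and lam_percep are assumed names, and all architectural details are left out.

```python
# Illustrative sketch (assumed, not the authors' code) of a conditional
# WGAN-GP objective plus a perceptual loss, as described in the abstract.
import torch
import torch.nn.functional as F

def gradient_penalty(critic, real, fake, cond):
    # WGAN-GP term (Gulrajani et al., 2017): penalize deviations of the
    # critic's gradient norm from 1 on random real/fake interpolates.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    score = critic(interp, cond)
    grad, = torch.autograd.grad(score.sum(), interp, create_graph=True)
    return ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def critic_loss(critic, real, fake, cond, lam_gp=10.0):
    # Wasserstein distance estimate; the critic is conditioned on the
    # line art and color-hint input `cond`.
    fake = fake.detach()  # do not backprop into the generator here
    return (critic(fake, cond).mean() - critic(real, cond).mean()
            + lam_gp * gradient_penalty(critic, real, fake, cond))

def generator_loss(critic, feat_net, real, fake, cond, lam_percep=1.0):
    adv = -critic(fake, cond).mean()  # adversarial term: fool the critic
    # Perceptual loss (Johnson et al., 2016): L2 distance between the
    # activations of a fixed pretrained feature extractor.
    percep = F.mse_loss(feat_net(fake), feat_net(real))
    return adv + lam_percep * percep
```

In the paper's setting, cond would carry the line art and user scribbles, and feat_net would play the role of a fixed feature extractor such as the local features network mentioned above; the loss weights would be tuned per dataset.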

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. edit propagation
  2. gans
  3. interactive colorization

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China (NSFC)

Conference

MM '18
Sponsor:
MM '18: ACM Multimedia Conference
October 22-26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions (28%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
