Language-based colorization of scene sketches

Published: 08 November 2019

Abstract

Being natural, touchless, and fun-embracing, language-based inputs have been shown to be effective for tasks ranging from image generation to literacy education for children. This paper presents the first language-based system for interactive colorization of scene sketches based on semantic comprehension. The proposed system is built upon deep neural networks trained on a large-scale repository of scene sketches and cartoon-style color images with text descriptions. Given a scene sketch, our system lets users interactively localize and colorize specific foreground object instances through language-based instructions, meeting various colorization requirements progressively. We demonstrate the effectiveness of our approach through comprehensive experiments, including alternative studies, comparisons with state-of-the-art methods, and generalization user studies. Given the unique characteristics of language-based inputs, we envision combining our interface with a traditional scribble-based interface in a practical multimodal colorization system, benefiting various applications. The dataset and source code can be found at https://github.com/SketchyScene/SketchySceneColorization.
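The abstract leaves implementation details to the paper, which relies on deep neural networks trained on scene sketches. As a rough illustration of only the interaction loop described above (an instruction names a target object and a color, the target region is localized and filled, and earlier edits persist), here is a toy Python sketch. Everything in it is an assumption: the keyword parser stands in for learned language comprehension, the flat fill stands in for a learned colorization network, and the names `PALETTE`, `parse_instruction`, and `colorize_step` are hypothetical. Unlike the described system, it colors every instance of the named category instead of resolving one specific instance.

```python
import re
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]
Canvas = List[List[RGB]]

# Hypothetical toy palette; the real system learns color semantics from data.
PALETTE: Dict[str, RGB] = {
    "red": (220, 40, 40),
    "green": (40, 160, 60),
    "blue": (40, 80, 220),
    "yellow": (240, 220, 40),
}
WHITE: RGB = (255, 255, 255)


def parse_instruction(text: str, categories: List[str]) -> Optional[Tuple[str, RGB]]:
    """Toy stand-in for language comprehension: pick the first known
    object category and the first palette color mentioned in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    target = next((w for w in words if w in categories), None)
    color = next((PALETTE[w] for w in words if w in PALETTE), None)
    if target is None or color is None:
        return None
    return target, color


def colorize_step(canvas: Canvas, instance_map: List[List[int]],
                  instance_category: Dict[int, str], text: str) -> Canvas:
    """Apply one instruction and return a new canvas. Pixels outside the
    localized instances keep their current color, so edits accumulate."""
    parsed = parse_instruction(text, list(instance_category.values()))
    if parsed is None:
        return canvas  # instruction not understood: no change
    target, color = parsed
    # Localize: ids of all instances whose category matches the target.
    ids = {i for i, c in instance_category.items() if c == target}
    return [
        [color if inst in ids else px for inst, px in zip(inst_row, px_row)]
        for inst_row, px_row in zip(instance_map, canvas)
    ]


# Usage: a 3x4 toy "sketch" with a tree (id 1) and a car (id 2); 0 is background.
instance_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
instance_category = {1: "tree", 2: "car"}
canvas = [[WHITE] * 4 for _ in range(3)]
canvas = colorize_step(canvas, instance_map, instance_category, "Paint the tree green")
canvas = colorize_step(canvas, instance_map, instance_category, "Now make the car red")
print(canvas[0][1], canvas[1][3], canvas[0][0])
# expected: (40, 160, 60) (220, 40, 40) (255, 255, 255)
```

Keeping the instance map separate from the canvas is what makes this toy loop progressive: each step reads the current canvas and rewrites only the pixels of the localized instances, so the tree stays green after the car is colored.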

Supplementary Material

ZIP File (a233-zou.zip)
Supplemental files.

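Cited By

  • (2024) Sketch colorization with finite color space prior. Journal of Image and Graphics 29:4 (978-988). DOI 10.11834/jig.230189. Online publication date: 2024.
  • (2024) LVCD: Reference-based Lineart Video Colorization with Diffusion Models. ACM Transactions on Graphics 43:6 (1-11). DOI 10.1145/3687910. Online publication date: 19-Dec-2024.
  • (2024) Reference-based attention mechanism and region segmentation line art colorization. International Conference on Remote Sensing, Mapping, and Image Processing (RSMIP 2024) (46). DOI 10.1117/12.3029682. Online publication date: 21-Jun-2024.
  • (2024) CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing. Computer Graphics Forum 43:7. DOI 10.1111/cgf.15247. Online publication date: 7-Nov-2024.
  • (2024) Controllable Anime Image Editing via Probability of Attribute Tags. Computer Graphics Forum 43:7. DOI 10.1111/cgf.15245. Online publication date: 24-Oct-2024.
  • (2024) AnimeDiffusion: Anime Diffusion Colorization. IEEE Transactions on Visualization and Computer Graphics 30:10 (6956-6969). DOI 10.1109/TVCG.2024.3357568. Online publication date: Oct-2024.
  • (2024) SKETCH2MANGA: Shaded Manga Screening from Sketch with Diffusion Models. 2024 IEEE International Conference on Image Processing (ICIP) (2389-2395). DOI 10.1109/ICIP51287.2024.10647842. Online publication date: 27-Oct-2024.
  • (2024) Scene Sketch-to-Image Synthesis Based on Multi-Object Control. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3775-3779). DOI 10.1109/ICASSP48485.2024.10446608. Online publication date: 14-Apr-2024.
  • (2024) Learning Inclusion Matching for Animation Paint Bucket Colorization. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (25544-25553). DOI 10.1109/CVPR52733.2024.02413. Online publication date: 16-Jun-2024.
  • (2024) LatentColorization: Latent Diffusion-Based Speaker Video Colorization. IEEE Access 12 (81105-81121). DOI 10.1109/ACCESS.2024.3406249. Online publication date: 2024.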

    Published In

    ACM Transactions on Graphics, Volume 38, Issue 6
    December 2019
    1292 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/3355089
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in TOG Volume 38, Issue 6

    Author Tags

    1. deep neural networks
    2. image segmentation
    3. language-based editing
    4. scene sketch
    5. sketch colorization

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 77
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 21 Jan 2025
