Abstract
Generating a realistic image of a human from a sketch is a distinctive and challenging problem, because the complex structure of the human body must be preserved. Input sketches also often lack details that are crucial to the generation process, which makes the problem harder still. In this article, we present an effective method for synthesizing realistic images from human sketches. Our framework incorporates human pose, expressed as the locations of key semantic components (e.g., arms, eyes, nose), since pose is a strong prior for generating images of humans. The sketch-to-image synthesis framework consists of three stages: semantic keypoint extraction, coarse image generation, and image refinement. First, we extract the semantic keypoints using Part Affinity Fields (PAFs) and a convolutional autoencoder. Next, we combine the sketch with the semantic keypoints to generate a coarse image of the human. Finally, in the image refinement stage, a Generative Adversarial Network (GAN) enhances the coarse image; its architecture is designed to avoid checkerboard artifacts and to produce photo-realistic results. We evaluate our method on 6,300 sketch-image pairs and show that it generates realistic images and compares favorably against state-of-the-art image synthesis methods.
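The abstract does not say how the keypoints are encoded, how they are fused with the sketch, or which layer design avoids checkerboard artifacts. The NumPy sketch below illustrates two common choices under stated assumptions. It is not the authors' implementation. First, each keypoint is rendered as a Gaussian heatmap and stacked channel-wise with the sketch to form a conditioning input for the coarse generator. Second, upsampling uses nearest-neighbor resizing followed by a convolution ("resize-convolution", the remedy described by Odena et al., 2016) in place of a strided transposed convolution. All function names and the `sigma` value are hypothetical.

```python
import numpy as np


def keypoint_heatmaps(keypoints, height, width, sigma=4.0):
    """Render one Gaussian heatmap per (x, y) keypoint.

    Hypothetical encoding: a keypoint given as None (e.g. not detected)
    yields an all-zero channel.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, kp in enumerate(keypoints):
        if kp is None:
            continue
        x, y = kp
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return maps


def stack_condition(sketch, keypoints, sigma=4.0):
    """Concatenate an (H, W) sketch with K heatmaps into a (1 + K, H, W) input."""
    h, w = sketch.shape
    heatmaps = keypoint_heatmaps(keypoints, h, w, sigma)
    return np.concatenate([sketch[None].astype(np.float32), heatmaps], axis=0)


def transposed_conv2d(x, kernel, stride=2):
    """Single-channel strided transposed convolution (scatter-add, no padding)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a scaled copy of the kernel; with
            # kernel size 3 and stride 2, neighboring stamps overlap unevenly.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out


def resize_conv2d(x, kernel, scale=2):
    """Nearest-neighbor upsampling followed by a 'valid' 2-D correlation."""
    up = np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
    kh, kw = kernel.shape
    oh, ow = up.shape[0] - kh + 1, up.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(up[i:i + kh, j:j + kw] * kernel)
    return out


if __name__ == "__main__":
    sketch = np.zeros((64, 48))  # placeholder blank sketch
    cond = stack_condition(sketch, [(10, 20), None, (30, 40)])
    print(cond.shape)  # (4, 64, 48): 1 sketch channel + 3 keypoint channels

    flat = np.ones((4, 4))
    kernel = np.ones((3, 3))
    print(transposed_conv2d(flat, kernel)[1:8, 1:8])  # alternating 1/2/4 pattern
    print(resize_conv2d(flat, kernel))                # uniform 9s
```

With a constant input and an all-ones 3×3 kernel, the stride-2 transposed convolution produces interior values that alternate among 1, 2, and 4, because neighboring kernel footprints overlap unevenly. The resize-convolution output is uniform. This uneven overlap is the source of checkerboard artifacts. Whether the refinement GAN described above uses exactly this layer is an assumption.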
Sketch-guided Deep Portrait Generation