Sketch-guided Deep Portrait Generation

Published: 05 July 2020

Abstract

Generating a realistic human-class image from a sketch is a challenging problem because the human body has a complex structure that must be preserved. In addition, input sketches often lack details that are crucial to the generation process, which makes the problem harder. In this article, we present an effective method for synthesizing realistic images from human sketches. Our framework incorporates human poses, given as the locations of key semantic components (e.g., arms, eyes, nose), because pose is a strong prior for generating human-class images. Our sketch-image synthesis framework consists of three stages: semantic keypoint extraction, coarse image generation, and image refinement. First, we extract the semantic keypoints using Part Affinity Fields (PAFs) and a convolutional autoencoder. Then, we integrate the sketch with the semantic keypoints to generate a coarse image of a human. Finally, in the image refinement stage, the coarse image is enhanced by a Generative Adversarial Network (GAN) whose architecture is carefully designed to avoid checkerboard artifacts and to generate photo-realistic results. We evaluate our method on 6,300 sketch-image pairs and show that it generates realistic images and compares favorably against state-of-the-art image synthesis methods.
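
The abstract does not say which architectural choices are used to avoid checkerboard artifacts. As an illustration only, the sketch below shows one well-known cause and one common remedy in one dimension: a strided transposed convolution whose kernel size is not divisible by its stride overlaps unevenly, while nearest-neighbor upsampling followed by an ordinary convolution overlaps evenly. The function names, the all-ones kernel, and the constant input are assumptions made for this example and are not taken from the article.

```python
# Illustrative 1D sketch; not the article's architecture.
# Function names, kernel, and input are hypothetical choices for this example.

def transposed_conv1d(x, kernel, stride):
    """Strided transposed convolution: each input value 'paints' the kernel
    into the output at an offset of i * stride."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


def resize_conv1d(x, kernel, scale):
    """Nearest-neighbor upsampling followed by a stride-1 convolution
    with replicated borders, so the output has len(x) * scale samples."""
    upsampled = [value for value in x for _ in range(scale)]
    pad = len(kernel) // 2
    padded = [upsampled[0]] * pad + upsampled + [upsampled[-1]] * pad
    return [
        sum(padded[i + k] * weight for k, weight in enumerate(kernel))
        for i in range(len(upsampled))
    ]


if __name__ == "__main__":
    flat_input = [1.0] * 6       # a perfectly flat signal
    kernel = [1.0, 1.0, 1.0]     # kernel size 3 is not divisible by stride 2

    # Uneven overlap: interior samples alternate between two values.
    print(transposed_conv1d(flat_input, kernel, stride=2))

    # Even overlap: every sample receives the same total weight.
    print(resize_conv1d(flat_input, kernel, scale=2))
```

With a flat input, the transposed convolution produces an alternating pattern in its interior even though nothing in the signal varies, which is the one-dimensional analogue of a checkerboard. The resize-then-convolve variant returns a flat output for the same input.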

