research-article

Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

Published: 29 May 2021

Abstract

In this article, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video and its subtitles, our approach first extracts informative keyframes by analyzing the subtitles and stylizes the keyframes into comic-style images. We then propose a novel automatic multi-page layout framework that allocates the images across multiple pages and synthesizes visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relation). Finally, whereas previous works use the same type of balloon throughout, we propose an emotion-aware balloon generation method that creates different types of word balloons by analyzing the emotion of the subtitles and audio. Our method varies balloon shapes and word sizes in response to different emotions, leading to a richer reading experience. Once the balloons are generated, they are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user input, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our generated results over those produced by state-of-the-art comic generation systems.
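
As a rough illustration of the emotion-aware balloon step described above, the following minimal sketch maps an emotion estimated from the subtitles and audio to a balloon shape and a font size. It is not the article's method: the emotion labels, the shape names, the scale factors, the fusion rule in `fuse_emotion`, and the `loudness` feature are all assumptions introduced here for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical emotion labels mapped to (balloon shape, font scale).
# The abstract names neither the emotion classes nor the balloon types.
BALLOON_STYLES = {
    "neutral": ("oval", 1.0),
    "angry": ("spiky", 1.4),
    "sad": ("wavy", 0.9),
    "surprised": ("burst", 1.2),
}


@dataclass
class Balloon:
    text: str
    shape: str
    font_size: float


def fuse_emotion(text_emotion: str, audio_emotion: str) -> str:
    """Assumed fusion rule: trust the subtitle emotion unless it is neutral
    or unknown, in which case fall back to the audio emotion."""
    if text_emotion in BALLOON_STYLES and text_emotion != "neutral":
        return text_emotion
    if audio_emotion in BALLOON_STYLES:
        return audio_emotion
    return "neutral"


def make_balloon(text, text_emotion, audio_emotion, loudness=0.5, base_font_size=12.0):
    """Build a balloon whose shape and font size depend on the fused emotion.

    loudness is an assumed audio feature in [0, 1]; 0.5 leaves the size unchanged.
    """
    emotion = fuse_emotion(text_emotion, audio_emotion)
    shape, scale = BALLOON_STYLES[emotion]
    loudness = min(max(loudness, 0.0), 1.0)  # clamp out-of-range inputs
    font_size = base_font_size * scale * (0.75 + 0.5 * loudness)
    return Balloon(text=text, shape=shape, font_size=font_size)


if __name__ == "__main__":
    # A loud, angry line gets a spiky balloon with enlarged text.
    print(make_balloon("Stop right there!", "neutral", "angry", loudness=1.0))
    # A calm line keeps the default oval balloon and base font size.
    print(make_balloon("I see.", "neutral", "neutral"))
```

A lookup table keeps the sketch easy to inspect. A real system would presumably derive the emotion from trained text and audio classifiers and would also have to handle balloon placement next to the detected speaker, which this sketch omits.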

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
      May 2021
      410 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3461621

      Copyright © 2021 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 May 2021
      • Accepted: 1 November 2020
      • Revised: 1 September 2020
      • Received: 1 September 2019
      Qualifiers

      • research-article
      • Research
      • Refereed
