Abstract
In this article, we propose a fully automatic system that generates comic books from videos without any human intervention. Given an input video and its subtitles, our approach first extracts informative keyframes by analyzing the subtitles and then stylizes them into comic-style images. Next, we propose a novel automatic multi-page layout framework that allocates the images across multiple pages and synthesizes visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relations). Finally, instead of using a single balloon type as in previous work, we propose an emotion-aware balloon generation method that creates different types of word balloons by analyzing the emotion of the subtitles and audio. Our method varies balloon shapes and the size of the words inside balloons according to different emotions, leading to a richer reading experience. Once generated, the balloons are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user input, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our results over those of state-of-the-art comic generation systems.
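The emotion-aware balloon step can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' method: the emotion labels, the late fusion of scores from hypothetical text and audio classifiers (`fuse_emotion`), the style table `BALLOON_STYLES`, and the placement heuristic in `make_balloon` (above the detected speaker box if there is room, otherwise beside it, clamped to the panel). It only shows how a single emotion label could drive both balloon shape and font size, and how a speaker bounding box could anchor placement.

```python
from dataclasses import dataclass

# Hypothetical emotion -> (balloon shape, font scale) table.
# Labels, shapes, and scale factors are illustrative assumptions.
BALLOON_STYLES = {
    "neutral": ("ellipse", 1.0),
    "happy": ("ellipse", 1.1),
    "sad": ("wavy", 0.9),
    "angry": ("spiky", 1.4),
    "surprised": ("burst", 1.3),
}


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge (y grows downward)
    w: float
    h: float


@dataclass
class Balloon:
    text: str
    shape: str
    font_size: float
    box: Box


def fuse_emotion(text_scores, audio_scores, w_text=0.5):
    """Late fusion: weighted average of per-label probabilities from two
    (hypothetical) classifiers, one on subtitle text and one on audio."""
    labels = sorted(set(text_scores) | set(audio_scores))
    fused = {
        label: w_text * text_scores.get(label, 0.0)
        + (1.0 - w_text) * audio_scores.get(label, 0.0)
        for label in labels
    }
    return max(fused, key=fused.get)


def make_balloon(text, emotion, speaker, panel,
                 base_font=12.0, char_w=0.6, line_chars=16):
    """Pick a style from the emotion and place the balloon next to the speaker box."""
    shape, scale = BALLOON_STYLES.get(emotion, BALLOON_STYLES["neutral"])
    font = base_font * scale
    # Rough text extent: fixed characters per line, average glyph width char_w * font.
    n_lines = max(1, -(-len(text) // line_chars))  # ceiling division
    w = min(len(text), line_chars) * font * char_w + 2 * font  # padding on both sides
    h = n_lines * font * 1.2 + 2 * font
    # Prefer the space directly above the speaker, centered horizontally.
    x = speaker.x + speaker.w / 2 - w / 2
    y = speaker.y - h
    if y < panel.y:
        # Not enough room above: put the balloon beside the speaker,
        # on whichever side of the panel has more free space.
        y = speaker.y
        right_space = panel.x + panel.w - (speaker.x + speaker.w)
        left_space = speaker.x - panel.x
        x = speaker.x + speaker.w if right_space >= left_space else speaker.x - w
    # Keep the balloon inside the panel (assumes it fits).
    x = min(max(x, panel.x), panel.x + panel.w - w)
    y = min(max(y, panel.y), panel.y + panel.h - h)
    return Balloon(text, shape, font, Box(x, y, w, h))


if __name__ == "__main__":
    panel = Box(0, 0, 400, 300)
    speaker = Box(150, 150, 100, 100)  # e.g. a detected face box
    emotion = fuse_emotion({"angry": 0.7, "neutral": 0.3}, {"angry": 0.6, "sad": 0.4})
    print(make_balloon("Get out of here!", emotion, speaker, panel))
```

Weighting the two modalities with a single scalar keeps the sketch simple; a practical system would more likely calibrate or learn the fusion, and would check candidate balloon positions against other balloons and salient image regions.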
Supplemental Material
Supplemental movie, appendix, image, and software files for "Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation" are available for download.