Abstract
Artificial creativity has attracted increasing research attention in the fields of multimedia and artificial intelligence. Despite promising work on poetry, painting, and music generation, creating modern Chinese poetry from images, which can significantly enrich the functionality of photo-sharing platforms, has rarely been explored. Moreover, existing generation models cannot tackle three challenges in this task: (1) maintaining semantic consistency between images and poems; (2) preventing topic drift during generation; and (3) preventing certain words from appearing too frequently. These challenges are also common in other sequence generation tasks. In this article, we propose a Constrained Topic-aware Model (CTAM) that creates modern Chinese poems from images while addressing these challenges. Without a paired image-poetry dataset, we construct a visual semantic vector that embeds visual content via image captions. To address the topic-drift problem, we propose a topic-aware poetry generation model. Additionally, we design an Anti-frequency Decoding (AFD) scheme to constrain high-frequency characters during generation. Experimental results show that our model achieves promising performance and is effective in terms of poem readability and semantic consistency.
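The abstract does not state how AFD scores candidates, so the following is only an illustrative sketch of the general idea of penalizing high-frequency characters at decoding time. It assumes a simple score of the form log p(c | context) − λ · log p(c), where p(c) is a unigram character frequency estimated from a corpus. The function names, the toy corpus, and the weight `lam` are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def char_log_prior(corpus):
    """Unigram log-frequency of each character in a list of strings."""
    counts = Counter(ch for line in corpus for ch in line)
    total = sum(counts.values())
    return {ch: math.log(n / total) for ch, n in counts.items()}


def anti_frequency_pick(log_probs, log_prior, lam=0.5):
    """Pick the candidate maximizing log p(c | context) - lam * log p(c).

    This scoring rule is an assumption made for illustration; it is not
    the article's published AFD formula.
    """
    # Characters unseen in the corpus are treated as the rarest seen one.
    floor = min(log_prior.values())

    def score(ch):
        return log_probs[ch] - lam * log_prior.get(ch, floor)

    return max(log_probs, key=score)


# Toy corpus in which one character dominates.
toy_corpus = ["的的的的的的月", "的的风"]
prior = char_log_prior(toy_corpus)

# Hypothetical model output for a single decoding step.
step_log_probs = {
    "的": math.log(0.5),
    "月": math.log(0.3),
    "风": math.log(0.2),
}

plain_choice = anti_frequency_pick(step_log_probs, prior, lam=0.0)
penalized_choice = anti_frequency_pick(step_log_probs, prior, lam=0.5)
```

With `lam=0.0` the pick reduces to ordinary greedy decoding, while a positive `lam` shifts the choice toward characters that are rarer in the corpus. In practice such a weight would need tuning so that readability is not sacrificed for diversity.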
Index Terms
Image to Modern Chinese Poetry Creation via a Constrained Topic-aware Model