Image to Modern Chinese Poetry Creation via a Constrained Topic-aware Model

Published: 22 May 2020
Abstract

Artificial creativity has attracted increasing research attention in the fields of multimedia and artificial intelligence. Despite promising work on poetry, painting, and music generation, creating modern Chinese poetry from images, which could significantly enrich the functionality of photo-sharing platforms, has rarely been explored. Moreover, existing generation models cannot tackle three challenges in this task: (1) maintaining semantic consistency between images and poems; (2) preventing topic drift during generation; and (3) avoiding the over-frequent appearance of certain words. These challenges are also common in other sequence generation tasks. In this article, we propose a Constrained Topic-aware Model (CTAM) to create modern Chinese poems from images while addressing the challenges above. Lacking an image-poetry paired dataset, we construct a visual semantic vector that embeds visual content via image captions. To counter the topic-drift problem, we propose a topic-aware poetry generation model. Additionally, we design an Anti-frequency Decoding (AFD) scheme to constrain high-frequency characters during generation. Experimental results show that our model achieves promising performance and is effective in improving poems’ readability and semantic consistency.
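The abstract does not spell out the AFD formulation, but its core idea, down-weighting characters that appear too often, can be illustrated with a minimal sketch. The penalty form (log-frequency scaled by a hypothetical `alpha`) and the dictionary-based interface are illustrative assumptions, not the paper's exact scheme:

```python
import math

def anti_frequency_decode(step_logits, char_freq, alpha=0.5):
    """Pick the next character from a decoding step's scores,
    penalizing characters by how frequent they are in the corpus.

    step_logits: dict mapping candidate character -> model score
    char_freq:   dict mapping character -> corpus frequency count
    alpha:       strength of the frequency penalty (assumed knob)
    """
    best_char, best_score = None, float("-inf")
    for ch, logit in step_logits.items():
        # Subtract a log-frequency penalty so very common characters
        # must have a much higher model score to be selected.
        score = logit - alpha * math.log(1.0 + char_freq.get(ch, 0))
        if score > best_score:
            best_char, best_score = ch, score
    return best_char

# A highly frequent character loses to a slightly lower-scored rare one:
choice = anti_frequency_decode(
    {"的": 2.0, "月": 1.8},      # model prefers "的" on raw score
    {"的": 1000, "月": 10},      # but "的" is far more frequent
)
# → "月"
```

In a real decoder this penalty would be applied to the full softmax distribution at every time step (for greedy or beam search) rather than to a two-character toy vocabulary.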

References

  1. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR’15).
  2. Lalit Bahl, Peter Brown, Peter De Souza, and Robert Mercer. 1986. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’86), Vol. 11. IEEE, 49--52.
  3. Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325.
  4. Simon Colton, Jacob Goodwin, and Tony Veale. 2012. Full-FACE poetry generation. In Proceedings of the International Conference on Computational Creativity (ICCC’12). 95--102.
  5. Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. 2018. Paying more attention to saliency: Image captioning with saliency and context attention. ACM Trans. Multimedia Comput. Commun. Appl. 14, 2 (2018), 48.
  6. Ali Farhadi, Mohsen Hejrati, Mohammad Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In Proceedings of the European Conference on Computer Vision (ECCV’10). 15--29.
  7. Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, and Li Deng. 2017. Stylenet: Generating attractive visual captions with styles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3137--3146.
  8. Marjan Ghazvininejad, Xing Shi, Yejin Choi, and Kevin Knight. 2016. Generating topical poetry. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1183--1191.
  9. Chen He and Haifeng Hu. 2019. Image captioning with visual-semantic double attention. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1 (2019), 26.
  10. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770--778.
  11. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (1997), 1735--1780.
  12. Micah Hodosh, Peter Young, and Julia Hockenmaier. 2013. Framing image description as a ranking task: Data, models and evaluation metrics. J. Artific. Intell. Res. 47 (2013), 853--899.
  13. Jack Hopkins and Douwe Kiela. 2017. Automatically generating rhythmic verse with neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. 168--178.
  14. Nal Kalchbrenner, Ivo Danihelka, and Alex Graves. 2016. Grid long short-term memory. In Proceedings of the International Conference on Learning Representations (ICLR’16).
  15. Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR’15).
  16. Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2011. Baby talk: Understanding and generating image descriptions. In Proceedings of the 24th Conference on Computer Vision and Pattern Recognition (CVPR’11).
  17. Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’14). 376.
  18. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT’16). 110--119.
  19. Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, and Yejin Choi. 2011. Composing simple image descriptions using web-scale n-grams. In Proceedings of the 15th Conference on Computational Natural Language Learning. Association for Computational Linguistics, 220--228.
  20. Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’04), Vol. 8.
  21. Bei Liu, Jianlong Fu, Makoto P. Kato, and Masatoshi Yoshikawa. 2018. Beyond narrative description: Generating poetry from images by multi-adversarial training. In Proceedings of the ACM Multimedia Conference. ACM, 783--791.
  22. Yu Liu, Jianlong Fu, Tao Mei, and Chang Wen Chen. 2017. Let your photos talk: Generating narrative paragraph for photo stream via bidirectional attention recurrent neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’17). 1445--1452.
  23. Alexander Patrick Mathews, Lexing Xie, and Xuming He. 2016. SentiCap: Generating image descriptions with sentiments. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’16). 3574--3580.
  24. Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing order into text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
  25. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR’13).
  26. Yael Netzer, David Gabay, Yoav Goldberg, and Michael Elhadad. 2009. Gaiku: Generating haiku with word associations norms. In Proceedings of the Workshop on Computational Approaches to Linguistic Creativity. Association for Computational Linguistics, 32--39.
  27. Hugo Gonçalo Oliveira. 2012. PoeTryMe: A versatile platform for poetry generation. Comput. Creat. Concept Invent. Gen. Intell. 1 (2012), 21.
  28. Vicente Ordonez, Girish Kulkarni, and Tamara L. Berg. 2011. Im2text: Describing images using 1 million captioned photographs. In Advances in Neural Information Processing Systems. MIT Press, 1143--1151.
  29. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 311--318.
  30. Shengsheng Qian, Tianzhu Zhang, and Changsheng Xu. 2016. Multi-modal multi-view topic-opinion mining for social event analysis. In Proceedings of the 24th ACM International Conference on Multimedia. 2--11.
  31. Shengsheng Qian, Tianzhu Zhang, Changsheng Xu, and Jie Shao. 2015. Multi-modal event topic model for social event analysis. IEEE Trans. Multimedia 18, 2 (2015), 233--246.
  32. Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In Proceedings of the International Conference on Learning Representations (ICLR’16).
  33. Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’17), Vol. 1. 3.
  34. Bob L. Sturm, Joao Felipe Santos, Oded Ben-Tal, and Iryna Korshunova. 2016. Music transcription modelling and composition using deep learning. arXiv preprint arXiv:1604.08723.
  35. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. MIT Press, 3104--3112.
  36. Naoko Tosa, Hideto Obara, and Michihiko Minoh. 2008. Hitch haiku: An interactive supporting system for composing haiku poem. In Proceedings of the International Conference on Entertainment Computing. Springer, 209--216.
  37. Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4566--4575.
  38. Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3156--3164.
  39. Anqi Wang, Haifeng Hu, and Liang Yang. 2018. Image captioning with affective guiding and selective attention. ACM Trans. Multimedia Comput. Commun. Appl. 14, 3 (2018), 73.
  40. Cheng Wang, Haojin Yang, and Christoph Meinel. 2018. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans. Multimedia Comput. Commun. Appl. 14, 2s (2018), 40.
  41. Qixin Wang, Tianyi Luo, Dong Wang, and Chao Xing. 2016. Chinese song iambics generation with neural attention-based model. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’16).
  42. Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, and Enhong Chen. 2016. Chinese poetry generation with planning based neural network. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING’16). 1051--1060.
  43. Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning. Springer, 5--32.
  44. Jie Wu, Haifeng Hu, and Yi Wu. 2018. Image captioning via semantic guidance attention and consensus selection strategy. ACM Trans. Multimedia Comput. Commun. Appl. 14, 4 (2018), 87.
  45. Lingxiang Wu, Min Xu, Jinqiao Wang, and Stuart Perry. 2020. Recall what you see continually using GridLSTM in image captioning. IEEE Trans. Multimedia 22, 3 (2020), 808--818.
  46. Xiaofeng Wu, Naoko Tosa, and Ryohei Nakatsu. 2009. New hitch haiku: An interactive renku poem composition supporting tool applied for sightseeing navigation system. In Proceedings of the International Conference on Entertainment Computing. Springer, 191--196.
  47. Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, and Wei-Ying Ma. 2017. Topic aware neural response generation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’17), Vol. 17. 3351--3357.
  48. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning. 2048--2057.
  49. Linli Xu, Liang Jiang, Chuan Qin, Zhe Wang, and Dongfang Du. 2018. How images inspire poems: Generating classical Chinese poetry from images with memory networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI’18).
  50. Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, Qingming Huang, and Qi Tian. 2019. SkeletonNet: A hybrid network with a skeleton-embedding process for multi-view image representation learning. IEEE Trans. Multimedia 21, 11 (2019), 2916--2929.
  51. Zhilin Yang, Ye Yuan, Yuexin Wu, William W. Cohen, and Ruslan R. Salakhutdinov. 2016. Review networks for caption generation. In Advances in Neural Information Processing Systems. MIT Press, 2361--2369.
  52. Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4651--4659.
  53. Jiyuan Zhang, Yang Feng, Dong Wang, Yang Wang, Andrew Abel, Shiyue Zhang, and Andi Zhang. 2017. Flexible and creative Chinese poetry generation using neural memory. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL’17). 1364--1373.
  54. Xingxing Zhang and Mirella Lapata. 2014. Chinese poetry generation with recurrent neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 670--680.
  55. Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223--2232.


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 2 (May 2020), 390 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3401894
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 April 2019
• Revised: 1 November 2019
• Accepted: 1 February 2020
• Online AM: 7 May 2020
• Published: 22 May 2020
