research-article

Conditional LSTM-GAN for Melody Generation from Lyrics

Published: 16 April 2021

Abstract

Melody generation from lyrics is a challenging research problem in artificial intelligence and music; studying it lets us learn and discover latent relationships between lyrics and their accompanying melodies. Progress has been hindered by the limited availability of paired lyrics–melody datasets with alignment information. To address this problem, we create a large dataset of 12,197 MIDI songs, each with paired lyrics and melody alignment, by drawing on different music sources and extracting the alignment between syllables and music attributes. Most importantly, we propose a novel deep generative model, a conditional Long Short-Term Memory (LSTM)–Generative Adversarial Network for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator, both conditioned on lyrics. In particular, the model simultaneously generates the lyrics-conditioned melody and the alignment between the syllables of the given lyrics and the notes of the predicted melody. Extensive experimental results demonstrate the effectiveness of the proposed lyrics-to-melody generative model: plausible and tuneful sequences can be inferred from lyrics.
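
To illustrate what "both conditioned on lyrics" could mean in practice, the following is a minimal sketch and not the authors' implementation. It only shows how per-syllable inputs might be assembled by concatenation; the LSTM layers themselves are omitted. The function names, the embedding and noise sizes, the one-note-per-syllable alignment, and the (pitch, duration, rest) note layout are all assumptions made for illustration.

```python
import random

def generator_inputs(lyric_embeddings, noise_dim, rng):
    """Build one generator input per syllable: [noise ; syllable embedding]."""
    steps = []
    for emb in lyric_embeddings:
        noise = [rng.uniform(0.0, 1.0) for _ in range(noise_dim)]
        # Concatenating the embedding is what conditions the step on the lyrics.
        steps.append(noise + list(emb))
    return steps

def discriminator_inputs(notes, lyric_embeddings):
    """Build one discriminator input per syllable: [note attributes ; syllable embedding]."""
    if len(notes) != len(lyric_embeddings):
        # Simplifying assumption: exactly one note per syllable.
        raise ValueError("expected one note per syllable")
    return [list(note) + list(emb) for note, emb in zip(notes, lyric_embeddings)]

rng = random.Random(0)
# Two made-up syllable embeddings (hypothetical dimension 4).
lyrics = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
g_in = generator_inputs(lyrics, noise_dim=3, rng=rng)

# Hypothetical notes as (pitch, duration, rest); real or generated.
notes = [(60, 1.0, 0.0), (62, 0.5, 0.0)]
d_in = discriminator_inputs(notes, lyrics)
```

In this sketch the generator sees noise plus lyrics at every step, while the discriminator sees a melody plus the same lyrics, so it can judge whether the melody fits the text and not only whether it sounds realistic.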

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1
        February 2021
        392 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3453992

        Copyright © 2021 Association for Computing Machinery.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 April 2021
        • Revised: 1 September 2020
        • Accepted: 1 September 2020
        • Received: 1 January 2020

        Qualifiers

        • research-article
        • Refereed
