Abstract
Generating melody from lyrics is a challenging research problem at the intersection of artificial intelligence and music: solving it requires learning the latent relationships between lyrics and their accompanying melodies. Progress has been hindered by the limited availability of paired lyrics–melody datasets with alignment information. To address this problem, we build a large dataset of 12,197 MIDI songs, each with paired lyrics and melody alignment, by drawing on different music sources and extracting the alignment between syllables and music attributes. Most importantly, we propose a novel deep generative model, the conditional Long Short-Term Memory (LSTM)–Generative Adversarial Network, for melody generation from lyrics. It consists of a deep LSTM generator and a deep LSTM discriminator, both conditioned on lyrics. In particular, the lyrics-conditioned melody and the alignment between the syllables of the given lyrics and the notes of the predicted melody are generated simultaneously. Extensive experimental results demonstrate the effectiveness of the proposed lyrics-to-melody generative model, which can infer plausible and tuneful sequences from lyrics.
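To make the phrase "both conditioned on lyrics" concrete, the following is a minimal sketch of how per-step inputs to a conditional generator and discriminator could be assembled. It is not the authors' implementation. It assumes one note per syllable, a hypothetical note representation of (pitch, duration, rest), and toy two-dimensional syllable embeddings; the function names `generator_inputs` and `discriminator_inputs` are illustrative only, and the LSTM layers themselves are omitted.

```python
import random
from typing import List, Sequence


def generator_inputs(
    syllable_embeddings: Sequence[Sequence[float]],
    noise_dim: int,
    rng: random.Random,
) -> List[List[float]]:
    """Build one generator input per syllable: fresh noise followed by the embedding."""
    steps = []
    for embedding in syllable_embeddings:
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        steps.append(noise + list(embedding))
    return steps


def discriminator_inputs(
    notes: Sequence[Sequence[float]],
    syllable_embeddings: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Pair each note with the embedding of its syllable (assumed one-to-one)."""
    if len(notes) != len(syllable_embeddings):
        raise ValueError("this sketch assumes exactly one note per syllable")
    return [list(note) + list(emb) for note, emb in zip(notes, syllable_embeddings)]


# Toy data: three syllables with 2-d embeddings (hypothetical values).
lyrics = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
g_in = generator_inputs(lyrics, noise_dim=4, rng=random.Random(0))

# Hypothetical notes as (MIDI pitch, duration, rest) triplets.
melody = [(60, 1.0, 0.0), (62, 0.5, 0.0), (64, 2.0, 1.0)]
d_in = discriminator_inputs(melody, lyrics)
```

In this sketch the lyrics reach both networks at every time step, so the discriminator would judge a melody together with its lyrics rather than the melody alone.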
Index Terms
Conditional LSTM-GAN for Melody Generation from Lyrics