Abstract
This text offers a personal and admittedly subjective view of the current state of Music Information Research (MIR). Motivated by the desire to build systems with a somewhat deeper understanding of music than those we currently have, I try to sketch a number of challenges for the next decade of MIR research, grouped around six simple truths about music that are probably generally agreed on but often ignored in everyday research.