
Getting Closer to the Essence of Music: The Con Espressione Manifesto

Published: 03 October 2016

Abstract

This text offers a personal and very subjective view on the current situation of Music Information Research (MIR). Motivated by the desire to build systems with a somewhat deeper understanding of music than the ones we currently have, I try to sketch a number of challenges for the next decade of MIR research, grouped around six simple truths about music that are probably generally agreed on but often ignored in everyday research.



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
March 2017, 407 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3004291
Editor: Yu Zheng

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 October 2015
• Revised: 1 January 2016
• Accepted: 1 February 2016
• Published: 3 October 2016
