Abstract
Most music genre classification approaches extract frame-level acoustic features to capture timbre, leading to the common bag-of-frames framework. However, time-frequency analysis is also vital for modeling music genres. This article proposes multilevel visual features that capture spectrogram textures and their temporal variations, together with a confidence-based late-fusion scheme for combining the acoustic and visual features. In experiments, the proposed method improved accuracy by approximately 14% on the MASD dataset (the world's largest benchmark) and 2% on the Unique dataset. In particular, the proposed approach won the Music Information Retrieval Evaluation eXchange (MIREX) music genre classification contests from 2011 to 2013, demonstrating the feasibility and benefit of combining acoustic and visual features for classifying music genres.
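The abstract names two ingredients: texture features computed on spectrograms and confidence-based late fusion of classifier outputs. As a rough illustration only (pure Python, with assumed function names, a plain 8-neighbour LBP, and a simple max-posterior fusion rule rather than the paper's exact scheme), they might look like:

```python
# Illustrative sketch only: the function names, the plain 8-neighbour LBP,
# and the max-posterior fusion rule are assumptions, not the paper's method.

def lbp_codes(img):
    """8-neighbour local binary pattern code for each interior pixel
    of a 2-D list (e.g. a spectrogram magnitude matrix)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            centre = img[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di][j + dj] >= centre:
                    code |= 1 << bit      # neighbour >= centre sets a bit
            codes.append(code)
    return codes

def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes: a simple texture descriptor."""
    hist = [0] * bins
    for c in lbp_codes(img):
        hist[c] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]

def fuse(acoustic_probs, visual_probs):
    """Confidence-based late fusion: trust whichever classifier's
    posterior distribution is more peaked (higher max probability)."""
    probs = (acoustic_probs
             if max(acoustic_probs) >= max(visual_probs)
             else visual_probs)
    return probs.index(max(probs))

# Toy fusion over three genres: the visual classifier is more confident,
# so its top label (index 1) wins.
print(fuse([0.50, 0.30, 0.20], [0.05, 0.90, 0.05]))  # → 1
```

In the actual system, texture histograms would be computed on log-magnitude spectrogram blocks at multiple levels, and the fusion would combine classifier posterior estimates; the sketch above only conveys the shape of the computation.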
Combining Acoustic and Multilevel Visual Features for Music Genre Classification