Contextual tag inference

Published: 04 November 2011

Abstract

This article examines the use of two kinds of context to improve the results of content-based music taggers: the relationships between tags and between the clips of songs that are tagged. We show that users agree more on tags applied to clips temporally “closer” to one another; that conditional restricted Boltzmann machine models of tags can more accurately predict related tags when they take context into account; and that when training data is “smoothed” using context, support vector machines can better rank these clips according to the original, unsmoothed tags and do this more accurately than three standard multi-label classifiers.
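The context-based "smoothing" of training labels described above can be illustrated with a small sketch. The idea is that each clip's tag vector is blended with the tag vectors of temporally neighboring clips from the same song before a classifier is trained, reflecting the observation that annotators agree more on tags for clips that are close together in time. The data, weighting scheme, and function names below are hypothetical illustrations, not the paper's exact formulation:

```python
import numpy as np

def smooth_tags(tag_matrix, weight=0.5):
    """Blend each clip's tag counts with its temporal neighbors.

    tag_matrix: (n_clips, n_tags) array of tag counts for consecutive
    clips of one song. Each row is mixed with the mean of its immediate
    neighbors' rows. (Illustrative scheme only; the paper's actual
    smoothing kernel may differ.)
    """
    n = tag_matrix.shape[0]
    smoothed = tag_matrix.astype(float).copy()
    for i in range(n):
        # Collect the original (unsmoothed) neighboring rows.
        neighbors = []
        if i > 0:
            neighbors.append(tag_matrix[i - 1])
        if i < n - 1:
            neighbors.append(tag_matrix[i + 1])
        if neighbors:
            smoothed[i] = ((1 - weight) * tag_matrix[i]
                           + weight * np.mean(neighbors, axis=0))
    return smoothed

# Three consecutive clips of one song, two tags (e.g., "rock", "guitar"):
clips = np.array([[1, 0],
                  [0, 1],
                  [1, 1]])
print(smooth_tags(clips))
```

A classifier (such as the SVMs used in the article) would then be trained on the smoothed labels and evaluated against the original, unsmoothed ones.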

References

  1. Aucouturier, J., Pachet, F., Roy, P., and Beurivé, A. 2007. Signal + context = better classification. In Proceedings of the International Symposium on Music Information Retrieval. 425--430.
  2. Bertin-Mahieux, T., Eck, D., Maillet, F., and Lamere, P. 2008. Autotagger: A model for predicting social tags from acoustic features on large music databases. J. New Music Res. 37, 2, 115--135.
  3. Besag, J. 1975. Statistical analysis of non-lattice data. Statistician 24, 3, 179--195.
  4. Boutell, M., Luo, J., Shen, X., and Brown, C. 2004. Learning multi-label scene classification. Patt. Recog. 37, 9, 1757--1771.
  5. Chen, L., Xu, D., Tsang, I. W., and Luo, J. 2010. Tag-based web photo retrieval improved by batch mode re-tagging. In Proceedings of the 22nd IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 3440--3446.
  6. Cortes, C. and Mohri, M. 2004. AUC optimization vs. error rate minimization. In Proceedings of the Conference on Advances in Neural Information Processing Systems. S. Thrun, L. Saul, and B. Schölkopf, Eds., MIT Press, Cambridge, MA.
  7. Eck, D., Lamere, P., Bertin-Mahieux, T., and Green, S. 2008. Automatic generation of social tags for music recommendation. In Proceedings of the Conference on Advances in Neural Information Processing Systems. J. Platt, D. Koller, Y. Singer, and S. Roweis, Eds., MIT Press, Cambridge, MA, 385--392.
  8. Han, Y., Wu, F., Jia, J., Zhuang, Y., and Yu, B. 2010. Multi-task sparse discriminant analysis (MtSDA) with overlapping categories. In Proceedings of the AAAI Conference on Artificial Intelligence. 469--474.
  9. Hand, D. J. 2009. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Mach. Learn. 77, 103--123.
  10. Heitz, G. and Koller, D. 2008. Learning spatial context: Using stuff to find things. In Proceedings of the European Conference on Computer Vision. D. Forsyth, P. Torr, and A. Zisserman, Eds., Lecture Notes in Computer Science Series, vol. 5302, Springer, 30--43.
  11. Hinton, G. 2002. Training products of experts by minimizing contrastive divergence. Neur. Computat. 14, 1771--1800.
  12. Hoiem, D., Efros, A., and Hebert, M. 2008. Putting objects in perspective. Int. J. Comput. Vis. 80, 1, 3--15.
  13. Kang, F., Jin, R., and Sukthankar, R. 2006. Correlated label propagation with application to multi-label learning. In Proceedings of the International Conference on Computer Vision and Pattern Recognition. 1719--1726.
  14. Larochelle, H. and Bengio, Y. 2008. Classification using discriminative restricted Boltzmann machines. In Proceedings of the International Conference on Machine Learning. A. McCallum and S. Roweis, Eds., Omnipress, 536--543.
  15. Lee, J. H. 2010. Crowdsourcing music similarity judgments using Mechanical Turk. In Proceedings of the International Symposium on Music Information Retrieval. 183--188.
  16. Mandel, M., Pascanu, R., Larochelle, H., and Bengio, Y. 2011. Autotagging music with conditional restricted Boltzmann machines. http://arxiv.org/abs/1103.2832.
  17. Mandel, M. I., Eck, D., and Bengio, Y. 2010. Learning tags that vary within a song. In Proceedings of the International Symposium on Music Information Retrieval. 399--404.
  18. Mandel, M. I. and Ellis, D. P. W. 2008. A web-based game for collecting music metadata. J. New Music Res. 37, 2, 151--165.
  19. Manning, C., Raghavan, P., and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press.
  20. Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., and Stumme, G. 2009. Evaluating similarity measures for emergent semantics of social tagging. In Proceedings of the 18th International Conference on World Wide Web. ACM, 641--650.
  21. Miotto, R., Barrington, L., and Lanckriet, G. 2010. Improving auto-tagging by modeling semantic co-occurrences. In Proceedings of the International Symposium on Music Information Retrieval. 297--302.
  22. Murphy, K., Torralba, A., and Freeman, W. T. 2004. Using the forest to see the trees: A graphical model relating features, objects, and scenes. In Proceedings of the Conference on Advances in Neural Information Processing Systems. S. Thrun, L. Saul, and B. Schölkopf, Eds., MIT Press, Cambridge, MA.
  23. Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., and Belongie, S. 2007. Objects in context. In Proceedings of the International Conference on Computer Vision. IEEE, 1--8.
  24. Rasiwasia, N. and Vasconcelos, N. 2009. Holistic context modeling using semantic co-occurrences. In Proceedings of the International Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 1889--1895.
  25. Salakhutdinov, R., Mnih, A., and Hinton, G. 2007. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the International Conference on Machine Learning. 791--798.
  26. Schifanella, R., Barrat, A., Cattuto, C., Markines, B., and Menczer, F. 2010. Folks in folksonomies: Social link prediction from shared metadata. In Proceedings of the ACM International Conference on Web Search and Data Mining. ACM, 271--280.
  27. Slaney, M. 2002. Semantic-audio retrieval. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.
  28. Smolensky, P. 1986. Information Processing in Dynamical Systems: Foundations of Harmony Theory. MIT Press.
  29. Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. 2008. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods on Natural Language Processing. 254--263.
  30. Sorokin, A. and Forsyth, D. 2008. Utility data annotation with Amazon Mechanical Turk. In Proceedings of the Workshop on Internet Vision at the IEEE Conference on Computer Vision and Pattern Recognition. 1--8.
  31. Taylor, G., Hinton, G. E., and Roweis, S. 2007. Modeling human motion using binary latent variables. In Proceedings of the Conference on Advances in Neural Information Processing Systems. B. Schölkopf, J. Platt, and T. Hoffman, Eds., MIT Press, Cambridge, MA, 1345--1352.
  32. Tingle, D., Kim, Y. E., and Turnbull, D. 2010. Exploring automatic music annotation with “acoustically-objective” tags. In Proceedings of the International Conference on Multimedia Information Retrieval. ACM, 55--62.
  33. Trohidis, K., Tsoumakas, G., Kalliris, G., and Vlahavas, I. 2008. Multilabel classification of music into emotions. In Proceedings of the International Symposium on Music Information Retrieval.
  34. Tsoumakas, G., Katakis, I., and Vlahavas, I. 2010. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, Eds., Chapter 34, 667--685.
  35. Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., and Vlahavas, I. 2011. MULAN: A Java library for multi-label learning. J. Mach. Learn. Res. 12, 2411--2414.
  36. Tsoumakas, G. and Vlahavas, I. 2007. Random k-labelsets: An ensemble method for multilabel classification. In Proceedings of the European Conference on Machine Learning. Lecture Notes in Computer Science, vol. 4701, Springer, 406--417.
  37. Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., and Movellan, J. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Proceedings of the Conference on Advances in Neural Information Processing Systems. Y. Bengio, D. Schuurmans, C. Williams, J. Lafferty, and A. Culotta, Eds., 2035--2043.
  38. Whitman, B. and Rifkin, R. 2002. Musical query-by-description as a multiclass learning problem. In Proceedings of the IEEE Workshop on Multimedia Signal Processing. 153--156.
  39. Yao, B. and Fei-Fei, L. 2010. Modeling mutual context of object and human pose in human-object interaction activities. In Proceedings of the International Conference on Computer Vision and Pattern Recognition. IEEE, 17--24.
  40. Zhang, M. and Zhou, Z. 2007. ML-KNN: A lazy learning approach to multi-label learning. Patt. Recog. 40, 7, 2038--2048.

Index Terms: Contextual tag inference


            Reviews

            Alyx Macfadyen

Tagging digital music, video, images, and other media is typically frustrating. Most people either tag digital music files manually or use software to do it, but reliable, comprehensive manual tagging is time consuming and rarely efficient. The authors of this paper address the problem that online tagging solutions are likewise inefficient at returning optimal results. The methods they describe go some way toward improving algorithmic solutions for a tag inferencing system such as that developed by [1].

In this paper, Mandel et al. discuss a method to auto-tag, or auto-generate tags for, media (image, music, video, or other). Their focus is specific in that they discuss how context underpins the auto-tagging model described. Although the authors cite other tagging models relevant to their own deployment of context, it is not entirely clear exactly what context is, or whether they have contributed to developing it. Their reference to the decision tree used by [1] sheds some light on how context works: the system seems to involve creating tag nodes, creating relations between tags, developing training data, and adjusting the semantics at all levels. It would have been helpful for the authors to give some examples using context. They did, however, report some optimization in the interrogation of their own dataset for this system.

What interests me is their use of Amazon's Mechanical Turk service to create their own datasets to test their learning algorithms for inference tagging of individual media entities. The cost of accumulating these data was minimal, and the resulting dataset was apparently rich with human responses and idiosyncratic vocabularies. This seems a convenient and financially viable means of developing datasets.

This paper does not clearly define context for the reader. It does describe the background work done by others in attempts to develop an efficient tagging system, but there is minimal indication that the authors' contribution will impact tag inferencing methods. However, the intelligent auto-tagging system advanced by [1] may gain ground over time with trials such as those conducted by the authors.

Online Computing Reviews Service


            • Published in

              ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 7S, Issue 1
              Special section on ACM Multimedia 2010 best paper candidates, and issue on social media
              October 2011
              246 pages
              ISSN: 1551-6857
              EISSN: 1551-6865
              DOI: 10.1145/2037676

              Copyright © 2011 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 4 November 2011
              • Revised: 1 August 2011
              • Accepted: 1 August 2011
              • Received: 1 October 2010
              Published in TOMM Volume 7S, Issue 1


              Qualifiers

              • research-article
              • Research
              • Refereed
