skip to main content
research-article

Automatic tag recommendation algorithms for social recommender systems

Published:17 February 2011Publication History
Skip Abstract Section

Abstract

The emergence of Web 2.0 and the consequent success of social network Web sites such as Del.icio.us and Flickr introduce us to a new concept called social bookmarking, or tagging. Tagging is the action of connecting a relevant user-defined keyword to a document, image, or video, which helps the user to better organize and share their collections of interesting stuff. With the rapid growth of Web 2.0, tagged data is becoming more and more abundant on the social network Web sites. An interesting problem is how to automate the process of making tag recommendations to users when a new resource becomes available.

In this article, we address the issue of tag recommendation from a machine learning perspective. From our empirical observation of two large-scale datasets, we first argue that the user-centered approach for tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that are capable of making effective and efficient tag recommendations in real scenarios. The first, graph-based, method represents the tagged data in two bipartite graphs, (document, tag) and (document, word), then finds document topics by leveraging graph partitioning algorithms. The second, prototype-based, method aims at finding the most representative documents within the data collections and advocates a sparse multiclass Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are performed by first classifying a new document into one or more topic clusters/classes, and then selecting the most relevant tags from those clusters/classes as machine-recommended tags.

Experiments on real-world data from Del.icio.us, CiteULike, and BibSonomy examine the quality of tag recommendation as well as the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve the performance of tag recommendations when compared to the user-centered methods, as well as topic models LDA and SVM classifiers.

References

  1. Begelman, G., Keller, P., and Smadja, F. 2006. Automated tag clustering: Improving search and exploration in the tag space. In Proceedings of the Collaborative Web Tagging Workshop (WWW'06).Google ScholarGoogle Scholar
  2. Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bogers, T. and van den Bosch, A. 2008. Recommending scientific articles using citeulike. In Proceedings of the ACM Conference on Recommender Systems (RecSys'08). ACM, New York, NY, 287--290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Breese, J. S., Heckerman, D., and Kadie, C. 1998. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. 43--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Brinker, K., Furnkranz, J., and Hullermeier, E. 2006. A unified model for multilabel classification and ranking. In Proceedings of the European Conference on Artificial Intelligence (ECAI'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chirita, P. A., Costache, S., Nejdl, W., and Handschuh, S. 2007. P-tag: large scale automatic generation of personalized annotation tags for the web. In Proceedings of the 16th International Conference on World Wide Web (WWW'07). ACM Press, New York, NY, 845--854. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cristianini, N. and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Sens B, 39, 1, 1--38.Google ScholarGoogle Scholar
  9. Farooq, U., Song, Y., Carroll, J. M., and Giles, C. L. 2007. Social bookmarking for scholarly digital libraries. IEEE Internet Comput. 29--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Figueiredo, M. A. T. and Jain, A. K. 2002. Unsupervised learning of finite mixture models. IEEE Trans. Patt. Anal. Mach. Intell. 24, 3, 381--396. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Golder, S. and Huberman, B. 2006. Usage patterns of collaborative tagging systems. J. Inform. Sci. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Golub, G. H. and Loan, C. F. V. 1996. Matrix Computations 3rd Ed. Johns Hopkins University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jaeschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., and Stumme, G. 2007. Tag recommendations in folksonomies. In Workshop Proceedings of Lernen—Wissensentdeckung—Adaptivitt (LWA'07). Martin-Luther-Universität Halle-Wittenberg, 13--20.Google ScholarGoogle Scholar
  14. Johnson, R. and Zhang, T. 2007. On the effectiveness of laplacian normalization for graph semi-supervised learning. J. Mach. Learn. Res. 8, 1489--1517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kendall, M. 1938. A new measure of rank correlation. Biometrika 30, 81--89.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kohonen, T. 2001. Self Organization Maps. Springer.Google ScholarGoogle Scholar
  17. Kullback, S. and Leibler, R. A. 1951. On information and sufficiency. Annl. Math. Stat. 22, 79--86.Google ScholarGoogle ScholarCross RefCross Ref
  18. Lawrence, N., Seeger, M., and Herbrich, R. 2003. Fast sparse gaussian process methods: The informative vector machine. In Proceedings of Neural Information Processing Systems (NIPS15). 609--616.Google ScholarGoogle Scholar
  19. Li, J. and Wang, J. Z. 2006. Real-time computerized annotation of pictures. In Proceedings of the International Conference on Multimedia (MULTIMEDIA'06). 911--920. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Li, J. and Zha, H. 2006. Two-way poisson mixture models for simultaneous document classification and word clustering. Comput. Stat. Data Anal. 50, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Platt, J. C. 2000. Probabilities for SV machines. In Advances in Large Margin Classifiers, 61--74.Google ScholarGoogle Scholar
  22. Rasmussen, C. E. and Williams, C. K. I. 2006. Gaussian Processes for Machine Learning. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Schlattmann, P. 2003. Estimating the number of components in a finite mixture model: the special case of homogeneity. Comput. Stat. Data Anal. 41, 3-4, 441--451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Seeger, M. and Jordan, M. 2004. Sparse gaussian process classification with multiple classes. Tech. rep. 661, Department of Statistics, University of California at Berkeley.Google ScholarGoogle Scholar
  25. Seeger, M. and Williams, C. 2003. Fast forward selection to speed up sparse gaussian process regression. In Proceedings of the Workshop on AI and Statistics.Google ScholarGoogle Scholar
  26. Seo, S., Bode, M., and Obermayer, K. 2003. Soft nearest prototype classification. IEEE Trans. Neural Net. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Song, Y., Zhang, L., and Giles, C. L. 2008. Sparse gaussian processes classification for fast tag recommendation. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM'08). ACM, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Song, Y., Zhuang, Z., Li, H., Zhao, Q., Li, J., Lee, W.-C., and Giles, C. L. 2008. Real-time automatic tag recommendation. In Proceedings of the Annual International ACM SIGIR Conference (SIGIR'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. 2008. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the ACM Conference on Recommender Systems (RecSys'08). ACM, New York, NY, 43--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Tsoumakas, G. and Katakis, I. 2007. Multi-label classification: An overview. Intl. J. Data Warehous. Mining 3, 3, 1--13.Google ScholarGoogle ScholarCross RefCross Ref
  31. Zha, H., He, X., Ding, C., Simon, H., and Gu, M. 2001. Bipartite graph partitioning and data clustering. In Proceedings of the 10th International Conference on Information and Knowledge Management (CIKM'01). ACM Press, New York, NY, 25--32. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Automatic tag recommendation algorithms for social recommender systems

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!