DOI: 10.1145/1367497.1367594
research-article

Exploring social annotations for information retrieval

Published: 21 April 2008

ABSTRACT

Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is concerned with developing probabilistic models and computational algorithms for social annotations. We propose a unified framework that combines the modeling of social annotations with language-modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing the users by domains; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. We then apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over the traditional approaches.
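
The second step described in the abstract, smoothing a document language model with topical information, can be illustrated with a small sketch. This is not the paper's model: it assumes plain linear interpolation (Jelinek-Mercer style) of three unigram models and scores documents by query likelihood. The topical model here is simply a maximum-likelihood estimate over each document's tags, used as a hypothetical stand-in for the topics the paper infers. The function names, the toy documents and tags, and the weights `lam_topic` and `lam_coll` are all illustrative assumptions.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, doc_lm, topic_lm, coll_lm, lam_topic=0.3, lam_coll=0.2):
    """Interpolate document, topical, and collection models.

    lam_topic and lam_coll are illustrative weights (assumptions),
    not values taken from the paper.
    """
    lam_doc = 1.0 - lam_topic - lam_coll
    return (lam_doc * doc_lm.get(word, 0.0)
            + lam_topic * topic_lm.get(word, 0.0)
            + lam_coll * coll_lm.get(word, 0.0))


def query_log_likelihood(query, doc_lm, topic_lm, coll_lm):
    """Sum of log-probabilities of the query words under the smoothed model."""
    score = 0.0
    for word in query:
        p = smoothed_prob(word, doc_lm, topic_lm, coll_lm)
        if p == 0.0:  # word unseen everywhere, even in the collection
            return float("-inf")
        score += math.log(p)
    return score


# Toy data (hypothetical): document contents and user-supplied tags.
docs = {
    "d1": "python tutorial for data analysis".split(),
    "d2": "guide to baking sourdough bread".split(),
}
tags = {
    "d1": ["programming", "python", "code"],
    "d2": ["cooking", "recipes", "bread"],
}

# Collection model over all content words and tags.
all_tokens = [w for d in docs for w in docs[d] + tags[d]]
coll_lm = mle(all_tokens)

# The query word "programming" appears only in d1's tags, never in its content.
query = ["programming", "tutorial"]
scores = {
    d: query_log_likelihood(query, mle(docs[d]), mle(tags[d]), coll_lm)
    for d in docs
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting the annotation-derived model lets `d1` match the query term "programming" even though the word never occurs in the document text, which is the kind of effect the abstract attributes to incorporating topical information into the language models.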


        • Published in

          WWW '08: Proceedings of the 17th international conference on World Wide Web
          April 2008
          1326 pages
          ISBN: 9781605580852
          DOI: 10.1145/1367497

          Copyright © 2008 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
