ABSTRACT
Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper develops probabilistic models and computational algorithms for social annotations. We propose a unified framework that combines the modeling of social annotations with language-modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing users by domain; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. We then apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over traditional approaches.
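To make the second step more concrete, the following is a minimal sketch of how a document language model might be smoothed with a topical background model before scoring a query. It is not the model proposed in the paper: the three-way linear interpolation, the weights `lam` and `mu`, and the function names `smoothed_doc_model` and `query_log_likelihood` are all assumptions chosen for illustration, and the topic distributions are taken as given rather than learned.

```python
import math
from collections import Counter


def smoothed_doc_model(doc_tokens, topic_word, doc_topic, collection,
                       lam=0.3, mu=0.2):
    """Return a function word -> p(word | document).

    Assumed (illustrative) form, a linear interpolation of three estimates:
      (1 - lam - mu) * maximum-likelihood estimate from the document
      + lam * topical background  sum_z p(z | d) * p(w | z)
      + mu  * collection model    p(w | C)
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())

    def prob(word):
        p_ml = counts[word] / total if total else 0.0
        p_topic = sum(
            doc_topic[z] * topic_word[z].get(word, 0.0)
            for z in range(len(doc_topic))
        )
        p_coll = collection.get(word, 0.0)
        return (1 - lam - mu) * p_ml + lam * p_topic + mu * p_coll

    return prob


def query_log_likelihood(query_tokens, prob, floor=1e-12):
    """Score a query by the log-likelihood of its words under the model."""
    return sum(math.log(max(prob(w), floor)) for w in query_tokens)


# Toy data (hypothetical): two topics, one short document.
topic_word = [{"python": 0.5, "code": 0.5}, {"travel": 1.0}]
doc_topic = [0.9, 0.1]  # p(z | d), assumed to come from step (1)
collection = {"python": 0.25, "code": 0.25, "travel": 0.25, "hotel": 0.25}

prob = smoothed_doc_model(["python", "python", "code"],
                          topic_word, doc_topic, collection)
print(query_log_likelihood(["code"], prob))
print(query_log_likelihood(["travel"], prob))
```

In this toy setup the word "travel" never occurs in the document, yet it still receives nonzero probability through the topical and collection components, which is the effect smoothing is meant to achieve.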