ABSTRACT
It is widely believed that some queries submitted to search engines are by nature ambiguous (e.g., java, apple). However, few studies have investigated the questions "how many queries are ambiguous?" and "how can we automatically identify an ambiguous query?" This paper deals with these issues. First, we construct a taxonomy of query ambiguity and ask human annotators to manually classify queries based on it. From the manually labeled results, we find that query ambiguity is, to some extent, predictable. We then use a supervised learning approach to automatically classify queries as ambiguous or not. Experimental results show that we can correctly identify 87% of the labeled queries. Finally, we estimate that about 16% of the queries in a real search log are ambiguous.
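The abstract does not specify the classifier or its features. The sketch below shows one possible version of the pipeline it describes: train on annotated queries, measure the share classified correctly, then estimate the ambiguous fraction of an unlabeled log. It assumes each query has already been turned into a numeric feature vector. The two features used here (similarity of the query's top results and number of topic clusters among them) are hypothetical, and the toy data are invented. A plain perceptron is used as the learner only because it is short. It is not necessarily the method the authors used.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature vector for one query, e.g. [similarity of its top search
# results, number of topic clusters among those results]. These features are
# illustrative assumptions, not the feature set used in the paper.
Vector = Sequence[float]


def predict(w: Sequence[float], b: float, x: Vector) -> int:
    """Return +1 (ambiguous) or -1 (not ambiguous) from a linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def train_perceptron(data: List[Vector], labels: List[int],
                     max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Classic perceptron: update the weights on each misclassified query.

    Stops early once a full pass makes no mistakes, which is guaranteed to
    happen on linearly separable training data.
    """
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(data, labels):
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def accuracy(w: Sequence[float], b: float,
             data: List[Vector], labels: List[int]) -> float:
    """Fraction of labeled queries whose prediction matches the annotation."""
    correct = sum(predict(w, b, x) == y for x, y in zip(data, labels))
    return correct / len(labels)


def ambiguous_fraction(w: Sequence[float], b: float,
                       log: List[Vector]) -> float:
    """Estimate the share of queries in an unlabeled log predicted ambiguous."""
    return sum(predict(w, b, x) == 1 for x in log) / len(log)


if __name__ == "__main__":
    # Toy, made-up data: ambiguous queries (+1) have dissimilar results spread
    # over many clusters; clear queries (-1) have similar results, few clusters.
    train = [[0.1, 5.0], [0.2, 4.0], [0.15, 6.0],
             [0.8, 1.0], [0.9, 1.0], [0.7, 2.0]]
    labels = [1, 1, 1, -1, -1, -1]
    w, b = train_perceptron(train, labels)
    print("training accuracy:", accuracy(w, b, train, labels))  # prints 1.0
    log = [[0.12, 5.0], [0.85, 1.0], [0.75, 1.0], [0.9, 2.0]]
    print("estimated ambiguous share:", ambiguous_fraction(w, b, log))  # prints 0.25
```

The same two summary numbers appear in the abstract: accuracy is correct predictions divided by labeled queries, and the log estimate is predicted-ambiguous queries divided by all queries in the log. On real data, accuracy would be measured on held-out queries rather than the training set used in this toy run.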
Index Terms
Identifying ambiguous queries in web search