Abstract
In this article, we demonstrate the value of long-term query logs. Most work on query logs to date considers only short-term (within-session) query information. In contrast, we show that long-term query logs can be used to learn about the world we live in. The applications not only improve the search engine for its users but may also advance other disciplines such as medicine, sociology, and economics. We show how long-term query logs can be used for these purposes, and that their potential is severely reduced when the logs are limited to short time horizons. We show that query effects are long-lasting, that they provide valuable information, and that they might be used to automatically make medical discoveries, build concept hierarchies, and more generally learn about the sociological behavior of users. We believe these applications are only the beginning of what can be done with the information contained in long-term query logs, and we see this work as a step toward unlocking their potential.
Index Terms
Learning about the world through long-term query logs