research-article

Learning about the world through long-term query logs

Published: 27 October 2008

Abstract

In this article, we demonstrate the value of long-term query logs. Most work on query logs to date considers only short-term (within-session) query information. In contrast, we show that long-term query logs can be used to learn about the world we live in. There are many applications of this that lead not only to improving the search engine for its users, but also potentially to advances in other disciplines such as medicine, sociology, economics, and more. In this article, we will show how long-term query logs can be used for these purposes, and that their potential is severely reduced if the logs are limited to short time horizons. We show that query effects are long-lasting, provide valuable information, and might be used to automatically make medical discoveries, build concept hierarchies, and generally learn about the sociological behavior of users. We believe these applications are only the beginning of what can be done with the information contained in long-term query logs, and see this work as a step toward unlocking their potential.

Published in

ACM Transactions on the Web, Volume 2, Issue 4
October 2008
118 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/1409220

Copyright © 2008 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 27 October 2008
• Accepted: 1 August 2008
• Revised: 1 July 2008
• Received: 1 December 2007

Qualifiers

• research-article
• Research
• Refereed
