research-article

A vlHMM approach to context-aware search

Published: 1 November 2013

Abstract

Capturing the context of a user's query from the previous queries and clicks in the same session leads to a better understanding of the user's information need. A context-aware approach to document reranking, URL recommendation, and query suggestion may therefore substantially improve users' search experience. In this article, we propose a general approach to context-aware search that learns a variable length hidden Markov model (vlHMM) from search sessions extracted from log data. Although the model is mathematically powerful, the sheer volume of log data makes learning it challenging, so we develop several distributed learning techniques to learn a very large vlHMM under the map-reduce framework. Moreover, we construct a feature vector for each state of the vlHMM to handle novel queries that are not covered by the training data. We test our approach on a raw dataset that, before filtering, consists of 1.9 billion queries, 2.9 billion clicks, and 1.2 billion search sessions, and we evaluate the vlHMM learned from these real data on three search applications: document reranking, query suggestion, and URL recommendation. The experimental results validate the effectiveness of the vlHMM in all three applications.
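The abstract leaves the inference step implicit. The sketch below illustrates, under simplifying assumptions, how a model whose transitions condition on a variable-length context of earlier states can use previous queries in a session to disambiguate the current one and to suggest the next query. Everything in it is hypothetical: the state names, probabilities, and the functions `transition`, `next_state_distribution`, and `suggest`. Observations are queries only (the article's model also uses clicks), transitions back off to the longest matching suffix of the state history, and inference enumerates every state path, which is exponential in session length and only suitable for a toy model. It is not the article's map-reduce learning procedure, and it does not handle novel queries, which the article addresses with per-state feature vectors.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy model: two hidden states ("search intents"), each
# emitting queries with fixed probabilities (queries only, no clicks).
EMIT = {
    "cars": {"bmw": 0.5, "jaguar": 0.3, "jaguar price": 0.2},
    "animals": {"lion": 0.5, "jaguar": 0.3, "jaguar habitat": 0.2},
}
# Hypothetical variable-length transition table: each key is a context of
# preceding states; the empty tuple gives the initial state distribution.
TRANS = {
    (): {"cars": 0.5, "animals": 0.5},
    ("cars",): {"cars": 0.9, "animals": 0.1},
    ("animals",): {"cars": 0.1, "animals": 0.9},
    ("cars", "cars"): {"cars": 0.95, "animals": 0.05},
}


def transition(history, table=TRANS):
    """Return the next-state distribution for the longest suffix of
    `history` that appears in `table` (backing off to shorter contexts)."""
    for k in range(len(history), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in table:
            return table[context]
    raise KeyError("transition table needs an entry for the empty context ()")


def next_state_distribution(session, emit=EMIT, table=TRANS):
    """P(s_{t+1} | o_1..o_t), computed by enumerating all state paths."""
    states = list(emit)
    scores = defaultdict(float)
    total = 0.0
    for path in product(states, repeat=len(session)):
        # Joint probability of this state path and the observed queries.
        p = 1.0
        for i, (state, query) in enumerate(zip(path, session)):
            p *= transition(path[:i], table).get(state, 0.0)
            p *= emit[state].get(query, 0.0)
        if p == 0.0:
            continue
        total += p
        # Weight the next-state prediction by the path probability.
        for nxt, q in transition(path, table).items():
            scores[nxt] += p * q
    if total == 0.0:
        return {}  # some query in the session is not covered by the model
    return {s: v / total for s, v in scores.items()}


def suggest(session, k=3, emit=EMIT, table=TRANS):
    """Rank queries not yet issued by sum_s P(next = s) * P(query | s)."""
    dist = next_state_distribution(session, emit, table)
    ranked = defaultdict(float)
    for state, p_state in dist.items():
        for query, p_query in emit[state].items():
            if query not in session:
                ranked[query] += p_state * p_query
    return sorted(ranked, key=ranked.get, reverse=True)[:k]


if __name__ == "__main__":
    print(next_state_distribution(["bmw", "jaguar"]))  # cars ≈ 0.865, animals ≈ 0.135
    print(suggest(["bmw", "jaguar"]))  # ['jaguar price', 'lion', 'jaguar habitat']
    print(next_state_distribution(["jaguar"]))  # cars ≈ 0.5, animals ≈ 0.5
```

With `bmw` earlier in the session, the ambiguous query `jaguar` is attributed mostly to the `cars` intent, so `jaguar price` is ranked first; with `jaguar` alone, both intents are equally likely. The back-off in `transition` is what makes the context length variable: the path `("cars", "cars")` uses its own two-state context, while `("cars", "animals")` falls back to the one-state context `("animals",)`.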

    • Published in

      ACM Transactions on the Web, Volume 7, Issue 4
      October 2013
      220 pages
      ISSN: 1559-1131
      EISSN: 1559-114X
      DOI: 10.1145/2540635

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 November 2013
      • Accepted: 1 May 2013
      • Revised: 1 February 2012
      • Received: 1 June 2011
      Qualifiers

      • research-article
      • Research
      • Refereed
