Q2P: Discovering Query Templates via Autocompletion

Published: 29 April 2016

Abstract

We present Q2P, a system that discovers query templates from search engines via their query autocompletion services. Unlike existing work, Q2P does not rely on search engine query logs, which are typically not readily available. Q2P is also unique in that it uses a trie to economically store queries sampled from a search engine and employs a beam-search strategy that focuses the expansion of the trie on its most promising nodes. Furthermore, Q2P leverages the trie-based storage of the query samples to discover query templates using only two passes over the trie. Q2P is a key part of our ongoing project Deep2Q on template-driven data integration on the Deep Web, where the templates learned by Q2P are used to guide the integration process. Experimental results on four major search engines indicate that (1) Q2P sends only a moderate number of queries (ranging from 597 to 1,135) to the engines, while obtaining a significant number of completions per query (ranging from 4.2 to 8.5 on average); and (2) a significant number of templates (ranging from 8 to 32 when the minimum support for frequent templates is set to 1%) may be discovered from the samples.
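
The trie-plus-beam-search sampling idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how "promising" nodes are scored, so the score below (the number of stored completions passing through a node) is an assumption, as are all names (`TrieNode`, `sample`, `fake_complete`) and the tiny in-memory corpus that stands in for a real autocompletion service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    count: int = 0  # number of distinct stored queries passing through this node


def insert(root: TrieNode, query: str) -> None:
    """Store one query in the trie, one word per edge."""
    node = root
    for word in query.split():
        node = node.children.setdefault(word, TrieNode())
        node.count += 1


def frontier(root: TrieNode, asked: Set[str]) -> List[Tuple[int, str]]:
    """Return (count, prefix) for every trie prefix not yet sent as a query."""
    found, stack = [], [("", root)]
    while stack:
        prefix, node = stack.pop()
        for word, child in node.children.items():
            path = f"{prefix} {word}".strip()
            if path not in asked:
                found.append((child.count, path))
            stack.append((path, child))
    return found


def sample(seed: str, complete: Callable[[str], List[str]],
           beam_width: int = 2, rounds: int = 3) -> Tuple[TrieNode, Set[str]]:
    """Grow a trie of sampled queries, expanding only the best prefixes."""
    root, seen, asked = TrieNode(), set(), set()

    def ask(prefix: str) -> None:
        asked.add(prefix)
        for q in complete(prefix):
            if q not in seen:  # store each distinct completion once
                seen.add(q)
                insert(root, q)

    ask(seed)
    for _ in range(rounds):
        # Assumed scoring: prefer prefixes shared by the most stored queries.
        best = sorted(frontier(root, asked), key=lambda t: (-t[0], t[1]))
        for _, prefix in best[:beam_width]:
            ask(prefix)
    return root, seen


# Hypothetical stand-in for an autocompletion service: it returns at most
# `limit` corpus queries that start with the given prefix.
CORPUS = [
    "flights to paris",
    "flights to rome",
    "flights to paris from london",
    "flights to rome from london",
    "hotels in paris",
]


def fake_complete(prefix: str, limit: int = 2) -> List[str]:
    return [q for q in CORPUS if q.startswith(prefix)][:limit]


root, seen = sample("flights", fake_complete, beam_width=2, rounds=3)
```

Because the stand-in service returns only two completions per call, the longer queries surface only after deeper prefixes are expanded, which is the situation a beam over trie nodes is meant to handle with few requests.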
