Abstract
We present Q2P, a system that discovers query templates from search engines via their query autocompletion services. Q2P differs from existing work in that it does not rely on search engine query logs, which are typically not readily available. Q2P is also unique in that it uses a trie to store queries sampled from a search engine economically and employs a beam-search strategy that focuses the expansion of the trie on its most promising nodes. Furthermore, Q2P leverages the trie-based storage of the query sample to discover query templates using only two passes over the trie. Q2P is a key part of our ongoing project Deep2Q on template-driven data integration on the Deep Web, where the templates learned by Q2P guide the integration process. Experimental results on four major search engines indicate that (1) Q2P sends only a moderate number of queries (ranging from 597 to 1,135) to the engines while obtaining a significant number of completions per query (ranging from 4.2 to 8.5 on average), and (2) a significant number of templates (ranging from 8 to 32 when the minimum support for frequent templates is set to 1%) can be discovered from the samples.
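To make the trie-plus-beam idea concrete, the following is a minimal sketch and not Q2P's actual implementation. It assumes a word-level trie, scores a node by the number of sampled queries passing through it, and replaces the real autocompletion services with a toy function. The names QueryTrie, beam_sample, toy_complete and TOY_LOG, and the parameters beam_width, rounds and limit, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    count: int = 0          # distinct sampled queries passing through this node
    is_query: bool = False  # True if a sampled query ends at this node


class QueryTrie:
    """Word-level trie over sampled queries (word-level is an assumption)."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.size = 0  # number of distinct queries stored

    def find(self, words: Sequence[str]) -> Optional[TrieNode]:
        node: Optional[TrieNode] = self.root
        for w in words:
            node = node.children.get(w)
            if node is None:
                return None
        return node

    def insert(self, query: str) -> bool:
        """Store a query; return True only if it was not stored before."""
        words = query.split()
        existing = self.find(words)
        if not words or (existing is not None and existing.is_query):
            return False
        node = self.root
        node.count += 1
        for w in words:
            node = node.children.setdefault(w, TrieNode())
            node.count += 1
        node.is_query = True
        self.size += 1
        return True


def beam_sample(complete: Callable[[str], List[str]], seeds: List[str],
                beam_width: int = 2, rounds: int = 2) -> Tuple[QueryTrie, int]:
    """Sample queries, expanding only the beam_width best trie nodes per round."""
    trie = QueryTrie()
    frontier = [tuple(s.split()) for s in seeds]
    sent = 0  # number of requests issued to the completion service
    for _ in range(rounds):
        for prefix in frontier:
            for completion in complete(" ".join(prefix)):
                trie.insert(completion)
            sent += 1
        # Candidate prefixes are the children of the nodes just queried.
        candidates = []
        for prefix in frontier:
            node = trie.find(prefix)
            if node is None:
                continue
            for word, child in node.children.items():
                candidates.append((child.count, prefix + (word,)))
        # Assumed scoring: prefer nodes shared by more sampled queries.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        frontier = [p for _, p in candidates[:beam_width]]
        if not frontier:
            break
    return trie, sent


# Toy stand-in for an autocompletion service; returns at most `limit` matches.
TOY_LOG = [
    "cheap flights to paris",
    "cheap flights to rome",
    "cheap hotels in paris",
    "cheap hotels in rome",
    "weather in paris",
]


def toy_complete(prefix: str, limit: int = 3) -> List[str]:
    return [q for q in TOY_LOG if q.startswith(prefix)][:limit]


wide_trie, wide_sent = beam_sample(toy_complete, ["cheap"], beam_width=2, rounds=2)
narrow_trie, narrow_sent = beam_sample(toy_complete, ["cheap"], beam_width=1, rounds=2)
```

On this toy data, a beam of width 2 stores 4 queries using 3 requests, while a beam of width 1 stores 3 queries using 2 requests: the narrower beam never expands the "cheap hotels" node and so misses the completion that the service truncated from the first response.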
Index Terms
Q2P: Discovering Query Templates via Autocompletion