Abstract
The emergence of numerous online sources about local services creates a need for data integration techniques that are more automatic yet still accurate. Local services are georeferenced objects and can be queried by their location on a map, for instance, by neighborhood. A typical local service query (e.g., “French Restaurant in The Loop”) specifies not only “what” the user is searching for, such as a cuisine (“French Restaurant”), but also “where,” such as a neighborhood (“The Loop”). In this article, we address three key problems: query translation, result merging, and ranking. Most local search engines organize (large) cities into neighborhoods, often hierarchically. A neighborhood in one local search engine may correspond to a set of neighborhoods in another local search engine, which makes query translation challenging. To provide integrated access to the query results returned by the local search engines, we need to combine them into a single result list.
Our contributions include: (1) an integration algorithm for neighborhoods; (2) a very effective business listing resolution algorithm; and (3) a ranking algorithm that takes into consideration the user criteria, user ratings, and rankings. We have created a prototype system, Yumi, over local search engines in the restaurant domain, which is a representative case study for local services. We conducted a comprehensive experimental study to evaluate Yumi. A prototype version of Yumi is available online.
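The query translation problem described in the abstract, where one neighborhood in a source engine may correspond to a set of neighborhoods in a target engine, can be illustrated with a small sketch. This is not the article's integration algorithm: the correspondence table is assumed to be given, and the engine names (`engine_a`, `engine_b`), the target neighborhood names, and the `LocalQuery` and `translate` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalQuery:
    what: str   # e.g. a cuisine
    where: str  # e.g. a neighborhood name as one engine spells it


# Hypothetical, hand-written correspondence table (illustrative data only).
# Key: (source engine, target engine); value: neighborhood -> set of neighborhoods.
NEIGHBORHOOD_MAP = {
    ("engine_a", "engine_b"): {
        "The Loop": {"Loop", "South Loop"},
    },
}


def translate(query, source, target):
    """Rewrite one query on `source` into one query per matching
    neighborhood on `target`. Unknown neighborhoods pass through unchanged."""
    table = NEIGHBORHOOD_MAP.get((source, target), {})
    targets = table.get(query.where, {query.where})
    return [LocalQuery(query.what, name) for name in sorted(targets)]


if __name__ == "__main__":
    q = LocalQuery("French Restaurant", "The Loop")
    for translated in translate(q, "engine_a", "engine_b"):
        print(translated)
```

Under these assumptions a single user query fans out into several target queries, which is why the results must afterwards be merged into one list.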
Index Terms
Merging Query Results From Local Search Engines for Georeferenced Objects