research-article

Merging Query Results From Local Search Engines for Georeferenced Objects

Published: 06 November 2014

Abstract

The emergence of numerous online sources about local services creates a need for data integration techniques that are more automatic yet still accurate. Local services are georeferenced objects and can be queried by their location on a map, for instance, by neighborhood. A typical local service query (e.g., “French Restaurant in The Loop”) states not only “what” the user is searching for, such as a cuisine (“French Restaurant”), but also “where”, such as a neighborhood (“The Loop”). In this article, we address three key problems: query translation, result merging, and ranking. Most local search engines organize (large) cities into neighborhoods, often hierarchically. A neighborhood in one local search engine may correspond to a set of neighborhoods in another local search engine, which makes query translation challenging. To provide integrated access to the query results returned by the local search engines, we need to combine them into a single list of results.
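To make the query translation problem concrete, the following is a minimal sketch of how one neighborhood in a source engine might fan out into several queries on a target engine. It is not the article's integration algorithm: the engine names, the `NEIGHBORHOOD_MAP` table, the correspondences in it, and the `translate_query` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical neighborhood correspondences between search engines.
# The engine names and mappings are invented for illustration and do not
# come from the article.
NEIGHBORHOOD_MAP = {
    ("engine_a", "The Loop"): {
        "engine_b": ["Loop", "South Loop"],
        "engine_c": ["Downtown"],
    },
}


def translate_query(what, where, source, target):
    """Return one (what, where) query per corresponding target neighborhood.

    An empty list means no correspondence is known for the target engine.
    """
    targets = NEIGHBORHOOD_MAP.get((source, where), {}).get(target, [])
    return [(what, neighborhood) for neighborhood in targets]


queries = translate_query("French Restaurant", "The Loop", "engine_a", "engine_b")
```

Under these assumed mappings, a single query against `engine_a` becomes two queries against `engine_b`, whose results then have to be merged.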

Our contributions include: (1) an integration algorithm for neighborhoods; (2) a very effective business listing resolution algorithm; and (3) a ranking algorithm that takes into consideration the user criteria, user ratings, and rankings. We have created a prototype system, Yumi, over local search engines in the restaurant domain, which serves as a representative case study for local services. We conducted a comprehensive experimental study to evaluate Yumi. A prototype version of Yumi is available online.
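As a rough illustration of result merging, the sketch below resolves duplicate listings with a crude normalized name-and-address key and then orders the merged list with reciprocal rank fusion. Both choices are stand-ins, not Yumi's resolution or ranking algorithms, and the restaurant names, addresses, and the `listing_key` and `merge_ranked_lists` helpers are hypothetical.

```python
import re
from collections import defaultdict


def listing_key(name, address):
    """Crude normalization (an assumption): lowercase, drop punctuation,
    and collapse whitespace in both the name and the address."""
    def norm(text):
        return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())
    return (norm(name), norm(address))


def merge_ranked_lists(result_lists, k=60):
    """Merge ranked lists of (name, address) pairs with reciprocal rank fusion.

    Each listing scores the sum of 1 / (k + rank) over the lists that return it.
    """
    scores = defaultdict(float)
    display = {}
    for results in result_lists:
        seen = set()
        for rank, (name, address) in enumerate(results, start=1):
            key = listing_key(name, address)
            if key in seen:
                continue  # count a listing once per engine
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
            display.setdefault(key, (name, address))  # keep first spelling seen
    ordered = sorted(scores, key=lambda key: -scores[key])
    return [display[key] for key in ordered]


# Invented example data for two hypothetical engines.
engine_a = [("Chez Example", "1 Main St."), ("Bistro Sample", "2 Oak Ave")]
engine_b = [("chez example", "1 main st"), ("Cafe Demo", "3 Elm Rd"),
            ("Bistro Sample", "2 Oak Ave.")]
merged = merge_ranked_lists([engine_a, engine_b])
```

Exact matching on a normalized key misses many real duplicates (abbreviations, misspellings, moved businesses), which is why a dedicated listing resolution algorithm is needed in practice.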



          • Published in

            ACM Transactions on the Web  Volume 8, Issue 4
            October 2014
            178 pages
            ISSN:1559-1131
            EISSN:1559-114X
            DOI:10.1145/2686863

            Copyright © 2014 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 November 2014
            • Accepted: 1 July 2014
            • Revised: 1 March 2014
            • Received: 1 January 2013

            Qualifiers

            • research-article
            • Research
            • Refereed
