
Towards the taxonomy-oriented categorization of yellow pages queries

Published: 23 March 2008

Abstract

Yellow pages search is a popular service for finding businesses close to particular locations, and efficient yellow pages search is a rapidly evolving research area. The underlying data maintained in yellow pages search engines are typically labeled according to Standard Industrial Classification (SIC) categories, and users can search yellow pages by category according to their interests. Categorizing yellow pages queries into a subset of topical categories can help to improve search experience and quality. However, yellow pages queries are usually short and ambiguous. In addition, a yellow pages query taxonomy is typically organized as a hierarchy with a fairly large number of categories. These characteristics make automatic yellow pages query categorization challenging. In this article, we propose a flexible yellow pages query categorization approach. The proposed technique is built on a TF-IDF similarity taxonomy matching scheme that provides more accurate query categorization than previous keyword-based matching schemes. To further improve categorization performance, we design several filtering schemes. Through extensive experimentation, we demonstrate encouraging results: we obtain F1 measures of about 0.5 and 0.3 for categorizing yellow pages queries into 19 coarse categories and 244 finer categories, respectively. We investigate the different components of the proposed approach and also demonstrate its superiority over a hierarchical support vector machine classifier.
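To make the idea of TF-IDF similarity matching against a taxonomy concrete, the following is a minimal sketch, not the authors' implementation. It assumes each category is represented by a short bag of descriptive words, weights terms with a smoothed IDF, and ranks categories by cosine similarity to the query. The category names, their descriptions, and the function names (`build_idf`, `tfidf_vector`, `categorize`) are hypothetical and chosen only for illustration; the filtering schemes and the hierarchical structure mentioned in the abstract are not modeled.

```python
import math
from collections import Counter

# Hypothetical category descriptions (illustrative only, not from the article).
CATEGORIES = {
    "Eating Places": "restaurant pizza cafe diner food",
    "Auto Repair": "car auto repair mechanic brake tire",
    "Legal Services": "lawyer attorney legal law firm",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency over the category descriptions."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in any category are ignored."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(query, categories=CATEGORIES, top_k=1):
    """Return up to top_k (category, score) pairs with non-zero similarity."""
    names = list(categories)
    docs = [tokenize(categories[name]) for name in names]
    idf = build_idf(docs)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, tfidf_vector(doc, idf)), name)
              for name, doc in zip(names, docs)]
    scored.sort(key=lambda pair: -pair[0])
    return [(name, score) for score, name in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    for q in ["pizza restaurant", "brake repair", "zzz"]:
        print(q, "->", categorize(q))
```

A query that shares no terms with any category description returns an empty list, which illustrates why short, ambiguous queries are hard for purely term-based matching.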

