Abstract
Yellow pages search is a popular service that provides a means for finding businesses close to particular locations. Efficient yellow pages search is a rapidly evolving research area. The underlying data maintained by yellow pages search engines are typically labeled according to Standard Industrial Classification (SIC) categories, and users can search yellow pages by category according to their interests. Categorizing yellow pages queries into a subset of topical categories can help improve search experience and quality. However, yellow pages queries are usually short and ambiguous. In addition, a yellow pages query taxonomy is typically organized as a hierarchy of a fairly large number of categories. These characteristics make automatic yellow pages query categorization challenging. In this article, we propose a flexible yellow pages query categorization approach. The proposed technique is built on a TF-IDF similarity taxonomy matching scheme that provides more accurate query categorization than previous keyword-based matching schemes. To further improve categorization performance, we design several filtering schemes. Through extensive experimentation, we demonstrate encouraging results: we obtain F1 measures of about 0.5 and 0.3 for categorizing yellow pages queries into 19 coarse categories and 244 finer categories, respectively. We investigate the different components of the proposed approach and also demonstrate its superiority over a hierarchical support vector machine classifier.
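The abstract does not spell out how the TF-IDF similarity taxonomy matching works, so the following is only a minimal sketch under assumed details, not the article's implementation. It assumes each category is represented by a hypothetical short description (its name plus a few keywords), computes IDF over those category descriptions, and assigns a query the categories whose cosine similarity clears a threshold. The `taxonomy` contents, `top_k`, and `min_score` are illustrative assumptions, and the article's filtering schemes are not modeled. A set-based F1 helper mirrors the evaluation measure reported above.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would also stem and drop stopwords.
    return text.lower().split()


def build_idf(category_docs):
    # IDF computed over the category descriptions (assumption: log(n/df) + 1,
    # so a term shared by every category still carries some weight).
    n = len(category_docs)
    df = Counter()
    for tokens in category_docs.values():
        df.update(set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Term frequency times IDF; terms that never occur in the taxonomy are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def categorize(query, taxonomy, top_k=3, min_score=0.1):
    # taxonomy: hypothetical mapping of category name -> description text.
    category_docs = {c: tokenize(d) for c, d in taxonomy.items()}
    idf = build_idf(category_docs)
    cat_vectors = {c: tfidf_vector(toks, idf) for c, toks in category_docs.items()}
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, v), c) for c, v in cat_vectors.items()]
    scored = [(s, c) for s, c in scored if s >= min_score]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda sc: (-sc[0], sc[1]))
    return [c for _, c in scored[:top_k]]


def f1(predicted, gold):
    # Set-based F1 for one query: harmonic mean of precision and recall.
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy three-category taxonomy, invented for illustration only.
    taxonomy = {
        "Restaurants": "restaurant pizza sushi dining food",
        "Automotive": "car auto repair tire mechanic",
        "Health": "doctor dentist clinic hospital pharmacy",
    }
    print(categorize("pizza delivery", taxonomy))  # ['Restaurants']
    print(f1(["Restaurants", "Health"], ["Restaurants"]))  # 0.6666666666666666
```

In this sketch, "delivery" does not occur in any category description, so only "pizza" contributes to the query vector. This illustrates why short, ambiguous queries are hard to match against a taxonomy with purely lexical similarity.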
Towards the taxonomy-oriented categorization of yellow pages queries
Recommendations
Towards a Taxonomy of Vulnerabilities
HICSS '07: Proceedings of the 40th Annual Hawaii International Conference on System Sciences
This paper presents a taxonomy of vulnerabilities created as a part of an effort to develop a framework for deriving verification and validation strategies to assess software security. This taxonomy is grounded in a Process/Object Model of Computation ...
A taxonomy for object-relational queries
Effective databases for text & document management
A comprehensive study of object-relational queries gives not only an understanding of full capability of object-relational query language but also a direction for query processing and optimization. This chapter classifies object-relational queries into ...
A Taxonomy of Queries for E-commerce Search
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
Understanding the search tasks and search behavior of users is necessary for optimizing search engine results. While much work has been done on understanding the users in Web search, little knowledge is available about the search tasks and behavior of ...