Abstract
Queries to search engines in resource-scarce languages often return no results, which frustrates users. In such cases, the system should retrieve at least partially relevant documents. We propose a novel multilingual framework, MultiStructPRF, which expands the query with related terms by (i) using a resource-rich assisting language and (ii) weighting the expansion terms according to where they occur in the document. The assisting language is used to expand the query in order to improve recall. We propose a systematic expansion model for weighting the expansion terms that come from different parts of the document. To combine the expansion terms from the query language and the assisting language, we propose a heuristics-based fusion model. Our experimental results show improvements in both precision and recall over other pseudo-relevance feedback (PRF) techniques for multiple resource-scarce languages, including Marathi, Bengali, Odia, and Finnish. We also study the effect of different assisting languages on precision and recall for multiple query languages. Our experiments reveal an interesting pattern: precision is positively correlated with the typological closeness of the query language and the assisting language, whereas recall is positively correlated with the resource richness of the assisting language.
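The two ideas in the abstract, position-dependent term weighting and fusion of terms from two languages, can be illustrated with a minimal sketch. This is not the MultiStructPRF implementation: the field names, the field weights, the linear interpolation used for fusion, and the function names score_terms, fuse, and expand are all assumptions made for illustration. The abstract says only that the weighting depends on position and that the fusion is heuristics-based.

```python
from collections import Counter

# Hypothetical field weights: the abstract says only that terms are weighted
# by where they occur in the document, not what the weights are.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def score_terms(feedback_docs, field_weights=FIELD_WEIGHTS):
    """Score candidate expansion terms from top-ranked (pseudo-relevant) documents.

    Each document is a dict mapping a field name to a list of tokens. A term's
    score is the sum of field weights over its occurrences, normalised so that
    all scores sum to 1.
    """
    scores = Counter()
    for doc in feedback_docs:
        for field, tokens in doc.items():
            weight = field_weights.get(field, 0.0)
            for token in tokens:
                scores[token] += weight
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else {}


def fuse(query_lang_scores, assist_lang_scores, alpha=0.5):
    """Linearly interpolate two term-score dicts (an assumed fusion rule).

    assist_lang_scores is expected to hold terms that have already been
    translated back into the query language.
    """
    terms = set(query_lang_scores) | set(assist_lang_scores)
    return {
        t: alpha * query_lang_scores.get(t, 0.0)
        + (1 - alpha) * assist_lang_scores.get(t, 0.0)
        for t in terms
    }


def expand(query_terms, fused_scores, k=2):
    """Append the k best-scoring new terms to the query."""
    candidates = [t for t in fused_scores if t not in query_terms]
    # Sort by descending score, breaking ties alphabetically for determinism.
    candidates.sort(key=lambda t: (-fused_scores[t], t))
    return list(query_terms) + candidates[:k]
```

In this sketch, a term that appears in a title counts three times as much as the same term in the body, and alpha controls how much the query-language feedback is trusted relative to the assisting-language feedback. Both values are placeholders that would need tuning on held-out queries.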
Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure