research-article

Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure

Published: 18 November 2016

Abstract

Queries submitted to search engines in resource-scarce languages often retrieve no results, which frustrates users. In such cases, the system should retrieve at least partially relevant documents. We propose a novel multilingual framework, MultiStructPRF, which expands the query with related terms by (i) using a resource-rich assisting language and (ii) weighting the expansion terms according to where they occur in the document. The assisting language helps expand the query and thereby improves recall. We propose a systematic expansion model for weighting expansion terms drawn from different parts of the document, and a heuristics-based fusion model for combining the expansion terms from the query language with those from the assisting language. Our experimental results show improvements in both precision and recall over other pseudo-relevance feedback (PRF) techniques for multiple resource-scarce languages, such as Marathi, Bengali, Odia, and Finnish. We also study the effect of different assisting languages on precision and recall for multiple query languages. Our experiments reveal that precision is positively correlated with the typological closeness of the query language and the assisting language, whereas recall is positively correlated with the resource richness of the assisting language.
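The abstract names two components: weighting expansion terms by where they occur in the feedback documents, and fusing expansion terms from the query language with those from the assisting language. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' MultiStructPRF implementation. The field names (`title`, `lead`, `body`), the weights in `FIELD_WEIGHTS`, the helper names `structure_weighted_terms` and `fuse`, and the interpolation parameter `beta` are all hypothetical. Linear interpolation stands in for the paper's heuristics-based fusion model, whose details are not given here, and the assisting-language terms are assumed to have already been mapped back into the query language.

```python
from collections import Counter, defaultdict

# Hypothetical field weights (assumption): the paper weights terms by where
# they occur, but these particular fields and values are illustrative only.
FIELD_WEIGHTS = {"title": 3.0, "lead": 2.0, "body": 1.0}


def structure_weighted_terms(feedback_docs, field_weights=FIELD_WEIGHTS, k=10):
    """Score candidate expansion terms from pseudo-relevant documents.

    Each document is a dict mapping a field name to a list of tokens.
    A term's score is the sum, over documents and fields, of the field
    weight times the term's relative frequency in that field.
    Returns the top-k terms with scores normalized to sum to 1.
    """
    scores = defaultdict(float)
    for doc in feedback_docs:
        for field, tokens in doc.items():
            if not tokens:
                continue
            weight = field_weights.get(field, 1.0)  # unknown fields get weight 1
            for term, tf in Counter(tokens).items():
                scores[term] += weight * tf / len(tokens)
    # Sort by descending score, breaking ties alphabetically for determinism.
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(s for _, s in top)
    return {t: s / total for t, s in top} if total else {}


def fuse(query_lang_terms, assisting_terms, beta=0.5):
    """Linearly interpolate two expansion-term distributions.

    Stand-in for the paper's heuristic fusion (assumption). assisting_terms
    is assumed to be already mapped back into the query language.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    vocab = set(query_lang_terms) | set(assisting_terms)
    return {
        t: (1 - beta) * query_lang_terms.get(t, 0.0) + beta * assisting_terms.get(t, 0.0)
        for t in vocab
    }
```

For example, a single feedback document whose title is `flood relief` and whose body is `flood water rain relief` gives `flood` and `relief` equal top scores, because both appear in the heavily weighted title, while `rain` and `water` receive much less weight since they occur only in the body.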

    Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 2
      TALLIP Notes and Regular Papers
      June 2017
      136 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3008658

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 November 2016
      • Accepted: 1 September 2016
      • Revised: 1 July 2016
      • Received: 1 March 2016

      Qualifiers

      • research-article
      • Research
      • Refereed