Abstract
Cross-language information retrieval (CLIR) is an active sub-domain of information retrieval (IR). Like IR, CLIR is centered on the search for documents and for information contained within those documents. Unlike IR, CLIR must reconcile queries and documents that are written in different languages. The usual solution to this mismatch involves translating the query and/or the documents before performing the search. Translation is therefore a pivotal activity for CLIR engines. Over the last 15 years, the CLIR community has developed a wide range of techniques and models supporting free text translation. This article presents an overview of those techniques, with a special emphasis on recent developments.
References
- AbdulJaleel, N. and Larkey, L. S. 2003. Statistical transliteration for English-Arabic cross language information retrieval. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 139--146. Google Scholar
Digital Library
- Adriani, M. 2000. Using statistical term similarity for sense disambiguationin cross-language information retrieval. Inf. Retr. 2, 1, 71--82. Google Scholar
Digital Library
- Adriani, M. and Wahyu, I. 2005. The performance of a machine translation-based English-Indonesian CLIR system. In (CLEF 2005): Workshop on Cross-Language Information Retrieval and Evaluation. Google Scholar
Digital Library
- Agirre, E., Di Nunzio, G. M., Ferro, N., Mandl, T., and Peters, C. 2009. CLEF 2008: Ad hoc track overview. In Proceedings of the 9th Cross-language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access (CLEF'08). Springer, 15--37. Google Scholar
Digital Library
- Alfonseca, E., Bilac, S., and Pharies, S. 2008. Decompounding query keywords from compounding languages. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers. Association for Computational Linguistics, 253--256. Google Scholar
Digital Library
- Aljlayl, M. and Frieder, O. 2001. Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation. In Proceedings of the 10th International Conference on Information and Knowledge Management. ACM, New York, 295--302. Google Scholar
Digital Library
- Amati, G. and Van Rijsbergen, C. J. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 357--389. Google Scholar
Digital Library
- Anderka, M., Lipka, N., and Stein, B. 2009. Evaluating cross-language explicit semantic analysis and cross querying. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 50--57. Google Scholar
Digital Library
- Anderka, M. and Stein, B. 2009. The ESA retrieval model revisited. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 670--671. Google Scholar
Digital Library
- Bacchin, M., Ferro, N., and Melucci, M. 2005. A probabilistic model for stemmer generation. Info. Process. Manag. 41, 1, 121--137. Google Scholar
Digital Library
- Baeza-Yates, R. and Ribeiro-Neto, B. 2008. Modern Information Retrieval, 2nd ed. Addison-Wesley Publishing Company. Google Scholar
Digital Library
- Ballesteros, L. and Croft, W. B. 1997. Phrasal translation and query expansion techniques for cross-language information retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 84--91. Google Scholar
Digital Library
- Ballesteros, L. and Croft, W. B. 1998. Resolving ambiguity for cross-language retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 64--71. Google Scholar
Digital Library
- Ballesteros, L. and Sanderson, M. 2003. Addressing the lack of direct translation resources for cross-language retrieval. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 147--152. Google Scholar
Digital Library
- Benczur, A., Csalogany, K., Fogaras, D., Friedman, E., Sarlas, T., Uher, M., and Windhager, E. 2003. Searching a small national domain A preliminary report. In Proceedings of the 12th International World Wide Web Conference (WWW).Google Scholar
- Berger, A. and Lafferty, J. 1999. Information retrieval as statistical translation. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 222--229. Google Scholar
Digital Library
- Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022. 944937. Google Scholar
Digital Library
- Boughanem, M., Chrisment, C., and Nassr, N. 2002. Investigation on disambiguation in CLIR: Aligned corpus and bi-directional translation-based strategies. In Proceedings of the 2nd Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems (Revised Papers). Springer, 158--168. Google Scholar
Digital Library
- Braschler, M. and Ripplinger, B. 2004. How effective is stemming and decompounding for German text retrieval? Inf. Retr. 7, 3-4, 291--316. Google Scholar
Digital Library
- Broglio, J., Callan, J. P., and Croft, W. B. 1993. INQUERY system overview. In Proceedings of the Annual Meeting of the ACL. 47--67. Google Scholar
Digital Library
- Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. 1993. The mathematics of statistical machine translation: Parameter estimation. Comput. Linguist. 19, 2, 263--311. Google Scholar
Digital Library
- Buckley, C., Mitra, M., Walz, J., and Cardie, C. 2000. Using clustering and superconcepts within SMART: TREC 6. Inf. Process. Manage. 36, 1, 109--131. Google Scholar
Digital Library
- Callan, J. P., Croft, W. B., and Harding, S. M. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications. Springer, 78--83.Google Scholar
- Cao, G., Gao, J., and Nie, J.-y. 2007a. A system to mine large-scale bilingual dictionaries from monolingual Web. In Proceedings of the 11th Machine Translation Summit (MT Summit XI). 57--64.Google Scholar
- Cao, G., Gao, J., Nie, J.-Y., and Bai, J. 2007b. Extending query translation to cross-language query expansion with markov chain models. In Proceedings of the 16th ACM Conference on Information and Knowledge Management. ACM, New York, 351--360. Google Scholar
Digital Library
- Carbonell, J. G., Yang, Y., Frederking, R. E., Brown, R., Geng, Y., and Lee, D. 1997. Translingual information retrieval: A comparative evaluation. In Proceedings of the 15th International Joint Conference on Artificial Intelligence. 708--714.Google Scholar
- Chen, A. 2002. Cross-Language retrieval experiments at CLEF-2002. In Proceedings of Evaluation of Cross-Language Information Retrieval Systems: 3rd Workshop of the Cross-Language Evaluation Forum. Springer, 28--48.Google Scholar
- Chen, J., Chau, R., and Yeh, C.-H. 2004. Discovering parallel text from the World Wide Web. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation. Vol. 32, 157--161. Google Scholar
Digital Library
- Chen, J. and Nie, J.-Y. 2000. Parallel web text mining for cross-language IR. In Proceedings of RIAO-2000: Content-Based Multimedia Information Access. 188--192.Google Scholar
- Cheng, P.-J., Teng, J.-W., Chen, R.-C., Wang, J.-H., Lu, W.-H., and Chien, L.-F. 2004. Translating unknown queries with web corpora for cross-language information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 146--153. Google Scholar
Digital Library
- Chew, P. A., Verzi, S. J., Bauer, T. L., and McClain, J. T. 2006. Evaluation of the bible as a resource for cross-language information retrieval. In Proceedings of the Workshop on Multilingual Language Resources and Interoperability. 68--74. Google Scholar
Digital Library
- Cimiano, P., Schultz, A., Sizov, S., Sorg, P., and Staab, S. 2009. Explicit versus latent concept models for cross-language information retrieval. In Proceedings of the 21st International Jont Conference on Artifical Intelligence. Morgan Kaufmann Publishers Inc., 1513--1518. Google Scholar
Digital Library
- Cleverdon, C. W. 1991. The significance of the Cranfield tests on index languages. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 3--12. Google Scholar
Digital Library
- Darwish, K. and Oard, D. W. 2003. Probabilistic structured query methods. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 338--344. Google Scholar
Digital Library
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Info. Sci. 41, 6, 391--407.Google Scholar
Cross Ref
- Demner-Fushman, D. and Oard, D. W. 2003. The effect of bilingual term list size on dictionary-based cross-language information retrieval. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) IEEE Computer Society. Google Scholar
Digital Library
- Dolamic, L. and Savoy, J. 2010. Retrieval effectiveness of machine translated queries. J. Amer. Soc. Inf. Sci. Technol. 61, 2266--2273. Google Scholar
Digital Library
- Dumais, S. T. 1993. Latent semantic indexing (LSI) and TREC-2. In Proceedings of TREC. 105--115.Google Scholar
- Dumais, S. T. 1995. Latent semantic indexing (LSI): TREC-3 report. In Proceedings of TREC. 219--230.Google Scholar
- Dumais, S. T., Letsche, T. A., Littman, M. L., and Landauer, T. K. 1997. Automatic cross-language retrieval using latent semantic indexing. In Proceedings of the AAAI Spring Symposium Series: Cross-Language Text and Speech Retrieval. 18--24.Google Scholar
- Fautsch, C. and Savoy, J. 2009. Algorithmic stemmers or morphological analysis? An evaluation. J. Amer. Soc. Inf. Sci. Technol. 60, 1616--1624. Google Scholar
Digital Library
- Federico, M. and Bertoldi, N. 2002. Statistical cross-language information retrieval using n-best query translations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 167--174. Google Scholar
Digital Library
- Ferro, N. and Peters, C. 2009. CLEF 2009 Ad hoc track overview: TEL and Persian tasks. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 13--35. Google Scholar
Digital Library
- Fox, C. 1989. A stop list for general text. SIGIR Forum 24, 1-2, 19--21. Google Scholar
Digital Library
- Franz, M., McCarley, J., and Roukos, S. 1999. Ad hoc and multilingual information retrieval at IBM. In Proceedings of TREC-7. 157--168.Google Scholar
- Fujii, A. and Ishikawa, T. 2001. Japanese/english cross-language information retrieval: exploration of query translation and transliteration. Comput. Humanit. 35, 4, 389--420.Google Scholar
Cross Ref
- Gao, J. and Nie, J.-Y. 2006. A study of statistical models for query translation: finding a good unit of translation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 194--201. Google Scholar
Digital Library
- Gao, J., Nie, J.-Y., Wu, G., and Cao, G. 2004. Dependence language model for information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 170--177. Google Scholar
Digital Library
- Gao, J., Nie, J.-Y., Xun, E., Zhang, J., Zhou, M., and Huang, C. 2001. Improving query translation for cross-language information retrieval using statistical models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 96--104. Google Scholar
Digital Library
- Gao, J., Zhou, M., Nie, J.-Y., He, H., and Chen, W. 2002. Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 183--190. Google Scholar
Digital Library
- Gao, W., Wong, K.-F., and Lam, W. 2005. Phoneme-Based transliteration of foreign names for OOV problem. In Proceedings of the 1st International Joint Conference in Natural Language Processing (IJCNLP 04.) Vol. 3248/2005. Springer, 110--119. Google Scholar
Digital Library
- Gey, F. 2007. Search between Chinese and Japanese text collections. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 73--76.Google Scholar
- Gey, F. and Chen, A. 1998. TREC-9 cross-language information retrieval (English-Chinese) overview. In Proceedings of the 9th Text Retrieval Conference (TREC-9). 15--23.Google Scholar
- Gollins, T. and Sanderson, M. 2001. Improving cross language retrieval with triangulated translation. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 90--95. Google Scholar
Digital Library
- Goto, I., Kato, N., Ehara, T., and Tanaka, H. 2004. Back transliteration from Japanese to English using target English context. In Proceedings of the 20th International Conference on Computational Linguistics. (COLING '04). Association for Computational Linguistics. Google Scholar
Digital Library
- Goutte, C., Cancedda, N., Dymetman, M., and Foster, G., Eds. 2009. Learning Machine Translation. The MIT Press, Cambridge, MA. Google Scholar
Digital Library
- He, D. and Wu, D. 2008. Translation enhancement: A new relevance feedback method for cross-language information retrieval. In Proceeding of the 17th ACM Conference on Information and Knowledge Management. ACM, New York, 729--738. Google Scholar
Digital Library
- Hedlund, T. 2002. Compounds in dictionary-based cross-language information retrieval. Info. Res. 7, 2.Google Scholar
- Hollink, V., Kamps, J., Monz, C., and Rijke, M. D. 2004. Monolingual document retrieval for european languages. Inf. Retr. 7, 1-2, 33--52. Google Scholar
Digital Library
- Huang, F., Zhang, Y., and Vogel, S. 2005. Mining key phrase translations from Web corpora. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 483--490. Google Scholar
Digital Library
- Hull, D. 1993. Using statistical testing in the evaluation of retrieval experiments. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 329--338. Google Scholar
Digital Library
- Hull, D. A. 1996. Stemming algorithms: A case study for detailed evaluation. J. Amer. Soc. Info. Sci. 47, 1, 70--84. Google Scholar
Digital Library
- Hull, D. A. 1997. Using structured queries for disambiguation in cross-language information retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval. American Association for Artificial Intelligence, 84--98.Google Scholar
- Hull, D. A. and Grefenstette, G. 1996. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 49--57. Google Scholar
Digital Library
- Jang, M.-G., Myaeng, S. H., and Park, S. Y. 1999. Using mutual information to resolve query translation ambiguities and query term weighting. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 223--229. Google Scholar
Digital Library
- Jeong, K., Myaeng, S., Lee, J., and Choi, K.-S. 1999. Automatic identification and back-transliteration of foreign words for information retrieval. Info. Process. Manag. 35, 523--540.Google Scholar
Cross Ref
- Jin, R., Hauptmann, A. G., and Zhai, C. X. 2002. Title language model for information retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 42--48. Google Scholar
Digital Library
- Jones, G. J. F., Fantino, F., Newman, E., and Zhang, Y. 2008. Domain-Specific query translation for multilingual information access using machine translation augmented with dictionaries mined from wikipedia. In Proceedings of the 2nd International Workshop on Cross Lingual Information Access - Addressing the Information Need of Multilingual Societies (CLIA 08). 34--41.Google Scholar
- Kang, B.-J. and Choi, K.-S. 2000. Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. (IRAL '00). ACM, New York, 133--140. Google Scholar
Digital Library
- Kang, I.-H. and Kim, G. 2000. English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1. Association for Computational Linguistics, 418--424. Google Scholar
Digital Library
- Kang, I.-S., Na, S.-H., and Lee, J.-H. 2004. POSTECH at NTCIR-4: CJKE monolingual and Korean-related cross-language retrieval experiments. In Proceedings of the 4th NTCIR Workshop. National Institute of Informatics.Google Scholar
- Kashioka, H., Maruyama, T., and Tanaka, H. 2003. Building a parallel corpus for monologues with clause alignment. In Proceedings of the Machine Translation Summit (MT Summit IX). 216--223.Google Scholar
- Keskustalo, H., Pirkola, A., Visala, K., Leppanen, E., and Jarvelin, K. 2003. Non-Adjacent digrams improve matching of cross-lingual spelling variants. In Proceedings of String Processing and Information Retrieval: 10th International Symposium (SPIRE 03). 252--265.Google Scholar
- Kishida, K. 2008. Prediction of performance of cross-language information retrieval using automatic evaluation of translation. Libr. Info. Sci. Res. 30, 2, 138--144.Google Scholar
Cross Ref
- Kishida, K. and Kando, N. 2005. Hybrid approach of query and document translation with pivot language for cross-language information retrieval. In Proceedings of the Workshop on Cross-Language Information Retrieval and Evaluation. Google Scholar
Digital Library
- Knight, K. and Graehl, J. 1998. Machine transliteration. Comput. Linguist. 24, 4, 599--612. Google Scholar
Digital Library
- Korn, M., Schulz, S., Medelyan, O., and Hahn, U. 2005. Bootstrapping dictionaries for cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 528--535. Google Scholar
Digital Library
- Kraaij, W. 2003. Exploring transitive translation methods. In Proceedings of 4th Dutch-Belgian Information Retrieval Workshop.Google Scholar
- Kraaij, W., Nie, J.-Y., and Simard, M. 2003. Embedding web-based statistical translation models in cross-language information retrieval. Comput. Linguist. 29, 3, 381--419. Google Scholar
Digital Library
- Kuriyama, K., Kando, N., Nozue, T., and Eguchi, K. 2002. Pooling for a large-scale test collection: An analysis of the search results from the first NTCIR workshop. Inf. Retr. 5, 41--59. Google Scholar
Digital Library
- Kwok, K. L. 1999. English-Chinese cross-language retrieval based on a translation package. In Proceedings of the Machine Translation Summit VII Workshop of Machine Translation for Cross Language Information Retrieval. 8--13.Google Scholar
- Kwok, K. L. 2000. Exploiting a Chinese-English bilingual wordlist for English-Chinese cross language information retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. ACM Press, New York, 173--179. Google Scholar
Digital Library
- Kwok, K. L. and Grunfeld, L. 1996. TREC-5 English and Chinese retrieval experiments using PIRCS. In Proceedings of TREC-5. 133--142.Google Scholar
- Lancaster, F. and Fayen, E. 1973. Information Retrieval On-Line. Melville Publishing Co., Los Angeles, CA.Google Scholar
- Landauer, T. K., Foltz, P. W., and Laham, D. 1998. An introduction to latent semantic analysis. Discourse Process. 25, 259--284.Google Scholar
Cross Ref
- Lavrenko, V., Choquette, M., and Croft, W. B. 2002. Cross-lingual relevance models. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 175--182. Google Scholar
Digital Library
- Lavrenko, V. and Croft, W. B. 2001. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 120--127. Google Scholar
Digital Library
- Lee, C.-J., Chen, C.-H., Kao, S.-H., and Cheng, P.-J. 2010. To translate or not to translate? In Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 651--658. Google Scholar
Digital Library
- Leek, T., Jin, H., Sista, S., and Schwartz, R. 2000. The BBN cross-lingual topic detection and tracking system. In Working Notes of the 3rd Topic Detection and Tracking Workshop. National Institutes of Standards and Technology.Google Scholar
- Lehtokangas, R., Airio, E., J, K., and rvelin. 2004. Transitive dictionary translation challenges direct dictionary translation in CLIR. Inf. Process. Manage. 40, 6, 973--988. Google Scholar
Digital Library
- Lehtokangas, R., Keskustalo, H., and Järvelin, K. 2008. Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments. J. Amer. Soc. Inf. Sci. Technol. 59, 476--488. Google Scholar
Digital Library
- Leveling, J., Zhou, D., Jones, G. J. F., and Wade, V. 2009. Document expansion, query translation and language modeling for ad-hoc IR. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 58--61. Google Scholar
Digital Library
- Levow, G.-A. and Oard, D. W. 2000. Translingual topic tracking with PRISE. In Working Notes of the 3rd Topic Detection and Tracking Workshop. National Institutes of Standards and Technology.Google Scholar
- Levow, G.-A., Oard, D. W., and Resnik, P. 2005. Dictionary-Based techniques for cross-language information retrieval. Inf. Process. Manage. 41, 3, 523--547. Google Scholar
Digital Library
- Lin, M.-C., Li, M.-X., Hsu, C.-C., and Wu, S.-H. 2010. Query expansion from Wikipedia and topic Web crawler on CLIR. In Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 101--106.Google Scholar
- Liu, X. and Croft, W. B. 2005. Statistical language modeling for information retrieval. Ann. Rev. Info. Sci. Technol. 39, 1, 1--31.Google Scholar
Cross Ref
- Liu, Y., Jin, R., and Chai, J. Y. 2005. A maximum coherence model for dictionary-based cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 536--543. Google Scholar
Digital Library
- Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3, 1--49. Google Scholar
Digital Library
- Loponen, A. and Järvelin, K. 2010. A dictionary- and corpus-independent statistical lemmatizer for information retrieval in low resource languages. In Proceedings of the International Conference on Multilingual and Multimodal Information Access Evaluation: Cross-language Evaluation Forum (CLEF'10). Springer, 3--14. Google Scholar
Digital Library
- Lu, W.-H., Chien, L.-F., and Lee, H.-J. 2002. Translation of web queries using anchor text mining. ACM Trans. Asian Lang. Info. Process. 1, 2, 159--172. Google Scholar
Digital Library
- Lu, W.-H., Chien, L.-F., and Lee, H.-J. 2004. Anchor text mining for translation of web queries: A transitive translation approach. ACM Trans. Inf. Syst. 22, 2, 242--269. Google Scholar
Digital Library
- Maeda, A., Sadat, F., Yoshikawa, M., and Uemura, S. 2000. Query term disambiguation for web cross-language information retrieval using a search engine. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. ACM Press, 25--32. Google Scholar
Digital Library
- Majumder, P., Mitra, M., Parui, S. K., Kole, G., Mitra, P., and Datta, K. 2007. YASS: Yet another suffix stripper. ACM Trans. Info. Syst. 25, 4, 18:1--3:20. Google Scholar
Digital Library
- Manning, C. D., Raghavan, P., and Schtze, H. 2008. Introduction to Information Retrieval. Cambridge University Press. Google Scholar
Digital Library
- Mayfield, J. and McNamee, P. 2004. Triangulation without translation. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 490--491. Google Scholar
Digital Library
- McCarley, J. S. 1999. Should we translate the documents or the queries in cross-language information retrieval? In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 208--214. Google Scholar
Digital Library
- McEwan, C. J. A., Ounis, I., and Ruthven, I. 2002. Building bilingual dictionaries from parallel web documents. In Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval. Springer, 303--323. Google Scholar
Digital Library
- McNamee, P. and Mayfield, J. 2002. Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 159--166. Google Scholar
Digital Library
- McNamee, P. and Mayfield, J. 2004a. Character n-gram tokenization for European language text retrieval. Inf. Retr. 7, 1-2, 73--97. Google Scholar
Digital Library
- McNamee, P. and Mayfield, J. 2004b. Cross-Language retrieval using HAIRCUT at CLEF 2004. In Proceedings of the Workshop on Cross-Language Information Retrieval and Evaluation. Google Scholar
Digital Library
- McNamee, P., Mayfield, J., and Piatko, C. 2002. HAIRCUT: A system for multilingual text retrieval in java. J. Comput. Small Coll. 17, 8--22. Google Scholar
Digital Library
- Melamed, I. D. 2000. Models of translational equivalence among words. Comput. Linguist. 26, 221--249. Google Scholar
Digital Library
- Melucci, M. and Orio, N. 2003. A novel method for stemmer generation based on hidden markov models. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 131--138. Google Scholar
Digital Library
- Meng, H., Chen, B., Khudanpur, S., Levow, G.-A., Lo, W.-K., Oard, D., Schone, P., Tang, K., Wang, H.-M., and Wang, J. 2001. Mandarin-English information (MEI): Investigating translingual speech retrieval. In Proceedings of the 1st International Conference on Human Language Technology Research. Association for Computational Linguistics, 1--7. Google Scholar
Digital Library
- Meng, H., Khudanpur, S., Levow, G., Oard, D. W., and Wang, H.-M. 2000. Mandarin-English information (MEI): Investigating translingual speech retrieval. In Proceedings of the NAACL-ANLP Workshop on Embedded Machine Translation Systems. Vol 5, Association for Computational Linguistics, 23--30. Google Scholar
Digital Library
- Miller, D. R. H., Leek, T., and Schwartz, R. M. 1999. A hidden markov model information retrieval system. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 214--221. Google Scholar
Digital Library
- Monz, C. and Dorr, B. J. 2005. Iterative translation disambiguation for cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 520--527. Google Scholar
Digital Library
- Moreau, F., Claveau, V., and Sebillot, P. 2007. Automatic morphological query expansion using analogy-based machine learning. In Proceedings of the 29th European Conference on IR Research. Springer, 222--233. Google Scholar
Digital Library
- Mori, T., Kokubu, T., and Tanaka, T. 2001. Cross-Lingual information retrieval based on LSI with multiple word spaces. In Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access.Google Scholar
- Nie, J.-y. 1998. Using a probabilistic translation model for cross-language information retrieval. In Proceedings of the 6th Workshop on Very Large Corpora. Morgan Kaufmann Publishers.Google Scholar
- Nie, J.-y. 2010. Cross-Language Information Retrieval. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. Google Scholar
Digital Library
- Nie, J.-Y. and Ren, F. 1999. Chinese information retrieval: Using characters or words? Info. Process. Manag. 35, 4, 443--462.Google Scholar
Cross Ref
- Nie, J.-Y., Simard, M., Isabelle, P., and Durand, R. 1999. Cross-Language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 74--81. Google Scholar
Digital Library
- Oard, D. W. 1998. A comparative study of query and document translation for cross-language information retrieval. In Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup. Springer, 472--483. Google Scholar
Digital Library
- Oard, D. W. 1999. Topic tracking with the PRISE information retrieval system. In Proceedings of the DARPA Broadcast News Workshop. 209--211.Google Scholar
- Oard, D. W. and Dorr, B. J. 1996. A survey of multilingual text retrieval. Tech. rep., University. of Maryland Institute for Advanced Computer Studies report no. UMIACS-TR-96-19, University of Maryland at College Park, MD. Google Scholar
Digital Library
- Oard, D. W. and Ertunc, F. 2002. Translation-Based indexing for cross-language retrieval. In Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval. Springer, 324--333. Google Scholar
Digital Library
- Oard, D. W. and Hackett, P. 1997. Document translation for cross-langauge text retrieval at the university of Maryland. In Proceedings of the 6th Text Retrieval Conference (TREC-6). NIST, 687--696.Google Scholar
- Oard, D. W., Levow, G.-A., and Cabezas, C. I. 2000. CLEF experiments at Maryland: Statistical stemming and backoff translation. In Proceedings of Evaluation of Cross-Language Information Retrieval Systems: Third Workshop of the Cross-Language Evaluation Forum. Google Scholar
Digital Library
- Oard, D. W. and Wang, J. 2001. NTCIR-2 ECIR experiments at Maryland: Comparing pirkola's structured queries and balanced translation. In Proceedings of the 2nd NTCIR Workshop on Research in Chinese & Japanese, Text Retrieval and Text Summarization. National Institute of Informatics.Google Scholar
- Och, F. J. and Ney, H. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist. 29, 1, 19--51.
- Parton, K., McKeown, K. R., Allan, J., and Henestroza, E. 2008. Simultaneous multilingual search for translingual information retrieval. In Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, New York, 719--728.
- Peters, C. and Picchi, E. 1996. A system for cross-language information retrieval. ERCIM News 27.
- Peters, C. and Sheridan, P. 2001. Lectures on Information Retrieval. Springer, Chapter Multilingual Information Access, 51--80.
- Pirkola, A. 1998. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 55--63.
- Pirkola, A., Keskustalo, H., Leppänen, E., Känsälä, A.-P., and Järvelin, K. 2002. Targeted s-gram matching: A novel n-gram matching technique for cross- and monolingual word form variants. Inf. Res. 7, 2.
- Pirkola, A., Puolamäki, D., and Järvelin, K. 2003a. Applying query structuring in cross-language retrieval. Inf. Process. Manage. 39, 391--402.
- Pirkola, A., Toivonen, J., Keskustalo, H., Visala, K., and Järvelin, K. 2003b. Fuzzy translation of cross-lingual spelling variants. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 345--352.
- Ponte, J. M. and Croft, W. B. 1998. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 275--281.
- Porter, M. F. 1980. An algorithm for suffix stripping. Program 14, 130--137.
- Potthast, M., Stein, B., and Anderka, M. 2008. A Wikipedia-based multilingual retrieval model. In Proceedings of the 30th European Conference on Information Retrieval. Springer, 522--530.
- Qu, Y., Grefenstette, G., and Evans, D. A. 2003. Automatic transliteration for Japanese-to-English text retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 353--360.
- Resnik, P. 1998. Parallel strands: A preliminary investigation into mining the web for bilingual text. In Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup. Springer, 72--82.
- Resnik, P. 1999. Mining the web for bilingual text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 527--534.
- Resnik, P., Oard, D., and Levow, G. 2001. Improved cross-language retrieval using backoff translation. In Proceedings of the 1st International Conference on Human Language Technology Research (HLT '01). Association for Computational Linguistics, 1--3.
- Resnik, P. and Smith, N. A. 2003. The Web as a parallel corpus. Comput. Linguist. 29, 3, 349--380.
- Robertson, A. and Willett, P. 1998. Applications of n-grams in textual information systems. J. Document. 54, 1, 48--69.
- Rocchio, J. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing, 313--323.
- Ruiz, M., Diekema, A., and Sheridan, P. 1999. CINDOR conceptual interlingua document retrieval: TREC-8 evaluation. In Proceedings of the 8th Text Retrieval Conference (TREC-8).
- Sakai, T., Kando, N., Lin, C.-J., Mitamura, T., Shima, H., Ji, D., Chen, K.-H., and Nyberg, E. 2008. Overview of the NTCIR-7 ACLIA IR4QA task. In Proceedings of the 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 77--114.
- Sakai, T., Shima, H., Kando, N., Song, R., Lin, C.-J., Mitamura, T., Sugimoto, M., and Lee, C.-W. 2010. Overview of NTCIR-8 ACLIA IR4QA. In Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 63--93.
- Salton, G. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
- Salton, G., Fox, E. A., and Wu, H. 1983. Extended Boolean information retrieval. Comm. ACM 26, 1022--1036.
- Salton, G., Wong, A., and Yang, C. S. 1975. A vector space model for automatic indexing. Comm. ACM 18, 613--620.
- Savoy, J. 2004. Combining multiple strategies for effective monolingual and cross-language retrieval. Inf. Retr. 7, 121--148.
- Savoy, J. 2005. Comparative study of monolingual and multilingual search models for use with Asian languages. ACM Trans. Asian Lang. Inf. Process. 4, 2, 163--189.
- Savoy, J. 2007. Why do successful search systems fail for some topics. In Proceedings of the ACM Symposium on Applied Computing (SAC '07). ACM, New York, 872--877.
- Savoy, J. and Dolamic, L. 2009. How effective is Google's translation service in search? Comm. ACM 52, 139--143.
- Schäuble, P. 1993. SPIDER: A multiuser information retrieval system for semistructured and dynamic data. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 318--327.
- Schönhofen, P., Benczúr, A., Bíró, I., and Csalogány, K. 2008. Cross-language retrieval with Wikipedia. In Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF 07) (Revised Selected Papers). Springer, 72--79.
- Shannon, C. E. and Weaver, W. 1963. A Mathematical Theory of Communication. University of Illinois Press.
- Shaw, J. A. and Fox, E. A. 1994. Combination of multiple searches. In Proceedings of the 2nd Text REtrieval Conference (TREC-2). 243--252.
- Sheridan, P. and Ballerini, J. P. 1996. Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 58--65.
- Shi, L. 2010. Mining OOV translations from mixed-language Web pages for cross language information retrieval. In Proceedings of the 32nd European Conference on Information Retrieval (ECIR 10). 471--482.
- Shi, L., Nie, J.-Y., and Bai, J. 2007. Comparing different units for query translation in Chinese cross-language information retrieval. In Proceedings of the 2nd International Conference on Scalable Information Systems. 1--9.
- Singhal, A. and Pereira, F. 1999. Document expansion for speech retrieval. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 34--41.
- Snajder, J., Basic, B. D., and Tadic, M. 2008. Automatic acquisition of inflectional lexica for morphological normalisation. Inf. Process. Manage. 44, 5, 1720--1731.
- Song, F. and Croft, W. B. 1999. A general language model for information retrieval. In Proceedings of the 8th International Conference on Information and Knowledge Management. ACM Press, New York, 316--321.
- Sorg, P. and Cimiano, P. 2008. Cross-language information retrieval with explicit semantic analysis. In Working Notes of the CLEF Workshop.
- Sparck Jones, K. 1988. A Statistical Interpretation of Term Specificity and its Application in Retrieval. Taylor Graham Publishing, London, UK, 132--142.
- Su, C.-Y., Lin, T.-C., and Wu, S.-H. 2007. Using Wikipedia to translate OOV terms on MLIR. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 109--115.
- Sun, L., Xue, S., Qu, W., Wang, X., and Sun, Y. 2002. Constructing of a large-scale Chinese-English parallel corpus. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization - Volume 12. Association for Computational Linguistics, 1--8.
- Virga, P. and Khudanpur, S. 2003. Transliteration of proper names in cross-lingual information retrieval. In Proceedings of the ACL Workshop on Multilingual and Mixed-Language Named Entity Recognition - Volume 15. Association for Computational Linguistics, 57--64.
- Voorhees, E. M. and Harman, D. 2000. Overview of the ninth text retrieval conference (TREC-9). In Proceedings of the 9th Text REtrieval Conference (TREC-9). 1--14.
- Wang, J. and Oard, D. W. 2006. Combining bidirectional translation and synonymy for cross-language information retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 202--209.
- Wong, S. K. M., Ziarko, W., and Wong, P. C. N. 1985. Generalized vector spaces model in information retrieval. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 18--25.
- Xu, J. and Weischedel, R. 2000. Cross-lingual information retrieval using hidden Markov models. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics. Vol. 13, Association for Computational Linguistics, 95--103.
- Xu, J. and Weischedel, R. 2005. Empirical studies on the impact of lexical resources on CLIR performance. Inf. Process. Manage. 41, 3, 475--487.
- Xu, J., Weischedel, R., and Nguyen, C. 2001. Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 105--110.
- Yang, C. C. and Li, K. W. 2002. Mining English/Chinese parallel documents from the World Wide Web. In Proceedings of the 11th International World Wide Web Conference. ACM Press, New York, 188--192.
- Zhai, C. 2009. Statistical Language Models for Information Retrieval. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
- Zhai, C. and Lafferty, J. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 334--342.
- Zhang, Y., Uchimoto, K., Ma, Q., and Isahara, H. 2005a. Building an annotated Japanese-Chinese parallel corpus - A part of NICT multilingual corpora. In Proceedings of the 10th Machine Translation Summit (MT Summit X). 71--78.
- Zhang, Y. and Vines, P. 2004. Using the web for automated translation extraction in cross-language information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 162--169.
- Zhang, Y., Vines, P., and Zobel, J. 2005b. Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval. ACM Trans. Asian Lang. Inf. Process. 4, 2, 57--77.
- Zhou, D., Truran, M., Brailsford, T., and Ashman, H. 2007. NTCIR-6 experiments using pattern matched translation extraction. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 145--151.
- Zhou, D., Truran, M., Brailsford, T., and Ashman, H. 2008a. A hybrid technique for English-Chinese cross language information retrieval. ACM Trans. Asian Lang. Inf. Process. 7, 5:1--5:35.
- Zhou, D., Truran, M., Brailsford, T., Ashman, H., and Goulding, J. 2008b. Gcon: A graph-based technique for resolving ambiguity in query translation candidates. In Proceedings of the 23rd Annual ACM Symposium on Applied Computing. 1566--1573.
- Zhu, J. and Wang, H. 2006. The effect of translation quality in MT-based cross-language information retrieval. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. 593--600.
- Zobel, J. and Dart, P. 1995. Finding approximate matches in large lexicons. Softw. Pract. Exper. 25, 3, 331--345.
Index Terms
Translation techniques in cross-language information retrieval