
Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora

Published: 25 July 2019

Abstract

Comparable corpora are a valuable alternative to expensive parallel corpora. They contain informative parallel fragments that are useful resources for a range of natural language processing tasks. In this work, a generative model is proposed for the efficient extraction of parallel fragments from a pair of comparable documents. The core of the proposed model is a graph called the Matching Graph. Because the Matching Graph can be trained on a small initial seed, it is well suited to language pairs with scarce resources. Experiments show that the Matching Graph performs significantly better than other recently published models. According to experiments on the English-Persian and Arabic-Persian language pairs, the extracted parallel fragments can be used in place of parallel data for training statistical machine translation systems. The results show that, in the best case, the extracted fragments retrieve about 90% of the information of a statistical machine translation system trained on a parallel corpus. Moreover, using the extracted fragments as additional training data for statistical machine translation systems improves the BLEU score by about 2% for English-Persian translation and about 1% for Arabic-Persian translation.

