
Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families

Published: 15 March 2021

Abstract

Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. Especially for closely related languages, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, even if several machine-readable bilingual dictionaries already exist, it is still difficult to determine the execution order of the constraint-based approach that reduces the total cost. Plan optimization is crucial in composing the order of bilingual dictionary creation, taking into account the methods and their costs. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with the language similarity and the polysemy of the topology as the \(\alpha\) and \(\beta\) parameters. This distribution is further used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch of experiments, the results show a 61.5% cost reduction compared to the estimated all investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
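The step in which a posterior beta distribution from one batch becomes the prior for the next can be illustrated with the standard conjugate Beta-Binomial update. The following is a minimal sketch under stated assumptions, not the procedure used in the article: the class name `BetaBelief`, the starting parameters, and the counts of correct and incorrect induced translation pairs are all hypothetical, and the article derives its prior parameters from language similarity and polysemy rather than from fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about induction precision (hypothetical helper)."""

    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected precision under a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaBelief":
        # Conjugate update: observed successes add to alpha, failures to beta.
        return BetaBelief(self.alpha + correct, self.beta + incorrect)


# Assumed prior for the first batch (illustrative values only).
prior = BetaBelief(alpha=2.0, beta=2.0)

# Assumed evaluation outcome of the first batch: 8 correct pairs, 2 incorrect.
posterior = prior.update(correct=8, incorrect=2)

# The first-batch posterior serves as the prior for the second batch.
next_prior = posterior

print(f"prior mean precision:     {prior.mean():.3f}")
print(f"posterior mean precision: {posterior.mean():.3f}")
```

In a planning setting, an expected precision of this kind could feed a cost estimate or a transition probability, which is the role the abstract assigns to the beta distribution.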

