Abstract
Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. For closely related languages in particular, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, even if several machine-readable bilingual dictionaries already exist, it is still difficult to determine the execution order of the constraint-based approach that reduces the total cost. Plan optimization is crucial in composing the order of bilingual dictionary creation with consideration of the methods and their costs. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing the constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the \(\alpha\) and \(\beta\) parameters. This distribution is further used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch of experiments, the results show a 61.5% cost reduction compared to the estimated all investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
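The step in which a posterior beta distribution from one batch becomes the prior for the next can be illustrated with the standard conjugate Beta update. The following is a minimal sketch, not the authors' implementation: the class name `BetaBelief`, the starting values of `alpha` and `beta`, and the counts of correct and incorrect induced translation pairs are all hypothetical, and the abstract does not state how language similarity and polysemy are mapped to the two parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over induction precision, a value in [0, 1].

    Hypothetical helper for illustration only.
    """
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected precision under a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaBelief":
        # Conjugate update: each correct pair adds 1 to alpha,
        # each incorrect pair adds 1 to beta.
        return BetaBelief(self.alpha + correct, self.beta + incorrect)


# Assumed prior for one language pair (placeholder values).
prior = BetaBelief(alpha=6.0, beta=4.0)

# Assumed evaluation of a first batch: 80 correct and 20 incorrect pairs.
posterior = prior.update(correct=80, incorrect=20)

# The first-batch posterior serves as the second-batch prior.
second_batch_prior = posterior

print(f"prior mean precision:     {prior.mean():.3f}")
print(f"posterior mean precision: {posterior.mean():.3f}")
```

In a planner of the kind the abstract describes, the updated mean could then feed a cost estimate or a transition probability, but how those are defined is not stated here.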
Index Terms
Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families