Abstract
In this article, we present a rule-based approach for transliterating between two of the most widely used orthographies of Sorani Kurdish. Our approach identifies each character in a word, resolves its possible ambiguities, and maps it into the target orthography. We describe the challenges of Kurdish text mining and propose novel ideas for the transliteration task in Sorani Kurdish. Our transliteration system, named Wergor, achieves 82.79% overall precision and more than 99% in detecting double-usage characters. We also present a manually transliterated corpus for Kurdish.
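To make the idea of character-level, rule-based mapping concrete, the following is a minimal sketch and not Wergor's actual rule set. It assumes a small, hypothetical subset of Arabic-script-to-Latin-script letter mappings and a single simplified context rule for the double-usage characters و and ی: each is read as a consonant at the start of a word or next to a vowel letter, and as a vowel otherwise. The names `UNAMBIGUOUS`, `DOUBLE_USAGE`, `SOURCE_VOWELS`, and `transliterate_word` are illustrative assumptions.

```python
# Hypothetical, partial mapping for letters with a single Latin counterpart.
UNAMBIGUOUS = {
    "\u0628": "b",       # ب
    "\u0627": "a",       # ا
    "\u0646": "n",       # ن
    "\u062f": "d",       # د
    "\u0631": "r",       # ر
    "\u06a9": "k",       # ک
    "\u06d5": "e",       # ە
    "\u06ce": "\u00ea",  # ێ -> ê
    "\u06c6": "o",       # ۆ
}

# Double-usage letters: (consonant reading, vowel reading).
DOUBLE_USAGE = {
    "\u0648": ("w", "u"),       # و
    "\u06cc": ("y", "\u00ee"),  # ی -> y or î
}

# Source-script letters that always denote a vowel (assumed subset).
SOURCE_VOWELS = {"\u0627", "\u06d5", "\u06ce", "\u06c6"}


def transliterate_word(word: str) -> str:
    """Map one Arabic-script word to Latin script, character by character."""
    out = []
    for i, ch in enumerate(word):
        if ch in DOUBLE_USAGE:
            consonant, vowel = DOUBLE_USAGE[ch]
            prev_ch = word[i - 1] if i > 0 else ""
            next_ch = word[i + 1] if i + 1 < len(word) else ""
            # Simplified assumption: consonant at word start or beside a vowel.
            is_consonant = (
                i == 0 or prev_ch in SOURCE_VOWELS or next_ch in SOURCE_VOWELS
            )
            out.append(consonant if is_consonant else vowel)
        else:
            # Characters outside the toy table are passed through unchanged.
            out.append(UNAMBIGUOUS.get(ch, ch))
    return "".join(out)


print(transliterate_word("\u06a9\u0648\u0631\u062f"))  # کورد -> kurd
print(transliterate_word("\u0648\u0627\u0646"))        # وان -> wan
```

In this sketch the same letter و yields `u` in the first word and `w` in the second, which is the kind of ambiguity the abstract refers to. A single context rule like this one will misread many real words, so a practical system needs a considerably richer set of rules.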
A Rule-Based Kurdish Text Transliteration System