A Rule-Based Kurdish Text Transliteration System

Published: 18 January 2019

Abstract

In this article, we present a rule-based approach for transliterating between two of the most widely used orthographies of Sorani Kurdish. The approach identifies each character in a word, resolves its possible ambiguities, and maps it into the target orthography. We describe the challenges of Kurdish text mining and propose novel ideas for the Sorani Kurdish transliteration task. Our transliteration system, named Wergor, achieves 82.79% overall precision and more than 99% in detecting the double-usage characters. We also present a manually transliterated corpus for Kurdish.
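
As a rough illustration of what resolving a double-usage character can involve, the sketch below transliterates a few Arabic-script Sorani words into Latin script. In the Arabic-based orthography the letters waw (U+0648) and yeh (U+06CC) can each stand for either a consonant or a vowel, so a rule has to choose a reading from context. Everything here is an assumption made for illustration: the tiny mapping table, the position heuristic, and the names `transliterate_word` and `is_consonant_position` are hypothetical and are not taken from Wergor.

```python
# Illustrative subset of an Arabic-script to Latin-script mapping for Sorani.
# Assumption: this table and the heuristic below are NOT Wergor's actual rules.
UNAMBIGUOUS = {
    "\u0627": "a",  # alef
    "\u06D5": "e",  # ae
    "\u06C6": "o",  # oe
    "\u06CE": "ê",  # yeh with small v
    "\u0628": "b", "\u062A": "t", "\u062F": "d", "\u0631": "r",
    "\u0634": "ş", "\u06A9": "k", "\u0646": "n",
}
WAW, YEH = "\u0648", "\u06CC"
DOUBLE_USAGE = {WAW: ("w", "u"), YEH: ("y", "î")}  # (consonant, vowel)
SOURCE_VOWELS = {"\u0627", "\u06D5", "\u06C6", "\u06CE"}
LATIN_VOWELS = set("aeoêuîû")


def is_consonant_position(word, i, out):
    """Guess whether the double-usage letter at word[i] acts as a consonant."""
    at_start = i == 0
    after_vowel = bool(out) and out[-1] in LATIN_VOWELS
    before_vowel = i + 1 < len(word) and word[i + 1] in SOURCE_VOWELS
    return at_start or after_vowel or before_vowel


def transliterate_word(word):
    """Map one Arabic-script word to Latin script, character by character."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in DOUBLE_USAGE:
            consonant, vowel = DOUBLE_USAGE[ch]
            if is_consonant_position(word, i, out):
                out.append(consonant)
            elif ch == WAW and word[i + 1:i + 2] == WAW:
                out.append("û")  # doubled waw read as one long vowel
                i += 1           # skip the second waw
            else:
                out.append(vowel)
        else:
            out.append(UNAMBIGUOUS.get(ch, ch))  # unknown characters pass through
        i += 1
    return "".join(out)


print(transliterate_word("\u06A9\u0648\u0631\u062F"))  # waw between consonants
print(transliterate_word("\u0648\u06D5\u0631\u06D5"))  # waw at word start
```

A heuristic this simple leaves many cases unhandled, such as short vowels that the Arabic-based orthography does not write, which is one reason a fuller rule set is needed in practice.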

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 2
June 2019
208 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3300146

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 18 January 2019
• Accepted: 1 September 2018
• Revised: 1 June 2018
• Received: 1 August 2017

Qualifiers

• note
• Research
• Refereed
