research-article

Forward-backward Transliteration of Punjabi Gurmukhi Script Using N-gram Language Model

Published: 27 December 2022

Abstract

Transliterating text of a language into a foreign script is called forward transliteration, and transliterating it back into the original script is called backward transliteration. In this work, we perform both forward and backward transliteration for Punjabi. Using an n-gram language model, we transliterate Punjabi person names from Gurmukhi script to English Roman script and from English Roman script back to Gurmukhi script. We used more than one million parallel Gurmukhi-Roman person-name pairs as the training corpus, from which we generated English-to-Punjabi and Punjabi-to-English n-gram databases. To improve results, we created n-grams as long as possible, ranging from bigrams to 30-grams. Our n-gram database contains more than 10 million n-grams, each with multiple mappings in the other script. The most challenging part of building the n-gram databases is finding the mapping for a given n-gram within its parallel name pair. Under the orthography rules, the same combination of letters may be pronounced differently depending on its position in the word. We therefore categorized n-grams into starting, middle, and ending n-grams and used them accordingly in the transliteration process. The transliteration process works like merge sort: we start by searching the database for the longest possible n-gram and split the string recursively until a match is found. The transliterated substrings are then merged to form the final output. For English-to-Punjabi transliteration, we achieved 96% accuracy against the gold standard and 99.14% accuracy using minimum edit distance. For Punjabi-to-English transliteration, accuracy was 96.85% against the gold standard and 99.35% using minimum edit distance.
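The abstract outlines the decoding step only briefly. The Python sketch below shows one plausible reading of it, under loudly stated assumptions: the database is a hypothetical nested dictionary keyed by position category (`start`, `middle`, `end`); the first listed mapping for an n-gram is taken to be the most frequent; the string is split around the longest matching n-gram (the abstract does not say exactly where splits occur); and characters with no match are copied through unchanged. The names `transliterate`, `category`, `NgramDB`, and `TOY_DB`, and the toy entries, are illustrative and are not the authors' implementation.

```python
# Hypothetical type alias: position category -> source n-gram -> candidate
# target strings, assumed to be ordered from most to least frequent.
NgramDB = dict[str, dict[str, list[str]]]


def category(i: int, j: int, n: int) -> str:
    """Classify the span word[i:j] of an n-character word by its position."""
    if i == 0:
        return "start"  # spans touching the first character (incl. whole word)
    if j == n:
        return "end"
    return "middle"


def transliterate(word: str, db: NgramDB, max_n: int = 30) -> str:
    """Divide-and-conquer transliteration over a positional n-gram database."""
    n = len(word)

    def solve(i: int, j: int) -> str:
        if i >= j:
            return ""
        # Try the longest n-gram first, shrinking down to single characters.
        for length in range(min(j - i, max_n), 0, -1):
            for start in range(i, j - length + 1):
                end = start + length
                table = db.get(category(start, end, n), {})
                candidates = table.get(word[start:end])
                if candidates:
                    # Take the first (assumed most frequent) mapping, recurse on
                    # the untranslated left and right parts, and merge in order.
                    return solve(i, start) + candidates[0] + solve(end, j)
        # Nothing in this span is in the database: copy it through unchanged.
        return word[i:j]

    return solve(0, n)


# Hypothetical hand-made Roman -> Gurmukhi entries, for illustration only.
TOY_DB: NgramDB = {
    "start": {"har": ["ਹਰ"], "gur": ["ਗੁਰ"]},
    "middle": {},
    "end": {"preet": ["ਪ੍ਰੀਤ"], "deep": ["ਦੀਪ"]},
}

print(transliterate("harpreet", TOY_DB))  # ਹਰਪ੍ਰੀਤ
print(transliterate("gurdeep", TOY_DB))   # ਗੁਰਦੀਪ
```

The positional tables matter: in this toy database `"har"` is listed only as a starting n-gram, so `transliterate("xhar", TOY_DB)` finds no match and returns `"xhar"` unchanged. The default `max_n=30` mirrors the 30-gram upper bound mentioned in the abstract; searching down to length 1 is an added assumption, since the abstract describes n-grams starting at bigrams.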
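The abstract reports two accuracy figures per direction without defining them. A common reading, assumed here, is that gold-standard accuracy is the fraction of outputs that exactly match their reference, and that edit-distance accuracy is character-level: one minus the total Levenshtein distance divided by the total length of the references. The function names below are hypothetical, and the paper may compute its figures differently.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def exact_match_accuracy(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of outputs identical to their gold-standard reference."""
    return sum(o == r for o, r in zip(outputs, references)) / len(references)


def edit_distance_accuracy(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Character-level accuracy: 1 - total edits / total reference characters."""
    edits = sum(levenshtein(o, r) for o, r in zip(outputs, references))
    chars = sum(len(r) for r in references)
    return max(0.0, 1.0 - edits / chars)  # clamp: very long outputs could go negative


outputs = ["harpreet", "gurdip"]
references = ["harpreet", "gurdeep"]
print(exact_match_accuracy(outputs, references))    # 0.5
print(edit_distance_accuracy(outputs, references))  # about 0.867
```

Summing edits over the whole test set, rather than averaging per-name scores, weights longer names more heavily. This also explains why the edit-distance figure is always at least as high as exact-match accuracy: a near-miss such as `gurdip` for `gurdeep` counts as wrong under exact match but loses only two characters under edit distance.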



      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2
        February 2023
        624 pages
        ISSN: 2375-4699
        EISSN: 2375-4702
        DOI: 10.1145/3572719

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 December 2022
        • Online AM: 9 June 2022
        • Accepted: 31 May 2022
        • Revised: 11 May 2022
        • Received: 13 June 2021
        Published in TALLIP Volume 22, Issue 2


        Qualifiers

        • Refereed
      • Article Metrics

        • Downloads (last 12 months): 87
        • Downloads (last 6 weeks): 3

