Abstract
Transliterating text from its original script into a foreign script is called forward transliteration, and transliterating that text back into the original script is called backward transliteration. In this work, we perform both forward and backward transliteration on Punjabi. We transliterate Punjabi person names from Gurmukhi script to English Roman script, and from English Roman script back to Gurmukhi script, using an n-gram language model. We used more than one million parallel person-name entities in Gurmukhi and Roman script as the training corpus. From this corpus we generated English-to-Punjabi and Punjabi-to-English n-gram databases. To get better results, we created n-grams that are as long as possible, ranging from bigrams to 30-grams. Our n-gram database contains more than 10 million n-grams, and each n-gram has multiple mappings in the other script. The most challenging part of building the n-gram databases is finding the mapping for a given n-gram within the parallel name entity. Under the orthographic rules, the same combination of letters may be pronounced differently depending on its position in the word. Therefore, we categorized n-grams into starting, middle, and ending n-grams and used them accordingly in the transliteration process. The transliteration process works like merge sort: we search the database for the longest possible n-gram and split the string recursively until a match is found, then merge the transliterated strings back together to form the final output. In English-to-Punjabi transliteration, we achieved 96% accuracy against the gold standard and 99.14% accuracy using minimum edit distance. In Punjabi-to-English transliteration, the results showed 96.85% and 99.35% accuracy for the gold standard and minimum edit distance, respectively.
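The merge-sort-like lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two-entry `NGRAM_DB`, the helper names `position` and `transliterate`, the midpoint split, the rule that a segment touching the start of the word is looked up as a starting n-gram, and the pass-through of unmatched single characters are all assumptions made for illustration; the abstract does not specify these details.

```python
from typing import Dict

# Hypothetical toy database: position category -> {Roman n-gram: Gurmukhi string}.
# The real databases hold millions of n-grams mined from the parallel corpus.
NGRAM_DB: Dict[str, Dict[str, str]] = {
    "start": {"har": "ਹਰ"},
    "middle": {},
    "end": {"jit": "ਜੀਤ"},
}


def position(lo: int, hi: int, n: int) -> str:
    """Classify the segment word[lo:hi] of an n-character word.

    Assumption: a segment touching the start of the word counts as "start",
    otherwise one touching the end counts as "end", otherwise "middle".
    """
    if lo == 0:
        return "start"
    if hi == n:
        return "end"
    return "middle"


def transliterate(word: str, db: Dict[str, Dict[str, str]] = NGRAM_DB) -> str:
    """Transliterate by recursive splitting and merging, as in merge sort."""
    n = len(word)

    def solve(lo: int, hi: int) -> str:
        segment = word[lo:hi]
        # Try the longest candidate first: the whole current segment.
        hit = db[position(lo, hi, n)].get(segment)
        if hit is not None:
            return hit
        if hi - lo <= 1:
            # Assumption: an unmatched single character is passed through.
            return segment
        # No match: split at the midpoint, solve each half, merge the results.
        mid = (lo + hi) // 2
        return solve(lo, mid) + solve(mid, hi)

    return solve(0, n)


print(transliterate("harjit"))
```

In this toy run, the full string `harjit` is not in the database, so it is split into `har` (looked up as a starting n-gram) and `jit` (looked up as an ending n-gram), and the two results are concatenated.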
Forward-backward Transliteration of Punjabi Gurmukhi Script Using N-gram Language Model