A New Concept of Electronic Text Based on Semantic Coding System for Machine Translation

Published: 02 November 2021

Abstract

Ambiguity, both lexical (dictionary-level) and structural, remains one of the hard problems in machine translation of text. Researchers in this field use a range of approaches, the most important of which is machine learning in its various forms. The approach we propose in this article instead defines a new concept of electronic text, one that is free of any lexical or structural ambiguity. It uses a semantic coding system that attaches to the original electronic text, through the text editor interface, the meanings intended by the author: for each word that could be a source of ambiguity, the author specifies the intended meaning. The approach can be used with any type of electronic text (text-processing applications, web pages, email, etc.). In the experiments we conducted with it, we obtained a very high accuracy rate, and we argue that the problem of lexical and structural ambiguity can be completely solved in this way. Under this new concept, the text file contains not only the text but also, in the form of symbols, the exact meaning intended by the writer. These semantic symbols are used during machine translation to produce a translated text free of lexical and structural ambiguity.
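The abstract does not give the actual coding scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: the `word|CODE` inline notation, the sense codes `S1` and `S2`, the tiny English-to-French lexicons, and the function names are all hypothetical and are not taken from the article. It covers lexical ambiguity only; structural ambiguity would need annotations on phrases rather than single words.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sense inventory: (word, sense code) -> French translation.
# The codes and entries are invented for illustration only.
SENSE_LEXICON = {
    ("bank", "S1"): "banque",  # financial institution
    ("bank", "S2"): "rive",    # side of a river
}

# Fallback used when the author attached no sense code to a word.
DEFAULT_LEXICON = {"the": "la", "bank": "banque"}


@dataclass
class Token:
    surface: str
    sense: Optional[str] = None  # semantic code chosen by the author, if any


def parse_annotated(text: str) -> List[Token]:
    """Split on whitespace and read optional 'word|CODE' annotations."""
    tokens = []
    for raw in text.split():
        surface, _, code = raw.partition("|")
        tokens.append(Token(surface, code or None))
    return tokens


def translate(tokens: List[Token]) -> List[str]:
    """Word-by-word lookup that prefers the author's sense code."""
    out = []
    for tok in tokens:
        word = tok.surface.lower()
        if tok.sense is not None and (word, tok.sense) in SENSE_LEXICON:
            out.append(SENSE_LEXICON[(word, tok.sense)])
        else:
            # Unannotated or unknown word: use the default, else copy as-is.
            out.append(DEFAULT_LEXICON.get(word, tok.surface))
    return out
```

With these toy tables, `translate(parse_annotated("the bank|S2"))` picks the river sense because the author marked it, while the unannotated `"the bank"` falls back to the default entry. The point of the design is that the choice of sense is made once, by the writer, and stored alongside the text, so the translation step is a lookup rather than a guess.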

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1
      January 2022
      442 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3494068

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 November 2021
      • Accepted: 1 June 2021
      • Revised: 1 May 2021
      • Received: 1 January 2021
      Qualifiers

      • research-article
      • Refereed
