Abstract
In machine translation, lexical (dictionary) and structural ambiguity remains one of the difficult problems. Researchers have addressed it with a range of approaches, most prominently machine learning in its various forms. The approach proposed in this article instead defines a new concept of electronic text that is free of lexical and structural ambiguity. It relies on a semantic coding system that attaches to the original electronic text, through the text editor interface, the meanings intended by the author: for each word that could be a source of ambiguity, the author specifies the intended meaning. The approach can be used with any type of electronic text (text processing applications, web pages, email text, etc.). In the experiments we conducted, it achieved a very high accuracy rate, and we argue that the problem of lexical and structural ambiguity can be completely solved in this way. Under this new concept, the text file contains not only the text but also, in the form of symbols, the exact meaning intended by the writer. These semantic symbols are used during machine translation to produce a translated text free of lexical and structural ambiguity.
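The idea of carrying author-chosen senses alongside the text can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the coding scheme, so the `Token` type, the sense codes (`FIN`, `RIVER`), and the two tiny English-to-French lexicons are hypothetical, and only lexical ambiguity is covered, not structural ambiguity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    surface: str                 # the word as typed by the author
    sense: Optional[str] = None  # hypothetical sense code; None if unambiguous


# Hypothetical lexicon keyed by (word, sense code) for ambiguous words.
SENSE_LEXICON = {
    ("bank", "FIN"): "banque",
    ("bank", "RIVER"): "rive",
}

# Hypothetical lexicon for words that need no sense code.
PLAIN_LEXICON = {"river": "rivière", "money": "argent"}


def translate(tokens: List[Token]) -> List[str]:
    """Translate word by word, using the attached sense code when present."""
    out = []
    for tok in tokens:
        word = tok.surface.lower()
        if tok.sense is not None:
            # The author's sense code selects the target word directly,
            # so no statistical disambiguation step is needed.
            out.append(SENSE_LEXICON[(word, tok.sense)])
        else:
            # Unannotated words fall back to a plain lookup, or pass through.
            out.append(PLAIN_LEXICON.get(word, tok.surface))
    return out


annotated = [Token("bank", "RIVER"), Token("money")]
print(translate(annotated))
```

In this sketch the same surface word "bank" maps to different target words depending only on the code the author attached, which is the property the abstract attributes to the semantic symbols. A real system would also need a stable sense inventory and a way to store the codes in the file format.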
A New Concept of Electronic Text Based on Semantic Coding System for Machine Translation