Abstract
The number of sentence pairs in a bilingual corpus is a key factor in the translation accuracy of statistical machine translation. Beyond a certain corpus size, however, additional sentence pairs have less impact on translation quality, while building the translation system still requires considerable time and energy, which hinders the development of statistical machine translation. This article offers several criteria for measuring the amount of information in each sentence pair and uses the Heuristic Bilingual Graph Corpus Network (HBGCN) to form an improved corpus selection method that takes into account the differences in the initial amount of information between sentence pairs. In our experiments, training on the subset chosen by the graph-based selection method achieves translation results close to those obtained with the whole corpus, and better results than the baseline based on the Document Inverse Frequency (DIF) ranking approach.
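The general idea of scoring sentence pairs by information content and keeping only a subset for training can be illustrated with a minimal sketch. This is not the HBGCN method, whose details the abstract does not give. It assumes a simple inverse-document-frequency weight per source token and a greedy selection that counts only tokens not yet covered by earlier picks. The function names `idf_weights` and `select_pairs` and the `budget` parameter are hypothetical.

```python
import math
from collections import Counter


def idf_weights(sentences):
    """Inverse document frequency per token, treating each sentence as a document."""
    n = len(sentences)
    df = Counter()
    for sentence in sentences:
        df.update(set(sentence.split()))
    return {tok: math.log(n / count) for tok, count in df.items()}


def select_pairs(pairs, budget):
    """Greedily pick up to `budget` pair indices by the IDF mass of uncovered source tokens.

    Assumption: information is measured on the source side only, and a token
    contributes nothing once an already selected pair contains it.
    """
    weights = idf_weights([src for src, _ in pairs])
    covered = set()
    remaining = list(range(len(pairs)))
    chosen = []
    while remaining and len(chosen) < budget:

        def gain(i):
            toks = set(pairs[i][0].split())
            # Length-normalised so long sentences are not favoured automatically.
            return sum(weights[t] for t in toks - covered) / max(len(toks), 1)

        best = max(remaining, key=gain)  # ties resolve to the earliest index
        chosen.append(best)
        covered |= set(pairs[best][0].split())
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    corpus = [("the cat sat", "x"), ("the cat sat", "y"), ("a rare zebra", "z")]
    print(select_pairs(corpus, 2))  # [2, 0]
```

In this toy corpus the pair with rare words is selected first and the duplicate sentence last, which mirrors the observation that extra pairs add little once their content is already represented.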
Index Terms
Heuristic Bilingual Graph Corpus Network to Improve English Instruction Methodology Based on Statistical Translation Approach