Abstract
Grammatical Error Correction (GEC) is a challenging task in Natural Language Processing research. Although many researchers have focused on GEC for widely used languages such as English and Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results showed that the Long Short-Term Memory (LSTM) model based on word embeddings achieved the best performance: its overall macro-averaged F0.5 across the 10 POS error types reached 0.551. Results also showed that the framework can be trained on a low-resource dataset.
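The abstract reports a macro-averaged F0.5. With β = 0.5, the F-score weights precision more heavily than recall, which is common in GEC because a false correction is usually considered worse than a missed one. Macro-averaging gives every error type equal weight regardless of how frequent it is. The sketch below is a minimal illustration that assumes per-type counts of true positives, false positives, and false negatives are already available. It computes the macro average as the unweighted mean of per-type F0.5 scores, which is one common convention; the paper may aggregate differently. The function names and example counts are hypothetical and do not come from the paper's data.

```python
from typing import Dict, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from raw counts; returns 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # beta < 1 weights precision more heavily than recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(counts: Dict[str, Tuple[int, int, int]], beta: float = 0.5) -> float:
    """Unweighted mean of per-type F-beta scores; counts maps type -> (tp, fp, fn)."""
    if not counts:
        raise ValueError("counts must contain at least one error type")
    scores = [f_beta(tp, fp, fn, beta) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for two illustrative error types (not results from the paper).
    counts = {
        "noun": (8, 2, 4),
        "verb": (5, 5, 5),
    }
    print(round(macro_f_beta(counts), 3))  # prints 0.635
```

In this toy example, the "noun" type scores 10/13 ≈ 0.769 and the "verb" type scores 0.5. Each type counts equally in the average, so a rare error type with a poor score pulls the macro average down as much as a frequent one would.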
A Framework for Indonesian Grammar Error Correction