
A Framework for Indonesian Grammar Error Correction

Published: 26 May 2021

Abstract

Grammatical Error Correction (GEC) is a challenge in Natural Language Processing research. Although many researchers have focused on GEC for widely used languages such as English and Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results show that the Long Short-Term Memory model based on word embeddings achieved the best performance: its overall macro-average F0.5 across the 10 POS error types reached 0.551. Results also show that the framework can be trained on a low-resource dataset.
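For readers unfamiliar with the metric, the following is a minimal sketch of how a macro-average F0.5 can be computed for a multi-class task. It uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) with beta = 0.5, which weights precision more heavily than recall, and then takes the unweighted mean over classes. The function names and the label names in the usage example are hypothetical, and the abstract does not state how the authors computed their scores, so this should not be read as their evaluation script.

```python
from collections import Counter


def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(gold, pred, labels, beta=0.5):
    """Unweighted mean of per-class F-beta over the given labels.

    gold and pred are equal-length sequences of class labels
    (hypothetical stand-ins for per-token correction classes).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data with made-up labels, only to show the call shape.
    gold = ["NOUN", "VERB", "NOUN", "ADP"]
    pred = ["NOUN", "NOUN", "NOUN", "ADP"]
    print(macro_f_beta(gold, pred, labels=["NOUN", "VERB", "ADP"]))
```

Because the macro average gives every class equal weight, a class that is rarely predicted correctly pulls the overall score down regardless of how frequent it is, which is relevant when error types are unevenly represented in a small corpus.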

