Context-Dependent Sequence-to-Sequence Turkish Spelling Correction

Published: 17 April 2020

Abstract

In this article, we use sequence-to-sequence (seq2seq) models for spelling correction in Turkish, an agglutinative language. In the baseline system, misspelled and target words are split into their letters and the letter sequences are fed into the seq2seq model. We prefer letters as the unit of the model because the agglutinative nature of Turkish results in an impractical dictionary size when words are used as the dictionary unit. To improve on the baseline performance, we incorporate the left and right context of the misspelled words. All context words are represented by their first three consonants in the context-dependent model. We train the seq2seq models using a large text corpus collected automatically from the Internet. The corpus contains approximately 4 million sentences. We randomly introduce substitution, deletion, and insertion spelling errors into the words in the corpus. We test the performance of the proposed context-dependent seq2seq model using synthetic and realistic test sets. The synthetic test set is constructed in the same way as the training set. The realistic test set contains human-made misspellings from Twitter messages. In the experiments, we observe that the proposed context-dependent model performs significantly better than the baseline system. Its correction accuracy reaches 94% on the synthetic dataset. Additionally, the proposed method provides a 2.1% absolute improvement over a state-of-the-art Turkish spelling correction system on the Twitter test set.
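
The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (corrupt_word, consonant_key, build_example), the `<w>` and `</w>` boundary markers, the context window of one word on each side, the single error per word, and the lowercase-only input are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

# Assumption: lowercase Turkish alphabet, used to draw random letters.
TURKISH_LETTERS = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH_VOWELS = set("aeıioöuü")


def corrupt_word(word, rng):
    """Apply one random substitution, deletion, or insertion to `word`."""
    if len(word) < 2:
        return word  # assumption: leave very short words untouched
    op = rng.choice(["substitute", "delete", "insert"])
    if op == "insert":
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(TURKISH_LETTERS) + word[pos:]
    pos = rng.randrange(len(word))
    if op == "delete":
        return word[:pos] + word[pos + 1:]
    # Substitute with a letter that differs from the original one.
    candidates = [c for c in TURKISH_LETTERS if c != word[pos]]
    return word[:pos] + rng.choice(candidates) + word[pos + 1:]


def consonant_key(word, k=3):
    """Represent a context word by its first k consonants (lowercase input)."""
    consonants = [c for c in word if c.isalpha() and c not in TURKISH_VOWELS]
    return "".join(consonants[:k])


def build_example(tokens, index, rng, window=1):
    """Return (source, target) token lists for the word at `index`.

    Source: consonant keys of the left context, the letters of the
    corrupted word between hypothetical boundary markers, then the
    consonant keys of the right context. Target: letters of the clean word.
    """
    target = tokens[index]
    noisy = corrupt_word(target, rng)
    left = [consonant_key(w) for w in tokens[max(0, index - window):index]]
    right = [consonant_key(w) for w in tokens[index + 1:index + 1 + window]]
    source = left + ["<w>"] + list(noisy) + ["</w>"] + right
    return source, list(target)


if __name__ == "__main__":
    rng = random.Random(0)
    source, target = build_example(["bugün", "okula", "gidiyorum"], 1, rng)
    print(source)
    print(target)
```

Working at the letter level keeps the model vocabulary to roughly the size of the alphabet plus the context keys, which is the motivation the abstract gives for avoiding word-level units in an agglutinative language.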

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 19, Issue 4
      July 2020
      291 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3391538
      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 April 2020
      • Accepted: 1 February 2020
      • Revised: 1 November 2019
      • Received: 1 March 2019

      Qualifiers

      • short-paper
      • Research
      • Refereed
