
Cleaning of Online Bangla Free-form Handwritten Text

Published: 02 November 2017

Abstract

In free-form handwritten text, repetition (writing the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing as “noise.” Cleaning such noisy text to extract the useful text is an important task for robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise made with straight strike-through lines is detected using the straightness criteria of online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of the overlapping strokes. Features such as stroke density and self-intersections of strokes are computed from the strokes of the located regions to predict the type of noise, and the predicted type determines how each region is cleaned. For crossing out, all strokes of the crossing-out region are removed. For repetition and over-writing, strokes written earlier are removed and the latest strokes are kept. Finally, delayed strokes are properly arranged and the word is passed to an online recognizer. Although recognition of free-form handwriting is quite difficult, we obtained up to 70.71% improvement in word-recognition accuracy after noise cleaning.
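To make two of the steps in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a stroke is a time-ordered list of (x, y) points and that strokes arrive in writing order. The function names, the endpoint-distance-to-path-length ratio used as a straightness measure, the bounding-box overlap test, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify the actual criteria.

```python
import math


def path_length(stroke):
    """Sum of distances between consecutive points of a stroke."""
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))


def straightness(stroke):
    """Endpoint distance divided by path length (assumed measure).

    Values close to 1.0 indicate a nearly straight stroke; a stroke
    with zero path length returns 0.0.
    """
    length = path_length(stroke)
    if length == 0:
        return 0.0
    return math.dist(stroke[0], stroke[-1]) / length


def bounding_box(stroke):
    """Axis-aligned bounding box as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_ratio(a, b):
    """Bounding-box intersection area over the smaller box's area."""
    ax0, ay0, ax1, ay1 = bounding_box(a)
    bx0, by0, bx1, by1 = bounding_box(b)
    w = min(ax1, bx1) - max(ax0, bx0)
    h = min(ay1, by1) - max(ay0, by0)
    if w <= 0 or h <= 0:
        return 0.0
    smaller = min((ax1 - ax0) * (ay1 - ay0), (bx1 - bx0) * (by1 - by0))
    return (w * h) / smaller if smaller > 0 else 0.0


def keep_latest(strokes, min_overlap=0.5):
    """Drop any stroke that a later stroke substantially overlaps.

    `strokes` must be in writing order; `min_overlap` is a
    hypothetical threshold chosen only for illustration.
    """
    kept = []
    for i, stroke in enumerate(strokes):
        overwritten = any(
            overlap_ratio(stroke, later) >= min_overlap
            for later in strokes[i + 1:]
        )
        if not overwritten:
            kept.append(stroke)
    return kept


if __name__ == "__main__":
    first = [(0, 0), (2, 2)]
    rewrite = [(0.1, 0.1), (2, 2.1)]   # written later, over `first`
    elsewhere = [(5, 5), (6, 6)]
    print(straightness([(0, 0), (5, 0), (10, 0)]))
    print(keep_latest([first, rewrite, elsewhere]))
```

Straightness alone would not separate a strike-through from other legitimately straight strokes, and a bounding-box test is only a coarse stand-in for the positional analysis, stroke density, and self-intersection features the abstract mentions, so a real system would need additional cues.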

