Abstract
In free-form handwritten text, repetition (writing the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing as “noise.” Cleaning such noisy text to extract the useful text is an important task for robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise made with straight strike-through lines is detected using a straightness criterion on the online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of overlapping strokes. Features such as stroke density and self-intersections of strokes are computed from the strokes of the located regions to predict the type of noise, and the predicted type determines how the region is cleaned. For crossing out, all strokes of the crossing-out region are removed. For repetition and over-writing, strokes written earlier are removed and the latest strokes are kept. Finally, delayed strokes are properly arranged and the word is passed to an online recognizer. Although recognition of free-form handwriting is quite difficult, we obtained up to 70.71% improvement in word-recognition accuracy after noise cleaning.
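The cleaning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas or thresholds, so the `Stroke` type, the straightness measure (end-to-end distance divided by path length), the bounding-box overlap test, and all threshold values (`straight_thr`, `min_span`, `overlap_thr`) are assumptions made for illustration. Simple rules stand in for the noise-type prediction from stroke density and self-intersections. The sketch also skips the rearrangement of delayed strokes and does not distinguish a strike-through from legitimate straight strokes such as the Bangla headline (matra).

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    points: List[Point]  # pen positions in writing order (hypothetical type)
    order: int           # temporal index of the stroke within the word


def straightness(stroke: Stroke) -> float:
    """End-to-end distance divided by path length (1.0 means a straight line)."""
    pts = stroke.points
    path = sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    if path == 0:
        return 0.0
    (x0, y0), (xn, yn) = pts[0], pts[-1]
    return hypot(xn - x0, yn - y0) / path


def bbox(stroke: Stroke) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in stroke.points]
    ys = [p[1] for p in stroke.points]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_ratio(a: Stroke, b: Stroke) -> float:
    """Bounding-box intersection area divided by the smaller box's area."""
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    w = min(ax1, bx1) - max(ax0, bx0)
    h = min(ay1, by1) - max(ay0, by0)
    if w <= 0 or h <= 0:
        return 0.0
    smaller = min((ax1 - ax0) * (ay1 - ay0), (bx1 - bx0) * (by1 - by0))
    return (w * h) / smaller if smaller > 0 else 0.0


def clean(strokes: List[Stroke], straight_thr: float = 0.95,
          min_span: float = 2.0, overlap_thr: float = 0.7) -> List[Stroke]:
    """Remove crossed-out strokes, then earlier strokes that were written over."""
    # Step 1: long, nearly straight strokes are treated as strike-through lines.
    strikes = [s for s in strokes
               if straightness(s) >= straight_thr
               and bbox(s)[2] - bbox(s)[0] >= min_span]
    removed = {s.order for s in strikes}

    # Crossing out: drop every stroke that a strike-through line passes through.
    for line in strikes:
        lx0, ly0, lx1, ly1 = bbox(line)
        line_y = (ly0 + ly1) / 2
        for s in strokes:
            x0, y0, x1, y1 = bbox(s)
            if s.order not in removed and x0 < lx1 and lx0 < x1 and y0 <= line_y <= y1:
                removed.add(s.order)

    # Step 2: repetition and over-writing. Keep the latest stroke of any
    # strongly overlapping pair and drop the one written earlier.
    rest = sorted((s for s in strokes if s.order not in removed),
                  key=lambda s: s.order)
    return [a for i, a in enumerate(rest)
            if not any(overlap_ratio(a, b) >= overlap_thr for b in rest[i + 1:])]
```

Bounding boxes keep the overlap test short, but they are a coarse proxy for "written in the same place"; a point-level or grid-based density measure would be closer to the stroke-density feature the abstract mentions.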
Index Terms
Cleaning of Online Bangla Free-form Handwritten Text