Abstract
In this article, we propose a new framework for segmenting Bangla handwritten word images into meaningful individual symbols, or pseudo-characters. Existing approaches do not usually treat segmentation as a classification problem. In the present study, by contrast, segmentation is formulated as a two-class supervised classification problem. The method employs an SVM classifier to select segmentation points on the word image on the basis of various structural features. To train the SVM classifier, an unannotated training set is first prepared from candidate segmentation points. The training set is then clustered, and each cluster is labeled with minimal manual intervention. A semi-automatic bootstrapping technique is also employed to enlarge the training set with new samples. The overall architecture is a basic step toward building an annotation system for the segmentation problem, which has not been investigated so far. The experimental results show that our segmentation method is quite efficient in segmenting not only word images but also handwritten texts. As a part of this work, a database of Bangla handwritten word images has also been developed. Considering our data collection method and a statistical analysis of our lexicon set, we claim that the relevant characteristics of an ideal lexicon set are present in our handwritten word image database.
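The pipeline summarized in the abstract (cluster the unannotated candidates, label each cluster once, train a two-class SVM, then bootstrap with new samples) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional zero-centered features are synthetic stand-ins for the structural features, k-means stands in for the unspecified clustering step, the SVM is a plain linear one trained by subgradient descent, the margin-based confidence rule is one plausible bootstrapping criterion, and all function names are hypothetical. It is not the authors' implementation.

```python
import random

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_by_cluster(points, assign, annotate):
    """Ask the annotator for one label per cluster and copy it to all members."""
    cluster_label = {}
    for p, c in zip(points, assign):
        if c not in cluster_label:
            cluster_label[c] = annotate(p)  # one manual decision per cluster
    return [cluster_label[c] for c in assign]

def train_svm(X, y, epochs=200, lr=0.1, lam=0.001):
    """Linear SVM trained by stochastic subgradient descent on the hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            violated = t * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1
            w = [wi * (1 - lr * lam) for wi in w]  # L2 regularisation shrink
            if violated:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def decision(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bootstrap(model, X, y, new_points, margin=1.0):
    """Auto-label confident new points; return the rest for manual review."""
    X, y, review = list(X), list(y), []
    for p in new_points:
        score = decision(model, p)
        if abs(score) >= margin:
            X.append(p)
            y.append(1 if score > 0 else -1)
        else:
            review.append(p)
    return X, y, review

# Synthetic candidates (assumption): label +1 = true segmentation point, -1 = not.
rng = random.Random(0)
def noisy(center):
    return [center + rng.uniform(-0.1, 0.1), center + rng.uniform(-0.1, 0.1)]

points, truth = [], []
for _ in range(20):
    points.append(noisy(-0.3)); truth.append(1)
    points.append(noisy(0.3)); truth.append(-1)

assign = kmeans(points, k=4)
# The lambda simulates the human annotator inspecting one sample per cluster.
labels = label_by_cluster(points, assign, lambda p: 1 if p[0] + p[1] < 0 else -1)
model = train_svm(points, labels)

new_points = [[-0.6, -0.6], [0.6, 0.6], [0.01, -0.01]]
X2, y2, review = bootstrap(model, points, labels, new_points)
model = train_svm(X2, y2)  # retrain on the enlarged training set
```

In this sketch the annotator makes at most four decisions (one per cluster) instead of forty, and the bootstrapping step sends only low-confidence candidates back for manual review, which is the sense in which the procedure is semi-automatic.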
Bangla Handwritten Character Segmentation Using Structural Features: A Supervised and Bootstrapping Approach