
Bangla Handwritten Character Segmentation Using Structural Features: A Supervised and Bootstrapping Approach

Published: 12 April 2016

Abstract

In this article, we propose a new framework for segmenting Bangla handwritten word images into meaningful individual symbols or pseudo-characters. Existing approaches do not usually treat segmentation as a classification problem; in the present study, it is formulated as a two-class supervised classification problem. The method employs an SVM classifier to select the segmentation points on the word image on the basis of various structural features. To train the SVM classifier, an unannotated training set is first prepared from candidate segmentation points. The training set is then clustered, and each cluster is labeled manually, which keeps manual intervention to a minimum. A semi-automatic bootstrapping technique is also employed to enlarge the training set with new samples. The overall architecture is a basic step toward building an annotation system for the segmentation problem, which has not been investigated so far. The experimental results show that our segmentation method is quite efficient in segmenting not only word images but also handwritten texts. As part of this work, a database of Bangla handwritten word images has also been developed. Considering our data collection method and a statistical analysis of our lexicon set, we claim that the relevant characteristics of an ideal lexicon set are present in our handwritten word image database.
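
The pipeline outlined in the abstract (cluster the unannotated candidates, label each cluster once, train an SVM, then bootstrap with new samples) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the 2-D feature vectors are hypothetical stand-ins for the structural features, k-means is used because the abstract does not name a clustering method, the SVM is linear and fitted by plain subgradient descent on the hinge loss, and a confidence threshold (`min_conf`) stands in for the semi-automatic acceptance step.

```python
def kmeans(points, k=2, iters=10):
    """Plain k-means; centres start at the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            dists = [sum((a - c) ** 2 for a, c in zip(p, ctr)) for ctr in centers]
            assign[i] = dists.index(min(dists))
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def decision(model, x):
    """Signed decision value w.x + b of a linear model (w, b)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X, y, epochs=500, lr=0.01, lam=0.01):
    """Linear soft-margin SVM fitted by subgradient descent on the hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            violated = t * decision((w, b), x) < 1
            w = [wi * (1 - lr * lam) for wi in w]       # L2 shrinkage on w
            if violated:                                 # hinge subgradient step
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def bootstrap(X, y, pool, min_conf=0.5):
    """Pseudo-label pool samples with |decision value| >= min_conf, then retrain."""
    model = train_linear_svm(X, y)
    accepted = [(x, 1 if decision(model, x) > 0 else -1)
                for x in pool if abs(decision(model, x)) >= min_conf]
    X2 = X + [x for x, _ in accepted]
    y2 = y + [t for _, t in accepted]
    return train_linear_svm(X2, y2), accepted


# Hypothetical 2-D features of candidate segmentation points (unannotated).
candidates = [(0.0, 0.2), (3.0, 3.1), (0.3, 0.0), (2.8, 3.3), (0.1, 0.4), (3.2, 2.9)]
clusters = kmeans(candidates, k=2)

# One manual decision per cluster instead of one per sample:
# +1 = true segmentation point, -1 = not a segmentation point.
cluster_label = {0: -1, 1: +1}
labels = [cluster_label[c] for c in clusters]

# New, unlabeled samples used to enlarge the training set.
new_samples = [(0.2, 0.1), (3.1, 3.0), (1.5, 1.6)]
model, accepted = bootstrap(candidates, labels, new_samples)
```

Labeling per cluster is what keeps the manual effort small: the annotator makes one decision per cluster rather than one per candidate point. In this sketch, only new samples that lie far from the decision boundary are pseudo-labeled, so an ambiguous sample is left out of the enlarged training set rather than risk adding a wrong label.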
