BYANJON: A Ground Truth Preparation System for Online Handwritten Bangla Documents

Published: 12 August 2021

Abstract

This article presents a ground truth generation scheme for online handwritten Bangla documents at the text-line, word, and stroke levels. The scheme has two aims. First, it builds a document-level database that future researchers can use for work in this field. Second, the ground truth information helps researchers evaluate the performance of algorithms developed for text-line extraction, word extraction, word segmentation, stroke recognition, and word recognition. The scheme starts by extracting text-lines from the online handwritten Bangla documents, then extracts words from the text-lines, and finally segments those words into basic strokes. After word segmentation, the basic strokes are assigned class labels using a modified distance-based feature extraction procedure and an MLP (Multi-layer Perceptron) classifier. The Unicode for each word is then generated from the sequence of stroke labels. XML files store the stroke-, word-, and text-line-level ground truth information for the corresponding documents. The proposed system is semi-automatic, and each step (text-line extraction, word extraction, word segmentation, and stroke recognition) has been implemented using different algorithms. The procedure thus reduces manual intervention by cutting the number of mouse clicks required to extract text-lines and words from a document and to segment the words into basic strokes. The integrated stroke recognition module also reduces the manual labor needed to assign stroke labels. The system is freely available and can be accessed at https://byanjon.herokuapp.com/.
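
The three-level ground truth described above (text-line, word, stroke) maps naturally onto a nested XML tree. The following Python sketch illustrates one way such a file could be assembled. It is not the BYANJON file format: the element names (`document`, `textline`, `word`, `stroke`), the attribute names, the helper `build_ground_truth`, and the sample stroke labels are all assumptions made for illustration, since the abstract does not specify the schema.

```python
import xml.etree.ElementTree as ET


def build_ground_truth(document_id, text_lines):
    """Build a nested XML tree: document > textline > word > stroke.

    `text_lines` is a list of lines; each line is a list of word dicts with
    keys "unicode" (the word's text) and "strokes" (a list of dicts with a
    class "label" and a list of (x, y) pen "points").
    All tag and attribute names here are hypothetical.
    """
    root = ET.Element("document", id=document_id)
    for line_index, words in enumerate(text_lines):
        line_el = ET.SubElement(root, "textline", index=str(line_index))
        for word_index, word in enumerate(words):
            word_el = ET.SubElement(
                line_el, "word", index=str(word_index), unicode=word["unicode"]
            )
            for stroke_index, stroke in enumerate(word["strokes"]):
                stroke_el = ET.SubElement(
                    word_el, "stroke", index=str(stroke_index), label=stroke["label"]
                )
                # Store the pen trajectory as space-separated "x,y" pairs.
                stroke_el.text = " ".join(f"{x},{y}" for x, y in stroke["points"])
    return root


# Hypothetical sample: one text-line holding one word made of two strokes.
sample = [[
    {
        "unicode": "\u0995",  # Bangla letter KA, used only as example text
        "strokes": [
            {"label": "s1", "points": [(0, 0), (1, 2)]},
            {"label": "s2", "points": [(3, 4)]},
        ],
    }
]]
tree_root = build_ground_truth("doc1", sample)
xml_text = ET.tostring(tree_root, encoding="unicode")
```

Nesting strokes inside words and words inside text-lines keeps all three annotation levels in a single file, so an evaluation script can read whichever level it needs from the same tree.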
