Abstract
Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and the lack of large public databases of handwritten characters and words. Using synthetic data to train and test handwritten character recognition systems is one way to supply more variation for these characters and to compensate for the lack of large databases. While synthetic data can be generated using arbitrary distortions, such as image noise and randomized affine transformations, such distortions are not realistic. In this work, we model real distortions in handwriting using real handwritten Arabic character examples and then use these distortion models to synthesize handwritten examples that are more realistic. We show that our proposed approach leads to significant improvements across different machine-learning classification algorithms.
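For concreteness, the arbitrary-distortion baseline mentioned above (image noise plus a randomized affine transformation) might look roughly like the following sketch. It is not the distortion model proposed in this work. The function names, the parameter ranges (`max_rot_deg`, `max_shear`, `max_scale`, `noise_std`), and the assumption of a grayscale image stored as a float array in [0, 1] are all illustrative choices, not taken from the paper.

```python
import numpy as np


def random_affine_matrix(rng, max_rot_deg=10.0, max_shear=0.1, max_scale=0.1):
    """Draw a random 2x2 linear map as rotation @ shear @ scale (hypothetical ranges)."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    sx, sy = 1.0 + rng.uniform(-max_scale, max_scale, size=2)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    shr = np.array([[1.0, shear], [0.0, 1.0]])
    scl = np.diag([sx, sy])
    return rot @ shr @ scl


def warp_affine(image, matrix):
    """Apply a 2x2 linear map about the image centre.

    Uses inverse mapping with nearest-neighbour sampling; pixels that map
    outside the source image are filled with zeros (background).
    """
    h, w = image.shape
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Output pixel coordinates, expressed relative to the image centre.
    out_coords = np.stack([rows.ravel(), cols.ravel()], axis=0) - centre[:, None]
    # For each output pixel, find the source pixel it came from.
    src = np.linalg.inv(matrix) @ out_coords + centre[:, None]
    src = np.rint(src).astype(int)
    valid = (src[0] >= 0) & (src[0] < h) & (src[1] >= 0) & (src[1] < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[src[0, valid], src[1, valid]]
    return out.reshape(h, w)


def distort(image, rng, noise_std=0.05):
    """Random affine warp followed by additive Gaussian noise, clipped to [0, 1]."""
    warped = warp_affine(image, random_affine_matrix(rng))
    noisy = warped + rng.normal(0.0, noise_std, size=warped.shape)
    return np.clip(noisy, 0.0, 1.0)


# Usage: distort a toy 32x32 "stroke" image with a seeded generator.
rng = np.random.default_rng(0)
stroke = np.zeros((32, 32))
stroke[8:24, 14:18] = 1.0
synthetic = distort(stroke, rng)
```

Because the warp parameters here are drawn from fixed ranges that do not depend on any real handwriting sample, the resulting variations are arbitrary in exactly the sense the abstract criticizes.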
Improving Handwritten Arabic Character Recognition by Modeling Human Handwriting Distortions