research-article

QEST: Quantized and Efficient Scene Text Detector Using Deep Learning

Published: 08 May 2023

Abstract

Scene text detection is one of the most challenging tasks in computer vision because of varied environmental conditions, such as uneven illumination, poor lighting, and tiny or curved text. Most existing work on scene text detection has overlooked the goal of improving accuracy and efficiency together, resulting in heavyweight models that require substantial processing resources. This article develops a novel lightweight model to improve both the accuracy and the efficiency of scene text detection. The proposed model uses ResNet50 and MobileNetV2 as backbones and applies quantization to make the resulting model lightweight: numerical precision is reduced from float32 to float16 and int8. In terms of inference time and floating-point operations per second (FLOPS), the proposed method outperforms state-of-the-art techniques by a factor of roughly 30 to 100. The well-known ICDAR2015 and ICDAR2019 datasets are used for training and testing to validate the performance of the proposed model. The results and discussion indicate that the proposed model is more efficient than existing schemes.
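The float32 to float16 and int8 conversion mentioned above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: float16 is emulated by round-tripping values through the IEEE 754 half-precision `struct` format `'e'`, and int8 uses a simple per-tensor affine scheme in which `real ≈ scale * (q - zero_point)`. The helper names `to_float16`, `quantize_int8`, `dequantize_int8`, and `BYTES_PER_VALUE` are hypothetical.

```python
import struct

# Bytes needed to store one value at each precision (standard struct sizes).
BYTES_PER_VALUE = {
    "float32": struct.calcsize("<f"),
    "float16": struct.calcsize("<e"),
    "int8": struct.calcsize("<b"),
}


def to_float16(values):
    """Round each float to the nearest IEEE 754 half-precision value."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def quantize_int8(values):
    """Hypothetical per-tensor affine int8 quantization: real ~= scale * (q - zero_point)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # 256 int8 levels span [lo, hi]; fall back to 1.0 for an all-zero tensor.
    scale = (hi - lo) / 255.0 or 1.0
    # Integer code that represents 0.0; lies in [-128, 127] by construction.
    zero_point = round(-128 - lo / scale)
    # Round to the nearest level, shift by the zero point, clamp to int8.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [scale * (x - zero_point) for x in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0]  # illustrative values only
    q, scale, zero_point = quantize_int8(weights)
    print("int8 codes: ", q)
    print("restored:   ", dequantize_int8(q, scale, zero_point))
    print("float16:    ", to_float16(weights))
    print("bytes/value:", BYTES_PER_VALUE)
```

Storing each value in 2 or 1 bytes instead of 4 is where the size savings come from (2× for float16, 4× for int8). The zero point is chosen so that 0.0 maps exactly to an integer code, and values inside the observed range are reconstructed to within about half a quantization step (`scale / 2`). Production toolchains usually add calibration data and per-channel scales, which this sketch omits.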



    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
      May 2023
      653 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3596451

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 May 2023
      • Online AM: 26 March 2022
      • Accepted: 12 March 2022
      • Revised: 3 March 2022
      • Received: 14 October 2021
