Abstract
Scene text detection is a challenging task because of varied environmental conditions, such as uneven illumination, tiny and curved text, and many more. Most existing work on scene text detection has overlooked the goal of improving accuracy and efficiency together, resulting in heavyweight models that require more processing resources. This article develops a novel lightweight model to improve both the accuracy and the efficiency of scene text detection. The proposed model uses ResNet50 and MobileNetV2 as backbones and applies quantization, reducing the numerical precision from float32 to float16 and int8, to make the resulting model lightweight. In terms of inference time and Floating-Point Operations Per Second, the proposed method outperforms state-of-the-art techniques by around 30–100 times. Two well-known datasets, ICDAR2015 and ICDAR2019, are used for training and testing to validate the performance of the proposed model. The findings and discussion indicate that the proposed model is more efficient than existing schemes.
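The precision reduction mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which framework or quantization scheme was used, so the example below assumes a generic per-tensor affine int8 scheme and a plain float16 cast, applied to a randomly generated weight matrix. The function names `quantize_int8` and `dequantize_int8` are hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize_int8(weights):
    """Per-tensor affine quantization of a float32 array to int8 (illustrative)."""
    # Include 0.0 in the range so that zero is exactly representable.
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    # Integer offset that maps w_min to roughly -128.
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical layer weights; a real model would supply its own tensors.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)

w_fp16 = weights.astype(np.float16)           # float32 -> float16 cast
q, scale, zero_point = quantize_int8(weights)  # float32 -> int8 codes
recovered = dequantize_int8(q, scale, zero_point)

print("bytes float32:", weights.nbytes)
print("bytes float16:", w_fp16.nbytes)
print("bytes int8:   ", q.nbytes)
print("max int8 round-trip error:", float(np.max(np.abs(recovered - weights))))
```

The sketch only shows the storage effect: float16 halves and int8 quarters the size of a float32 tensor, at the cost of a bounded rounding error. Any speed-up at inference time depends on hardware and runtime support for the lower-precision types, which this example does not measure.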
QEST: Quantized and Efficient Scene Text Detector Using Deep Learning