Abstract
Edge Analytics and Artificial Intelligence are important features of today's smart connected living communities. In a society where people, homes, cities, and workplaces are simultaneously connected through various devices, primarily mobile devices, a considerable amount of data is exchanged, and processing and storing these data are laborious and difficult tasks. Edge Analytics allows such data to be collected and analysed directly on mobile devices, such as smartphones and tablets, without relying on a cloud-centred architecture, which cannot guarantee real-time responsiveness. Meanwhile, Artificial Intelligence techniques are a valid instrument for processing data, limiting computation time and optimising decision processes and predictions in several sectors, such as healthcare. Within this field, this article proposes an approach for evaluating voice quality. A fully automatic algorithm, based on Deep Learning, classifies a voice as healthy or pathological by analysing spectrogram images extracted from a recording of the vowel /a/, in compliance with the traditional medical protocol. A light Convolutional Neural Network is embedded in a mobile health application to provide an instrument capable of assessing voice disorders in a fast, easy, and portable way. Thus, an ordinary mobile device becomes a screening tool useful for the early diagnosis, monitoring, and treatment of voice disorders. The proposed approach has been tested on a broad set of voice samples, not limited to the most common voice diseases but including all the pathologies present in three different databases, achieving F1-scores on the testing sets of 80%, 90%, and 73%.
Although the proposed network consists of a reduced number of layers, the results are very competitive compared to those of other “cutting edge” approaches built on more complex neural networks, and to those of classic deep neural networks such as VGG-16 and ResNet-50.
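For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for a binary healthy/pathological classifier. The labels are invented for illustration and are not taken from the article's databases, and the choice of "pathological" as the positive class is an assumption: the abstract does not say which class is treated as positive or whether the score is averaged over classes.

```python
def f1_score(y_true, y_pred, positive="pathological"):
    """Return the F1-score (harmonic mean of precision and recall).

    `positive` names the class treated as the positive one; this is an
    assumption made for illustration, not a detail stated in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for eight voice samples.
y_true = ["pathological"] * 4 + ["healthy"] * 4
y_pred = ["pathological", "pathological", "pathological", "healthy",
          "pathological", "healthy", "healthy", "healthy"]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")  # 3 TP, 1 FP, 1 FN -> F1 = 0.75
```

Because F1 combines precision and recall, it is less sensitive than plain accuracy to an imbalance between healthy and pathological samples.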
Index Terms
A Deep Learning Approach for Voice Disorder Detection for Smart Connected Living Environments