research-article

A Deep Learning Approach for Voice Disorder Detection for Smart Connected Living Environments

Published: 15 October 2021

Abstract

Edge Analytics and Artificial Intelligence are important features of the current smart connected living community. In a society where people, homes, cities, and workplaces are simultaneously connected through various devices, primarily mobile devices, a considerable amount of data is exchanged, and processing and storing these data are laborious and difficult tasks. Edge Analytics allows such data to be collected and analysed on mobile devices, such as smartphones and tablets, without relying on a cloud-centred architecture that cannot guarantee real-time responsiveness. Meanwhile, Artificial Intelligence techniques can be a valid instrument for processing data, limiting computation time and optimising decision processes and predictions in several sectors, such as healthcare. Within this field, this article proposes an approach for evaluating voice quality. A fully automatic algorithm, based on Deep Learning, classifies a voice as healthy or pathological by analysing spectrogram images extracted from recordings of the vowel /a/, in compliance with the traditional medical protocol. A light Convolutional Neural Network is embedded in a mobile health application to provide an instrument capable of assessing voice disorders in a fast, easy, and portable way. Thus, an ordinary mobile device becomes a screening tool useful for the early diagnosis, monitoring, and treatment of voice disorders. The proposed approach has been tested on a broad set of voice samples, not limited to the most common voice diseases but including all the pathologies present in three different databases, achieving F1-scores on the testing set of 80%, 90%, and 73%. Although the proposed network consists of a reduced number of layers, the results are very competitive compared to those of other “cutting edge” approaches built on more complex neural networks, and compared to classic deep neural networks such as VGG-16 and ResNet-50.
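
The two quantities the abstract relies on, a spectrogram of a sustained vowel and the F1-score, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the frame length, hop size, window, or sampling rate used by the authors, and the network itself is not reproduced. The function names `hann`, `spectrogram`, and `f1_score` are hypothetical, and a synthetic 1 kHz tone stands in for a recording of the vowel /a/.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive windowed DFT.

    Returns one row per frame and one column per non-negative
    frequency bin. The frame length and hop are illustrative
    defaults, not values taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic stand-in for a sustained vowel: a 1 kHz tone sampled at 8 kHz
# (both values are assumptions chosen so the tone falls exactly on a bin).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64

# Toy labels: 1 = pathological, 0 = healthy (made-up values, not paper data).
example_f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In practice the rows of such a spectrogram would be rendered as an image and passed to a classifier, and a library FFT would replace the naive DFT, which is used here only to keep the sketch dependency-free.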



Published in

ACM Transactions on Internet Technology, Volume 22, Issue 1
February 2022
717 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3483347
Editor: Ling Liu

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 15 October 2021
• Accepted: 1 November 2020
• Revised: 1 October 2020
• Received: 1 August 2020


Qualifiers

• research-article
• Refereed
