
Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network

Published: 29 April 2022

Abstract

Speech emotion recognition has been a prominent area of research in the effort to make human-machine interaction more natural and productive. The reliability and accuracy of emotion recognition depend heavily on the feature extraction and feature selection processes. The feature extraction phase plays an important role in exploring and distinguishing audio content, and the extracted features must be resilient to a range of disturbances and reliable enough to support an adequate classification system. This article focuses on three main components of a Speech Emotion Recognition (SER) process. The first is the optimal feature extraction method for a Punjabi SER system. The second is an appropriate feature selection method that selects effective features from those extracted in the first step and removes redundant features to improve recognition performance. The third is the classification model used for emotion recognition. The scope of this article is therefore to explain the three main steps of the Punjabi SER system: feature extraction, feature selection, and emotion recognition with a classifier. Results are calculated and compared for a number of feature-set combinations, with and without a feature selection step. A total of 10 experiments are carried out, and performance metrics such as precision, recall, F1-score, and accuracy are used to report the results.
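The three-stage pipeline the abstract describes (extract features, select a subset, classify) can be sketched end-to-end. The snippet below is a minimal illustration, not the paper's implementation: it uses synthetic feature vectors in place of real Punjabi speech features, an ANOVA-style F-score filter as a stand-in for the paper's selection algorithms, and a nearest-centroid rule as a stand-in for the CNN classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 stand-in (feature extraction): 200 utterances, 40 features each
# (e.g. MFCC-style coefficients), 4 emotion classes. Only the first 10
# features are made informative, by shifting them per class.
n, d, n_classes = 200, 40, 4
y = rng.integers(0, n_classes, n)
X = rng.normal(size=(n, d))
X[:, :10] += y[:, None] * 0.8

# Stage 2 (feature selection): rank features by a one-way ANOVA F-score
# (between-class variance over within-class variance) and keep the top k.
def f_scores(X, y):
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (len(classes) - 1)) / (within / (len(X) - len(classes)))

k = 10
selected = np.argsort(f_scores(X, y))[-k:]
X_sel = X[:, selected]

# Stage 3 (classification): a nearest-centroid rule, evaluated with the
# accuracy metric mentioned in the abstract.
centroids = np.stack([X_sel[y == c].mean(axis=0) for c in range(n_classes)])
pred = np.argmin(((X_sel[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
accuracy = (pred == y).mean()
print(f"selected {k}/{d} features, accuracy on synthetic data = {accuracy:.2f}")
```

Swapping the synthetic X for real extracted features and the centroid rule for a trained CNN yields the structure the article evaluates; comparing runs with `X` versus `X_sel` mirrors its "with and without feature selection" experiments.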


Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5 (September 2022), 486 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3533669

Publisher
Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 29 April 2022
      • Online AM: 9 March 2022
      • Revised: 1 January 2022
      • Accepted: 1 January 2022
      • Received: 1 November 2021
Published in TALLIP Volume 21, Issue 5


      Qualifiers

      • research-article
      • Refereed
