Abstract
Speech emotion recognition has become a prominent research area because it can make interaction between humans and machines more natural and productive. The reliability and performance of emotion recognition depend largely on the feature extraction and feature selection processes. The feature extraction phase plays an important role in exploring and distinguishing audio content, and the extracted features should be robust to a range of disturbances and reliable enough to support an adequate classification system. This article focuses on the three main components of a Punjabi Speech Emotion Recognition (SER) system. The first is the optimal feature extraction method. The second is an appropriate feature selection method that keeps the effective features from those extracted in the first step and removes redundant ones to improve recognition performance. The third is the classification model used to recognize emotions from the selected features. Results are calculated and compared for a number of feature set combinations, with and without feature selection. A total of 10 experiments are carried out, and performance metrics such as precision, recall, F1-score, and accuracy are used to report the results.
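The metrics named in the abstract can be illustrated with a minimal sketch. It assumes a single-label, multi-class setting; the function name `emotion_metrics`, the emotion labels, and the toy predictions are hypothetical and are not taken from the article's experiments.

```python
from collections import Counter


def emotion_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus accuracy and macro averages."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    n = len(labels)
    macro = {
        key: sum(scores[key] for scores in per_class.values()) / n
        for key in ("precision", "recall", "f1")
    }
    accuracy = sum(tp.values()) / len(y_true)
    return {"accuracy": accuracy, "macro": macro, "per_class": per_class}


# Hypothetical toy labels, for illustration only.
y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
y_pred = ["happy", "sad", "sad", "sad", "angry", "happy"]
report = emotion_metrics(y_true, y_pred)
print(report["accuracy"], report["macro"])
```

Running the same function on predictions obtained with and without a feature selection step would give the kind of side-by-side comparison the abstract describes. Macro averaging is one possible choice here; the article may aggregate per-class scores differently.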
Index Terms
Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network