Abstract
Case studies of group discussions are considered an effective way to assess communication skills (CS). This method helps researchers evaluate how participants engage with one another in a specific, realistic context. In this article, multimodal analysis was performed to estimate CS indices using the MATRICS corpus, a group discussion dataset comprising three task types. The research investigated the effectiveness of both static and time-series modeling, especially in task-independent settings, and aimed to answer three main questions: first, how time-series modeling compares with nonsequential modeling; second, how multimodal analysis performs in a task-independent setting; and third, which differences matter when moving between task-dependent and task-independent settings, specifically in terms of modalities and prediction models. Several modalities were extracted (e.g., acoustic, speaking-turn, linguistic, dialog-tag, head-motion, and facial feature sets) to infer the CS indices as a regression task. Three predictive models were considered: support vector regression (SVR), long short-term memory (LSTM), and an enhanced time-series model (an LSTM combining static and time-series features). Evaluation was conducted using the R2 score in a cross-validation scheme. The experimental results suggest that time-series modeling significantly improves the performance of multimodal analysis in the task-dependent setting (best R2 = 0.797 for the total CS index), with word2vec being the most prominent feature. However, highly context-related features did not transfer well to the task-independent setting.
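The evaluation scheme described above (R2 score under cross-validation) can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the authors' code; the fold layout and data shapes are assumptions.

```python
from typing import List, Sequence, Tuple

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split n samples into k contiguous (train, test) folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        splits.append((train, test))
        start += size
    return splits
```

A model that predicts every target exactly scores R2 = 1.0, while always predicting the mean scores 0.0; scores like the 0.797 reported above fall between these extremes. In a task-independent evaluation, folds would be formed so that training and test data come from different discussion tasks rather than from contiguous index blocks.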
Thus, we propose an enhanced LSTM model for task-independent settings, which achieved better performance than the conventional SVR and LSTM models (best R2 = 0.602 for the total CS index). In other words, our study shows that appropriate time-series modeling can outperform traditional nonsequential modeling for automatically estimating a participant's CS indices in a group discussion, with regard to task dependency.
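The core idea of the enhanced model, combining an LSTM summary of the time-series features with static features, can be illustrated with a minimal pure-Python sketch. The `TinyLSTM` class, its dimensions, and the random weights are all hypothetical; the abstract does not specify the fusion architecture beyond "a combination of static and time-series features", and here it is rendered as simple concatenation of the final hidden state with a static feature vector.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Minimal LSTM encoder: summarizes a feature sequence as its last hidden state."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        # One (input, recurrent) weight pair per gate: input, forget, output, cell.
        self.W = {g: mat(hidden_dim, input_dim) for g in "ifog"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "ifog"}
        self.hidden_dim = hidden_dim

    def encode(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:  # x: per-frame multimodal feature vector
            i = [sigmoid(a + b) for a, b in zip(matvec(self.W["i"], x), matvec(self.U["i"], h))]
            f = [sigmoid(a + b) for a, b in zip(matvec(self.W["f"], x), matvec(self.U["f"], h))]
            o = [sigmoid(a + b) for a, b in zip(matvec(self.W["o"], x), matvec(self.U["o"], h))]
            g = [math.tanh(a + b) for a, b in zip(matvec(self.W["g"], x), matvec(self.U["g"], h))]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

def fused_features(time_series, static_feats, encoder):
    """Concatenate the LSTM summary of the sequence with session-level static features."""
    return encoder.encode(time_series) + list(static_feats)
```

The fused vector would then feed a regression head predicting a CS index. The intuition is that the static component carries context-independent cues (e.g., aggregate statistics), which is what makes the combined model more robust than a purely sequential one in the task-independent setting.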
Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling