research-article

Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling

Published: 12 November 2021

Abstract

Case studies of group discussions are considered an effective way to assess communication skills (CS), since they let researchers evaluate how participants engage with one another in a specific, realistic context. In this article, multimodal analysis was performed to estimate CS indices using a three-task-type group discussion dataset, the MATRICS corpus. The research investigated the effectiveness of employing both static and time-series modeling, especially in task-independent settings, with three aims: first, to compare time-series modeling against nonsequential modeling; second, to perform multimodal analysis in a task-independent setting; and third, to identify the important differences between task-dependent and task-independent settings, specifically in terms of modalities and prediction models. Several feature sets were extracted (e.g., acoustic, speaking-turn, linguistic, dialog-tag, head-motion, and facial features) for inferring the CS indices as a regression task. Three predictive models were considered: support vector regression (SVR), long short-term memory (LSTM), and an enhanced time-series model (an LSTM given a combination of static and time-series features). Evaluation was conducted using the R2 score in a cross-validation scheme. The experimental results suggest that time-series modeling can significantly improve the performance of multimodal analysis in the task-dependent setting (best R2 = 0.797 for the total CS index), with word2vec being the most prominent feature. However, highly context-related features did not transfer well to the task-independent setting.
We therefore propose an enhanced LSTM model for task-independent settings, which achieved better performance than the conventional SVR and LSTM models (best R2 = 0.602 for the total CS index). In other words, our study shows that a suitable time-series model can outperform traditional nonsequential modeling for automatically estimating the CS indices of a participant in a group discussion with regard to task dependency.
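
The enhanced time-series model is described only as an LSTM given a combination of static and time-series features; the abstract does not specify the fusion mechanism. One common way to realize such a combination is to tile the static vector onto every timestep before feeding the sequence model, as in this minimal sketch (the function name and data are illustrative assumptions, not the paper's documented architecture):

```python
def fuse_static_with_sequence(static_feats, frames):
    """Append a participant-level static feature vector to each frame of a
    time-series feature sequence, so a sequence model such as an LSTM can
    condition on both. This tiling scheme is an assumption for illustration,
    not the paper's exact design."""
    return [list(frame) + list(static_feats) for frame in frames]

# Two timesteps of 2-dim sequential features, one 1-dim static feature.
frames = [[0.1, 0.5], [0.3, 0.7]]
fused = fuse_static_with_sequence([1.0], frames)
# fused = [[0.1, 0.5, 1.0], [0.3, 0.7, 1.0]]
```

The fused sequence keeps its original length, so it can be passed to any sequence model unchanged.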
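
The R2 (coefficient of determination) score used for evaluation can be sketched as follows; this is a generic illustration of the metric in Python, not the authors' evaluation code:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R2 = 1.0; always predicting the mean gives 0.0.
y_true = [3.0, 1.0, 4.0, 2.0]
print(r2_score(y_true, y_true))     # 1.0
print(r2_score(y_true, [2.5] * 4))  # 0.0
```

In a cross-validation scheme this score is computed on each held-out fold and then aggregated.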


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4 (November 2021), 529 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492437

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 May 2020
• Revised: 1 November 2020
• Accepted: 1 February 2021
• Published: 12 November 2021


        Qualifiers

        • research-article
        • Refereed
