Research article. DOI: 10.1145/3462244.3479944

Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation

Published: 18 October 2021

Abstract

We propose a speech-driven laughter backchannel generation model to reward engagement during human-agent interaction. We formulate the problem as a Markov decision process in which the speech signal represents the state and the objective is to maximize human engagement. Since online training is often impractical for human-agent interaction, we train our agent for the backchannel generation task on existing human-to-human dyadic interaction datasets. We address the problem with an actor-critic method based on conservative Q-learning (CQL), which mitigates the distributional shift problem by suppressing Q-value over-estimation during training. The proposed CQL-based approach is evaluated objectively on the IEMOCAP dataset for the laughter generation task. Compared to existing off-policy Q-learning methods, it shows improved compliance with the dataset in terms of laugh generation rate. Furthermore, we show the effectiveness of the learned policy by estimating the expected engagement using off-policy policy evaluation techniques.
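
As background for the approach summarized above, the snippet below sketches the conservative Q-learning critic loss of Kumar et al. (2020) in a discrete-action, PyTorch setting. It is not the authors' implementation; the names q_net, target_q_net, batch, and alpha are assumptions made for illustration. The penalty term is the mechanism the abstract refers to: it suppresses Q-value over-estimation on actions that are rare or absent in the offline dataset.

```python
# A minimal sketch of a conservative Q-learning (CQL) critic update, assuming a
# discrete action space and PyTorch. This is NOT the authors' implementation;
# names such as q_net, target_q_net, batch, and alpha are illustrative.
import torch
import torch.nn.functional as F


def cql_critic_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    """TD loss plus the CQL regularizer that suppresses Q-value over-estimation."""
    s, a, r, s_next, done = batch  # tensors drawn from the offline (logged) dataset

    # Standard one-step TD target computed with a target network.
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q_net(s_next).max(dim=1).values

    q_all = q_net(s)                                      # Q(s, .) over all actions
    q_taken = q_all.gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for logged actions
    td_loss = F.mse_loss(q_taken, target)

    # CQL penalty: push down a soft maximum (log-sum-exp) of Q over all actions
    # while pushing up the Q-values of actions actually observed in the data,
    # keeping the critic conservative on out-of-distribution actions.
    cql_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()

    return td_loss + alpha * cql_penalty
```

In an actor-critic variant such as the one described in the abstract, the actor would then be updated against this conservative critic; that step is omitted here for brevity.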


Cited By

  • (2024) Modeling Bellman-error with logistic distribution with applications in reinforcement learning. Neural Networks 177, 106387 (September 2024). https://doi.org/10.1016/j.neunet.2024.106387
  • (2022) Offline Reinforcement Learning via Policy Regularization and Ensemble Q-Functions. In 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 1167–1174 (October 2022). https://doi.org/10.1109/ICTAI56018.2022.00178

Published In

ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction
October 2021, 876 pages
ISBN: 9781450384810
DOI: 10.1145/3462244
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. backchannels
  2. human-agent interaction
  3. offline reinforcement learning
  4. user engagement

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Turkish Scientific and Technical Research Council (TUBITAK)

Conference

ICMI '21: International Conference on Multimodal Interaction
October 18–22, 2021
Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

