Human Feedback as Action Assignment in Interactive Reinforcement Learning

Published: 04 August 2020

Abstract

Teaching by demonstration and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner’s performance and assigning a reward or punishment to it. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional method of reward assignment with action assignment (which is similar to providing a demonstration) in interactive reinforcement learning. The suggested action serves mainly to compute a reward, based on whether or not the self-acting agent followed the suggestion. We compared action assignment with reward assignment via a user study conducted over the web using a two-dimensional maze game. The logs of interactions showed that action assignment significantly improved users’ ability to teach the right behaviour. The survey results showed that both action and reward assignment seemed highly natural and usable; that reward assignment required more mental effort; that repeatedly assigning rewards while seeing the agent disobey commands frustrated users; and that many users desired to control the agent’s behaviour directly.
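The core mechanism described above — turning a teacher's suggested action into a reward signal by checking whether the agent actually took it — can be sketched in a few lines. This is a minimal illustration under assumed details: the function names, the ±1 reward values, and the tabular Q-learning update are illustrative choices, not the paper's exact formulation.

```python
def feedback_reward(suggested_action, taken_action):
    """Derive a reward from whether the agent followed the teacher's suggestion.

    Assumed reward scheme: +1 if the agent obeyed, -1 otherwise.
    """
    return 1.0 if taken_action == suggested_action else -1.0


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, fed with the human-derived reward."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


# Example: the teacher suggests "up"; the self-acting agent moves "up",
# so the derived reward is positive and reinforces that action.
Q = {}
actions = ["up", "down", "left", "right"]
r = feedback_reward("up", "up")
q_update(Q, state=(0, 0), action="up", reward=r,
         next_state=(0, 1), actions=actions)
```

The key point of the design is that the teacher never assigns a numeric reward directly; the reward is computed from agreement between the suggested and executed actions, so the interaction feels like demonstration while the learning machinery remains reward-based.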
