Abstract
Teaching by demonstration and teaching by assigning rewards are two popular methods of knowledge transfer among humans. However, showing the right behaviour (by demonstration) may come more naturally to a human teacher than assessing the learner's performance and assigning a reward or punishment. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional reward assignment in interactive reinforcement learning with action assignment, which is similar to providing a demonstration. The suggested action serves mainly to compute a reward: the agent is rewarded according to whether or not it, acting on its own, followed the suggestion. We compared action assignment with reward assignment in a web-based user study built around a two-dimensional maze game. The interaction logs showed that action assignment significantly improved users' ability to teach the right behaviour. The survey results showed that although both action and reward assignment seemed highly natural and usable, reward assignment required more mental effort; repeatedly assigning rewards while watching the agent disobey caused frustration in users, and many users wanted to control the agent's behaviour directly.
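The core idea — deriving a scalar reward from whether the self-acting agent happened to follow the trainer's suggested action — can be sketched in a few lines. The following is a minimal, hypothetical Q-learning sketch of that mechanism; the action set, the +1/−1 reward scheme, and all constants are illustrative assumptions, not the article's exact algorithm.

```python
import random
from collections import defaultdict

# Illustrative action-assignment sketch (not the article's exact algorithm):
# the trainer suggests an action, the agent still acts on its own, and the
# reward is derived from whether the agent's choice matched the suggestion.
ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning constants

Q = defaultdict(float)  # maps (state, action) -> estimated value

def select_action(state):
    """Epsilon-greedy choice made by the self-acting agent."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def feedback_from_suggestion(agent_action, suggested_action):
    """Turn a suggested action into a scalar reward: positive when the
    agent followed the suggestion, negative otherwise (assumed +1/-1)."""
    if suggested_action is None:
        return 0.0  # the trainer gave no suggestion at this step
    return 1.0 if agent_action == suggested_action else -1.0

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update using the derived reward."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Under this scheme the trainer never types a number: disagreement with the suggestion is automatically translated into punishment, which is what lets action assignment stand in for explicit reward assignment.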
Human Feedback as Action Assignment in Interactive Reinforcement Learning