Abstract

To prevent harmful AI behavior, people need to specify constraints that forbid undesirable actions. This is a complex task, because writing rules that distinguish harmful from non-harmful actions tends to be difficult in real-world situations. As a result, such decisions have historically been made by a small group of powerful AI companies and developers, with limited community input. In this paper, we study how to enable a crowd of non-AI experts to work together to communicate high-quality, reliable constraints to AI systems. We first focus on understanding how humans reason about temporal dynamics in the context of AI behavior, finding through experiments on a novel game-based testbed that participants tend to adopt a long-term notion of harm, even in uncertain situations that do not affect them directly. Building on this insight, we explore task design for long-term constraint specification, developing new filtering approaches and new methods of promoting user reflection. Next, we develop a novel rule-based interface that allows people to craft rules in an accessible fashion without programming knowledge. We test our approaches on a real-world AI problem in the domain of education, and find that our new filtering mechanisms and interfaces significantly improve constraint quality and human efficiency. We also demonstrate how these systems can be applied to other real-world AI problems (e.g., in social networks).
Supplemental Material
Video_Figure.mp4 - Our video figure, demonstrating the educational video game Riddle Books, our constraint specification interfaces, and the CarefulCar testbed. Playable in any video player. Credits: CarefulCar music: After (iamamiwhoami Remix) by Moby, courtesy of https://mobygratis.com. Riddle Books was developed by the Center for Game Science at the University of Washington.
Index Terms
Using the Crowd to Prevent Harmful AI Behavior