research-article
Open Access

Play for Real(ism) - Using Games to Predict Human-AI interactions in the Real World

Published: 06 October 2021

Abstract

AI-enabled decision support systems have repeatedly failed in real-world applications even when the underlying model operated as designed. Often this was because the system was used in an unexpected manner. Our goal is to enable better prediction of how systems will be used before they are implemented, and to improve existing designs, by taking human behavior into account. Collecting such data poses several challenges. When no prediction engine exists yet, its behavior must be simulated. This simulation must include not just the behavior of the underlying model but also the context in which the decision will be made in the real world. Additionally, collecting statistically valid samples requires that test subjects make repeated choices under slightly varied conditions. Unfortunately, fatigue can set in quickly under such repetitive conditions. Games let us address both of these challenges by providing both systems context and narrative context. Systems context can convey, within the game environment itself, some or all of the information the player needs to make a decision, which can help avoid the onset of fatigue. Narrative context can provide a broader environment within which the simulated system operates, adding a sense of progress, showing the effect of decisions, adding perceived social norms, and setting incentives and stakes. This broader environment can further prevent player fatigue while replicating many of the external factors that might affect choices in the real world. In this paper we describe the design of the Human-AI Decision Evaluation System (HADES), a test harness capable of interfacing with a game environment, simulating the behavior of an AI-enabled decision support system, and collecting the results of human decision making based upon such a system's predictions. Additionally, we present an analysis of data collected by HADES while interfaced with a visual novel game focused on software cyber-risk assessment.
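
The harness loop described above (simulate a decision support system, show its prediction to a player, record the player's choice) can be illustrated with a minimal sketch. This is not the HADES implementation or its API. Every name here (`SimulatedAdvisor`, `Trial`, `run_session`, `summarize`) is hypothetical, and the simulated system is assumed, purely for illustration, to return the correct answer with a fixed probability.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SimulatedAdvisor:
    """Hypothetical stand-in for an AI decision support system.

    Assumption: the simulated system is right with fixed probability `accuracy`.
    """
    accuracy: float
    rng: random.Random = field(default_factory=random.Random)

    def predict(self, truth: bool) -> bool:
        # Return the ground truth with probability `accuracy`, else its opposite.
        return truth if self.rng.random() < self.accuracy else not truth


@dataclass
class Trial:
    """One recorded decision: ground truth, advice shown, and the player's choice."""
    truth: bool
    advice: bool
    choice: bool

    @property
    def followed_advice(self) -> bool:
        return self.choice == self.advice

    @property
    def correct(self) -> bool:
        return self.choice == self.truth


def run_session(advisor: SimulatedAdvisor,
                truths: Iterable[bool],
                decide: Callable[[bool], bool]) -> List[Trial]:
    """Show the advisor's prediction for each case and log the player's choice.

    `decide` is a hypothetical callback; in a game it would present the
    advice in context and return what the player chose.
    """
    log = []
    for truth in truths:
        advice = advisor.predict(truth)
        log.append(Trial(truth, advice, decide(advice)))
    return log


def summarize(log: List[Trial]) -> Dict[str, float]:
    """Aggregate how often the player followed the advice and was correct."""
    n = len(log)
    return {
        "trials": n,
        "follow_rate": sum(t.followed_advice for t in log) / n,
        "accuracy": sum(t.correct for t in log) / n,
    }


if __name__ == "__main__":
    # A seeded advisor and a player who always follows the advice.
    advisor = SimulatedAdvisor(accuracy=0.8, rng=random.Random(0))
    cases = [True, False, True, True, False]
    print(summarize(run_session(advisor, cases, decide=lambda advice: advice)))
```

Keeping the player's decision behind a callback separates the simulated system from whatever environment presents it, so the same logging code could sit behind a survey form or a game scene. That separation is a design choice of this sketch, not a claim about how HADES is built.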

