Crowdsourcing Perceptions of Fair Predictors for Machine Learning: A Recidivism Case Study

Published: 07 November 2019

Abstract

The increased reliance on algorithmic decision-making in socially impactful processes has intensified the calls for algorithms that are unbiased and procedurally fair. Identifying fair predictors is an essential step in the construction of equitable algorithms, but the lack of ground truth in fair predictor selection makes this a challenging task. In our study, we recruit 90 crowdworkers to judge the inclusion of various predictors for recidivism. We divide participants across three conditions with varying group composition. Our results show that participants were able to make informed decisions on predictor selection. We find that agreement with the majority vote is higher when participants are part of a more diverse group. The presented workflow, which provides a scalable and practical approach to reach a diverse audience, allows researchers to capture participants' perceptions of fairness in private while simultaneously allowing for structured participant discussion.
