Abstract
The increased reliance on algorithmic decision-making in socially impactful processes has intensified calls for algorithms that are unbiased and procedurally fair. Identifying fair predictors is an essential step in the construction of equitable algorithms, but the lack of ground truth in fair predictor selection makes this a challenging task. In our study, we recruit 90 crowdworkers to judge whether various predictors should be used to predict recidivism. We divide participants across three conditions with varying group composition. Our results show that participants were able to make informed decisions on predictor selection. We find that agreement with the majority vote is higher when participants are part of a more diverse group. The presented workflow provides a scalable and practical approach for reaching a diverse audience: it allows researchers to capture participants' perceptions of fairness in private while also supporting structured discussion among participants.
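The abstract's central measure is agreement with the majority vote. As an illustration only, the JavaScript sketch below computes, for one group, the share of each participant's include/exclude votes that match the group majority. The vote format, the hypothetical function names `majorityVotes` and `agreementWithMajority`, counting each participant's own vote toward the majority, and skipping tied predictors are all assumptions, not the analysis reported in the paper.

```javascript
// Illustrative sketch only: vote format and tie handling are assumptions.
// votes[participantId][predictor] = true (include) or false (exclude).

// Majority decision per predictor over all ballots in the group
// (each participant's own vote is counted). Tied predictors get no entry.
function majorityVotes(votes) {
  const tallies = new Map(); // predictor -> { include, exclude }
  for (const ballot of Object.values(votes)) {
    for (const [predictor, include] of Object.entries(ballot)) {
      const t = tallies.get(predictor) || { include: 0, exclude: 0 };
      if (include) t.include += 1;
      else t.exclude += 1;
      tallies.set(predictor, t);
    }
  }
  const majority = new Map(); // predictor -> true (include) / false (exclude)
  for (const [predictor, t] of tallies) {
    if (t.include !== t.exclude) majority.set(predictor, t.include > t.exclude);
  }
  return majority;
}

// Share of each participant's votes that match the group majority;
// null if none of the participant's predictors has a majority.
function agreementWithMajority(votes) {
  const majority = majorityVotes(votes);
  const agreement = {};
  for (const [participant, ballot] of Object.entries(votes)) {
    let matches = 0;
    let counted = 0;
    for (const [predictor, include] of Object.entries(ballot)) {
      if (!majority.has(predictor)) continue; // skip tied predictors
      counted += 1;
      if (include === majority.get(predictor)) matches += 1;
    }
    agreement[participant] = counted > 0 ? matches / counted : null;
  }
  return agreement;
}

// Hypothetical example group: three participants, three predictors.
const votes = {
  p1: { age: true, priors: true, race: false },
  p2: { age: false, priors: true, race: false },
  p3: { age: true, priors: true, race: true },
};
console.log(agreementWithMajority(votes));
```

In this example group, p1 matches the majority on all three predictors, while p2 and p3 each match on two of three. Comparing such scores across group compositions is one way to operationalise the comparison the abstract describes.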
Supplemental Material
Code for Crowdsourcing Perceptions of Fair Predictors for Machine Learning: A Recidivism Case Study

This repository contains the code for the automated Slack bot used in the CSCW 2019 publication "Crowdsourcing Perceptions of Fair Predictors for Machine Learning: A Recidivism Case Study". The bot is built with Botkit; its main functionality is in 'skills/application.js'. The bot structures conversation among participants, keeps track of voting behaviour, listens for a number of commands (e.g., 'ready'), and presents a final survey upon completion of the study. The application also handles loading and presenting images, in-line voting buttons, and storing results in a database. All interaction between participants and the bot takes place within a Slack channel that the researcher has set up beforehand.

Set up your database details and Slack application credentials in the .env file. See readme-botkit.md for additional information on configuring Botkit.
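The conversation flow described above (per-predictor voting, a 'ready' command, and a closing survey) can be sketched without any framework. The `StudySession` class below is hypothetical and is not the code in 'skills/application.js'; it deliberately avoids Botkit calls so as not to misstate that library's API. It assumes that one predictor is discussed per round, that a round ends once every participant has typed 'ready', and that the survey starts after the last round.

```javascript
// Hypothetical, framework-independent sketch of the study flow;
// not the repository's actual implementation.
class StudySession {
  constructor(participants, predictors) {
    this.participants = new Set(participants); // Slack user IDs (assumed)
    this.predictors = predictors;              // predictors discussed in order
    this.round = 0;                            // index of current predictor
    this.votes = {};                           // votes[userId][predictor] = true/false
    this.ready = new Set();                    // users who typed 'ready' this round
  }

  get currentPredictor() {
    return this.predictors[this.round];
  }

  get finished() {
    return this.round >= this.predictors.length;
  }

  // Record an include (true) / exclude (false) vote, e.g. from a voting button.
  // A later vote in the same round overwrites an earlier one.
  recordVote(userId, include) {
    if (this.finished || !this.participants.has(userId)) return false;
    if (!this.votes[userId]) this.votes[userId] = {};
    this.votes[userId][this.currentPredictor] = include;
    return true;
  }

  // Handle a chat message; only the 'ready' command is recognised here.
  // Returns null (ignored), "waiting", "next" (new round) or "survey" (done).
  handleMessage(userId, text) {
    if (this.finished || !this.participants.has(userId)) return null;
    if (text.trim().toLowerCase() !== "ready") return null;
    this.ready.add(userId); // a Set, so repeated 'ready' counts once
    if (this.ready.size < this.participants.size) return "waiting";
    this.ready.clear();
    this.round += 1;
    return this.finished ? "survey" : "next";
  }
}

// Example: two participants, two predictors (hypothetical IDs and names).
const session = new StudySession(["U1", "U2"], ["age", "priors"]);
session.recordVote("U1", true);
session.recordVote("U2", false);
session.handleMessage("U1", "ready"); // returns "waiting": U2 is not ready yet
session.handleMessage("U2", "ready"); // returns "next": discussion moves to "priors"
```

In a deployed bot, incoming Slack messages and voting-button clicks would be forwarded to `handleMessage` and `recordVote`; persisting `votes` to the database configured in the .env file is left out of this sketch.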