Abstract
Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development