DOI: 10.1145/3442188.3445918
research-article
Open access

Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure

Published: 01 March 2021

Abstract

Datasets that power machine learning are often used, shared, and reused with little visibility into the processes of deliberation that led to their creation. As artificial intelligence systems are increasingly used in high-stakes tasks, system development and deployment practices must be adapted to address the very real consequences of how model development data is constructed and used in practice. This includes greater transparency about data and accountability for the decisions made when developing it. In this paper, we introduce a rigorous framework for dataset development transparency that supports decision-making and accountability. The framework uses the cyclical, infrastructural, and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each stage of the data development lifecycle yields documents that facilitate improved communication and decision-making and draw attention to the value and necessity of careful data work. The proposed framework makes visible the often-overlooked work and decisions that go into dataset creation, a critical step in closing the accountability gap in artificial intelligence and a necessary resource aligned with recent work on auditing processes.


      Published In

      FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
      March 2021
      899 pages
      ISBN: 9781450383097
      DOI: 10.1145/3442188
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. datasets
      2. machine learning
      3. requirements engineering

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      FAccT '21