
The New Reality of Reproducibility: The Role of Data Work in Scientific Research

Published: 29 May 2020

Abstract

Although reproducibility (the idea that a valid scientific experiment can be repeated with similar results) is integral to our understanding of good scientific practice, it has remained a difficult concept to define precisely. Across scientific disciplines, the increasing prevalence of large datasets, and of the computational techniques necessary to manage and analyze them, has prompted new ways of thinking about reproducibility. We present findings from a qualitative study of an NSF-funded two-week workshop developed to introduce an interdisciplinary group of domain scientists to data-management techniques for data-intensive computing, with a focus on reproducible science. Our findings suggest that the introduction of data-related activities promotes a new understanding of reproducibility as a mechanism for local knowledge transfer and collaboration, particularly as regards efficient software reuse.
