Abstract
Although reproducibility (the idea that a valid scientific experiment can be repeated with similar results) is integral to our understanding of good scientific practice, it has remained a difficult concept to define precisely. Across scientific disciplines, the increasing prevalence of large datasets, and of the computational techniques necessary to manage and analyze them, has prompted new ways of thinking about reproducibility. We present findings from a qualitative study of an NSF-funded two-week workshop developed to introduce an interdisciplinary group of domain scientists to data-management techniques for data-intensive computing, with a focus on reproducible science. Our findings suggest that the introduction of data-related activities promotes a new understanding of reproducibility as a mechanism for local knowledge transfer and collaboration, particularly with regard to efficient software reuse.
The New Reality of Reproducibility: The Role of Data Work in Scientific Research