Abstract
Effective collaboration in data science can leverage domain expertise from each team member and thus improve the quality and efficiency of the work. Computational notebooks give data scientists a convenient interactive solution for sharing and keeping track of the data exploration process through a combination of code, narrative text, visualizations, and other rich media. In this paper, we report how synchronous editing in computational notebooks changes the way data scientists work together compared to working on individual notebooks. We first conducted a formative survey with 195 data scientists to understand their past experience with collaboration in the context of data science. Next, we carried out an observational study of 24 data scientists working in pairs remotely to solve a typical data science predictive modeling problem, working on either notebooks supported by synchronous groupware or individual notebooks in a collaborative setting. The study showed that working on the synchronous notebooks improves collaboration by creating a shared context, encouraging more exploration, and reducing communication costs. However, the current synchronous editing features may lead to unbalanced participation and activity interference without strategic coordination. The synchronous notebooks may also amplify the tension between quick exploration and clear explanations. Building on these findings, we propose several design implications aimed at better supporting collaborative editing in computational notebooks, and thus improving efficiency in teamwork among data scientists.
How Data Scientists Use Computational Notebooks for Real-Time Collaboration