research-article
Open Access

Orienting, Framing, Bridging, Magic, and Counseling: How Data Scientists Navigate the Outer Loop of Client Collaborations in Industry and Academia

Published: 18 October 2021

Abstract

Data scientists often collaborate with clients to analyze data to meet a client's needs. What does the end-to-end workflow of a data scientist's collaboration with clients look like throughout the lifetime of a project? To investigate this question, we interviewed ten data scientists (5 female, 4 male, 1 non-binary) in diverse roles across industry and academia. We discovered that they work with clients in a six-stage outer-loop workflow, which involves 1) laying groundwork by building trust before a project begins, 2) orienting to the constraints of the client's environment, 3) collaboratively framing the problem, 4) bridging the gap between data science and domain expertise, 5) the inner loop of technical data analysis work, and 6) counseling to help clients emotionally cope with analysis results. This novel outer-loop workflow contributes to CSCW by expanding the notion of what collaboration means in data science beyond the widely known inner-loop technical workflow stages of acquiring, cleaning, analyzing, modeling, and visualizing data. We conclude by discussing the implications of our findings for data science education, parallels to design work, and unmet needs for tool development.

Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2
October 2021
5376 pages
EISSN: 2573-0142
DOI: 10.1145/3493286

Copyright © 2021 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
