Abstract
Scientific Workflow Management Systems (SWfMSs) have become popular for accelerating the specification, execution, visualization, and monitoring of data-intensive scientific experiments. Unfortunately, to the best of our knowledge, no existing SWfMS directly supports collaboration. Data is increasing in complexity, dimensionality, and volume, and analyzing it efficiently often goes beyond the capacity of an individual and requires collaboration among multiple researchers from varying domains. In this paper, we propose a groupware system architecture for data analysis that, in addition to supporting collaboration, incorporates features from SWfMSs to support modern data analysis processes. As a proof of concept for the proposed architecture, we developed SciWorCS, a groupware system for scientific data analysis. We present two real-world use cases: collaborative software repository analysis and bioinformatics data analysis. The results of the experiments evaluating the proposed system are promising. Our bioinformatics user study demonstrates that SciWorCS can support real-world data analysis tasks by enabling real-time collaboration among users.
Designing for Real-Time Groupware Systems to Support Complex Scientific Data Analysis