
Designing for Recommending Intermediate States in A Scientific Workflow Management System

Published: 29 May 2021

Abstract

Processing large amounts of data sequentially and systematically requires proper management of workflow components (i.e., modules, data, configurations, and associations among ports and links) in a Scientific Workflow Management System (SWfMS). Managing data with provenance in a SWfMS to support the reusability of workflows, modules, and data is not a simple task. Handling such components is even more burdensome for complex workflows that are frequently assembled and executed to investigate large datasets with different technologies (e.g., various learning algorithms or models). Although many studies propose techniques and technologies for managing and recommending services in a SWfMS, only a few consider how data should be managed in a SWfMS to store it efficiently and to facilitate workflow executions. Furthermore, no study has examined the effectiveness and efficiency of such data management in a SWfMS from a user perspective. In this paper, we present and evaluate a GUI version of a novel approach to intermediate data management with two use cases (Plant Phenotyping and Bioinformatics). The technique, which we call GUI-RISPTS (Recommending Intermediate States from Pipelines Considering Tool-States), can facilitate workflow executions by reusing processed data (i.e., intermediate outcomes of modules in a workflow) and can thus reduce the computational time of some modules in a SWfMS. We integrated GUI-RISPTS with an existing workflow management system called SciWorCS. In SciWorCS, we present an interface through which users select recommended intermediate states (i.e., modules' outcomes). We investigated the effectiveness of GUI-RISPTS from users' perspectives and measured its storage overhead and its efficiency in workflow execution.
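The general idea of reusing intermediate states can be illustrated with a minimal Python sketch. It is not the GUI-RISPTS implementation, whose internals the abstract does not describe. It assumes that a stored state is identified by the dataset, the sequence of modules executed so far, and each module's configuration (its "tool state"). The names `IntermediateStateStore`, `recommend`, and `run_workflow` are hypothetical and introduced only for illustration.

```python
import hashlib
import json


class IntermediateStateStore:
    """Hypothetical in-memory store of module outputs (assumption, not SciWorCS code)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(dataset_id, steps):
        # A state is identified by the dataset plus the ordered (module, config)
        # pairs that produced it; sort_keys makes the key independent of the
        # order in which configuration options were written.
        payload = json.dumps([dataset_id, steps], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def save(self, dataset_id, steps, output):
        self._store[self.key(dataset_id, steps)] = output

    def recommend(self, dataset_id, steps):
        """Return (n, output) for the longest stored prefix of steps, or (0, None)."""
        for n in range(len(steps), 0, -1):
            k = self.key(dataset_id, steps[:n])
            if k in self._store:
                return n, self._store[k]
        return 0, None


def run_workflow(store, dataset_id, data, steps, modules):
    """Run steps on data, skipping any leading modules whose outcome is stored."""
    reused, state = store.recommend(dataset_id, steps)
    if reused == 0:
        state = data  # nothing reusable: start from the raw input
    executed = []
    for i in range(reused, len(steps)):
        name, config = steps[i]
        state = modules[name](state, **config)
        executed.append(name)
        # Store the outcome of every executed module for later reuse.
        store.save(dataset_id, steps[: i + 1], state)
    return state, executed
```

In this sketch, rerunning a workflow that shares its first modules and configurations with an earlier run executes only the modules after the longest matching prefix. Storing every module outcome, as done here, is the simplest policy and makes the storage overhead mentioned above easy to see; a real system would likely be more selective about which states to keep.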
