Abstract
This paper presents a system deployed on parallel clusters to manage a collection of parallel simulations that make up a computational study. It explores how such a system can extend traditional parallel job scheduling and resource allocation techniques to incorporate knowledge specific to the study.
Using a UINTAH-based helium gas simulation code (ARCHES) and the SimX system for multi-experiment computational studies, this paper demonstrates that, by using application-specific knowledge in resource allocation and scheduling decisions, one can reduce the run time of a computational study from over 20 hours to under 4.5 hours on a 32-processor cluster, and from almost 11 hours to just over 3.5 hours on a 64-processor cluster.
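The run times quoted above can be turned into approximate improvement factors. The sketch below is illustrative only: the abstract gives bounds ("over 20", "under 4.5", "almost 11", "just over 3.5") rather than exact figures, so the rounded hour values and the names `REPORTED_HOURS`, `speedup`, and `summarize` are assumptions made for this example, not part of SimX or the paper.

```python
# Rounded run times in hours taken from the abstract's wording.
# These are assumed values: the abstract states only bounds, not exact figures.
REPORTED_HOURS = {
    32: (20.0, 4.5),   # 32-processor cluster: (baseline, application-aware)
    64: (11.0, 3.5),   # 64-processor cluster: (baseline, application-aware)
}


def speedup(baseline_hours, improved_hours):
    """Return the ratio of baseline run time to improved run time."""
    if improved_hours <= 0:
        raise ValueError("improved_hours must be positive")
    return baseline_hours / improved_hours


def summarize(reported):
    """Map each cluster size to its approximate improvement factor."""
    return {
        procs: round(speedup(baseline, improved), 2)
        for procs, (baseline, improved) in reported.items()
    }


if __name__ == "__main__":
    for procs, factor in summarize(REPORTED_HOURS).items():
        print(f"{procs} processors: roughly {factor}x shorter study run time")
```

Under these rounded values the study finishes roughly 4.4 times sooner on 32 processors and roughly 3.1 times sooner on 64 processors. Because "over 20" and "under 4.5" both err in the favorable direction, the 32-processor factor is a lower bound, while the 64-processor factor is only approximate.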
Application-aware management of parallel simulation collections
PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming