DOI: 10.1145/2063348.2063363
research-article

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications

Published: 12 November 2011

ABSTRACT

The emergence of cloud services brings new possibilities for constructing and using HPC platforms. However, while cloud services provide the flexibility and convenience of customized, pay-as-you-go parallel computing, multiple studies over the past three years have indicated that cloud-based clusters need a significant performance boost to become a competitive choice, especially for tightly coupled parallel applications.

In this work, we examine the feasibility of running HPC applications in clouds. This study distinguishes itself from existing investigations in several ways: 1) We carry out a comprehensive examination of issues relevant to the HPC community, including performance, cost, user experience, and range of user activities. 2) We compare an Amazon EC2-based platform built upon its newly available HPC-oriented virtual machines with typical local cluster and supercomputer options, using benchmarks and applications with scale and problem size unprecedented in previous cloud HPC studies. 3) We perform detailed performance and scalability analysis to locate the chief limiting factors of state-of-the-art cloud-based clusters. 4) We present a case study on the impact of per-application parallel I/O system configuration, which is uniquely enabled by cloud services. Our results reveal that although the scalability of EC2-based virtual clusters still lags behind traditional HPC alternatives, they are rapidly gaining in overall performance and cost-effectiveness, making them feasible candidates for performing tightly coupled scientific computing. In addition, our detailed benchmarking and profiling disclose and analyze several problems with performance and performance stability on EC2.
