ABSTRACT
The emergence of cloud services brings new possibilities for constructing and using HPC platforms. However, while cloud services provide the flexibility and convenience of customized, pay-as-you-go parallel computing, multiple studies over the past three years have indicated that cloud-based clusters need a significant performance boost to become a competitive choice, especially for tightly coupled parallel applications.
In this work, we examine the feasibility of running HPC applications in clouds. This study distinguishes itself from existing investigations in several ways: 1) We carry out a comprehensive examination of issues relevant to the HPC community, including performance, cost, user experience, and range of user activities. 2) We compare an Amazon EC2-based platform built upon its newly available HPC-oriented virtual machines with typical local cluster and supercomputer options, using benchmarks and applications at a scale and problem size unprecedented in previous cloud HPC studies. 3) We perform detailed performance and scalability analysis to locate the chief limiting factors of state-of-the-art cloud-based clusters. 4) We present a case study on the impact of per-application parallel I/O system configuration, which is uniquely enabled by cloud services. Our results reveal that although the scalability of EC2-based virtual clusters still lags behind that of traditional HPC alternatives, they are rapidly gaining in overall performance and cost-effectiveness, making them feasible candidates for performing tightly coupled scientific computing. In addition, our detailed benchmarking and profiling disclose and analyze several problems regarding performance and performance stability on EC2.
Index Terms
(auto-classified) Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications