research-article

Minimal-overhead virtualization of a large scale supercomputer

Published: 09 March 2011

Abstract

Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large-scale machines that run carefully crafted HPC OSes supporting tightly-coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with ≤5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes. We describe three techniques essential for achieving such low overhead: passthrough I/O, workload-sensitive selection of paging mechanisms, and carefully controlled preemption. These techniques are forms of symbiotic virtualization, an approach on which we elaborate.



Published in

  • ACM SIGPLAN Notices, Volume 46, Issue 7 (VEE '11), July 2011, 231 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/2007477
  • VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, March 2011, 250 pages. ISBN: 9781450306874. DOI: 10.1145/1952682

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
