research-article

Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support

Published: 25 March 2016

Abstract

In our hybrid runtime (HRT) model, a parallel runtime system and the application are together transformed into a specialized OS kernel that operates entirely in kernel mode and can thus implement exactly its desired abstractions on top of fully privileged hardware access. We describe the design and implementation of two new tools that support the HRT model. The first, the Nautilus Aerokernel, is a kernel framework specifically designed to enable HRTs for x64 and Xeon Phi hardware. Aerokernel primitives are specialized for HRT creation and can thus run up to two orders of magnitude faster than related primitives in Linux. Aerokernel primitives also exhibit much lower variance in their performance, an important consideration for some forms of parallelism. We have realized several prototype HRTs, including one based on the Legion runtime, and we report application macrobenchmark results for our Legion HRT. The second tool, the hybrid virtual machine (HVM), is an extension to the Palacios virtual machine monitor that allows a single virtual machine to simultaneously support a traditional OS and software stack alongside an HRT with specialized hardware access. The HRT can be booted in time comparable to that of Linux user process startup, and functions in the HRT, which operate over the user process's memory, can be invoked by the process with latencies not much higher than those of a function call.
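The abstract's performance claims rest on two measurable quantities: the mean latency of a primitive (where "two orders of magnitude" corresponds to a mean-latency ratio of roughly 100) and the run-to-run variance of that latency. The following is a minimal, hypothetical Python harness for quantifying both; it is not the authors' benchmark code. The names `time_primitive`, `summarize`, `orders_of_magnitude`, and `spawn_and_join` are illustrative assumptions, and the Linux-side baseline goes through Python's `threading` module, so its timings include interpreter overhead on top of the underlying kernel thread creation.

```python
import math
import statistics
import threading
import time
from typing import Callable, Dict, List


def time_primitive(op: Callable[[], None], trials: int = 1000) -> List[int]:
    """Run `op` `trials` times and return per-call latencies in nanoseconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        op()
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and coefficient of variation (stdev / mean)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def orders_of_magnitude(baseline: List[float], candidate: List[float]) -> float:
    """log10 of the mean-latency ratio; 2.0 means the candidate is 100x faster."""
    return math.log10(statistics.mean(baseline) / statistics.mean(candidate))


def spawn_and_join() -> None:
    # Hypothetical Linux-side baseline: create a thread and wait for it to exit.
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()


if __name__ == "__main__":
    stats = summarize(time_primitive(spawn_and_join, trials=200))
    print(f"mean={stats['mean']:.0f} ns  stdev={stats['stdev']:.0f} ns  cv={stats['cv']:.3f}")
```

The coefficient of variation is used here because it normalizes spread by the mean, so primitives whose latencies differ by orders of magnitude can still be compared on predictability.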

Published in

ACM SIGPLAN Notices, Volume 51, Issue 7 (VEE '16), July 2016, 167 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3007611

VEE '16: Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, March 2016, 186 pages
ISBN: 9781450339476
DOI: 10.1145/2892242

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

          Publication History

          • Published: 25 March 2016

          Check for updates

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!