research-article
Open Access

HPVM: heterogeneous parallel virtual machine

Published: 10 February 2018

Abstract

We propose a parallel program representation for heterogeneous systems, designed to enable performance portability across a wide range of popular parallel hardware, including GPUs, vector instruction sets, multicore CPUs and potentially FPGAs. Our representation, which we call HPVM, is a hierarchical dataflow graph with shared memory and vector instructions. HPVM supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set (ISA), and a basis for runtime scheduling; previous systems focus on only one of these capabilities. As a compiler IR, HPVM aims to enable effective code generation and optimization for heterogeneous systems. As a virtual ISA, it can be used to ship executable programs, in order to achieve both functional portability and performance portability across such systems. At runtime, HPVM enables flexible scheduling policies, both through the graph structure and the ability to compile individual nodes in a program to any of the target devices on a system. We have implemented a prototype HPVM system, defining the HPVM IR as an extension of the LLVM compiler IR, compiler optimizations that operate directly on HPVM graphs, and code generators that translate the virtual ISA to NVIDIA GPUs, Intel's AVX vector units, and to multicore X86-64 processors. Experimental results show that HPVM optimizations achieve significant performance improvements, HPVM translators achieve performance competitive with manually developed OpenCL code for both GPUs and vector hardware, and that runtime scheduling policies can make use of both program and runtime information to exploit the flexible compilation capabilities. Overall, we conclude that the HPVM representation is a promising basis for achieving performance portability and for implementing parallelizing compilers for heterogeneous parallel systems.
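
To make the idea of a hierarchical dataflow graph whose leaf nodes can each be compiled to any available device more concrete, here is a minimal Python sketch. It is not HPVM's IR or API: the `Node` class, the target names (`gpu`, `vector`, `cpu`), the three-stage example pipeline, and the fallback-to-CPU policy are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    """Hypothetical dataflow node: a leaf is a unit of computation,
    an internal node contains a child graph (this gives the hierarchy)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Dataflow edges (producer name, consumer name) between the children.
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect leaf nodes depth-first, in child order."""
    if node.is_leaf():
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def topo_order(node: Node) -> List[str]:
    """Order the children of one internal node so producers come first
    (Kahn's algorithm)."""
    indegree = {child.name: 0 for child in node.children}
    for _, consumer in node.edges:
        indegree[consumer] += 1
    ready = [name for name, deg in indegree.items() if deg == 0]
    order: List[str] = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for producer, consumer in node.edges:
            if producer == current:
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    ready.append(consumer)
    if len(order) != len(node.children):
        raise ValueError("dataflow graph contains a cycle")
    return order


def assign_targets(root: Node, available: Set[str],
                   preferred: Dict[str, str]) -> Dict[str, str]:
    """Toy scheduling policy (an assumption, not HPVM's): each leaf gets its
    preferred target if the system has it, otherwise it falls back to 'cpu'."""
    mapping: Dict[str, str] = {}
    for leaf in leaves(root):
        wanted = preferred.get(leaf.name, "cpu")
        mapping[leaf.name] = wanted if wanted in available else "cpu"
    return mapping


def build_example() -> Node:
    """Hypothetical pipeline: load -> blur -> store, where blur is itself
    a two-node graph (blur_x -> blur_y)."""
    blur = Node("blur",
                children=[Node("blur_x"), Node("blur_y")],
                edges=[("blur_x", "blur_y")])
    return Node("root",
                children=[Node("load"), blur, Node("store")],
                edges=[("load", "blur"), ("blur", "store")])


if __name__ == "__main__":
    root = build_example()
    print(topo_order(root))
    # This system has no GPU, so blur_x falls back to the CPU.
    print(assign_targets(root, {"cpu", "vector"},
                         {"blur_x": "gpu", "blur_y": "vector"}))
```

The same graph can be mapped differently on another system simply by changing the `available` set, which is the kind of per-node flexibility the abstract attributes to runtime scheduling. A real system would also translate each node to device code, which this sketch does not attempt.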




    • Published in

      ACM SIGPLAN Notices, Volume 53, Issue 1
      PPoPP '18
      January 2018
      426 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/3200691

      PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2018
      442 pages
      ISBN: 9781450349826
      DOI: 10.1145/3178487

      Copyright © 2018 Owner/Author

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 February 2018


