research-article

Orchestrating the execution of stream programs on multicore platforms

Published: 07 June 2008

Abstract

While multicore hardware has become ubiquitous, explicitly parallel programming models and compiler techniques for exploiting parallelism on these systems have noticeably lagged behind. Stream programming is one model that has wide applicability in the multimedia, graphics, and signal processing domains. Streaming models execute as a set of independent actors that explicitly communicate data through channels. This paper presents a compiler technique for planning and orchestrating the execution of streaming applications on multicore platforms. An integrated unfolding and partitioning step based on integer linear programming is presented that unfolds data parallel actors as needed and maximally packs actors onto cores. Next, the actors are assigned to pipeline stages in such a way that all communication is maximally overlapped with computation on the cores. To facilitate experimentation, a generalized code generation template for mapping the software pipeline onto the Cell architecture is presented. For a range of streaming applications, a geometric mean speedup of 14.7x is achieved on a 16-core Cell platform compared to a single core.
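The two planning steps named in the abstract can be illustrated with a small sketch. This is not the paper's formulation: the integer linear program is replaced here by an exhaustive search that is only feasible for a handful of actors, unfolding is omitted, and the rule that a cross-core edge costs two pipeline stages (one reserved for the data transfer) is an assumption made for illustration. The names `partition` and `assign_stages`, the actor names, and the work estimates are all hypothetical.

```python
from itertools import product


def partition(work, num_cores):
    """Assign actors to cores so that the most loaded core is as light as possible.

    Stand-in for the ILP: tries every assignment, so use only on tiny graphs.
    `work` maps actor name -> estimated work per steady-state iteration.
    """
    actors = sorted(work)
    best_load, best_assign = None, None
    for cores in product(range(num_cores), repeat=len(actors)):
        loads = [0] * num_cores
        for actor, core in zip(actors, cores):
            loads[core] += work[actor]
        if best_load is None or max(loads) < best_load:
            best_load, best_assign = max(loads), dict(zip(actors, cores))
    return best_load, best_assign


def assign_stages(edges, core_of):
    """Give each actor a pipeline stage no earlier than its producers allow.

    Assumption: `edges` lists (producer, consumer) pairs in topological order.
    Assumption: an edge that crosses cores needs a gap of 2 stages, leaving one
    stage for the transfer so it can overlap with computation; an edge within
    a core needs no gap.
    """
    stage = {}
    for src, dst in edges:
        stage.setdefault(src, 0)
        gap = 0 if core_of[src] == core_of[dst] else 2
        stage[dst] = max(stage.get(dst, 0), stage[src] + gap)
    return stage


if __name__ == "__main__":
    # Hypothetical four-actor pipeline A -> B -> C -> D on two cores.
    work = {"A": 4, "B": 3, "C": 3, "D": 2}
    load, core_of = partition(work, num_cores=2)
    stages = assign_stages([("A", "B"), ("B", "C"), ("C", "D")], core_of)
    print("max core load:", load)
    print("core assignment:", core_of)
    print("pipeline stages:", stages)
```

The maximum core load bounds the steady-state throughput of the pipeline, which is why the sketch minimizes it; the stage gaps trade extra pipeline depth for the ability to hide communication behind computation.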


    • Published in

      ACM SIGPLAN Notices, Volume 43, Issue 6
      PLDI '08
      June 2008
      382 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1379022

      PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2008
      396 pages
      ISBN: 9781595938602
      DOI: 10.1145/1375581
      General Chair: Rajiv Gupta
      Program Chair: Saman Amarasinghe

      Copyright © 2008 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
