research-article

Orchestration by approximation: mapping stream programs onto multicore architectures

Published: 05 March 2011

Abstract

We present a novel 2-approximation algorithm for deploying stream graphs on multicore computers and a stream graph transformation that eliminates bottlenecks. The key technical insight is a data rate transfer model that enables the computation of a "closed form", i.e., the data rate transfer function of an actor depending on the arrival rate of the stream program. A combinatorial optimization problem uses the closed form to maximize the throughput of the stream program. Although the problem is inherently NP-hard, we present an efficient and effective 2-approximation algorithm that provides a lower bound on the quality of the solution. We introduce a transformation that uses the closed form to identify and eliminate bottlenecks.

We show experimentally that (1) state-of-the-art integer linear programming approaches for orchestrating stream graphs are intractable, or at least impractical, for larger stream graphs and larger numbers of processors, and (2) our 2-approximation algorithm is highly efficient and its results are close to the optimal solution for a standard set of StreamIt benchmark programs.
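To make the notion of a 2-approximation concrete, the following minimal sketch uses Graham's greedy list scheduling, a classic algorithm whose bottleneck load is provably at most twice the optimum. It is not the algorithm of this paper: it ignores the data rate transfer model, the closed form, and communication costs, and the actor names, per-actor work values, and function names are hypothetical, chosen only for illustration.

```python
import heapq


def greedy_assign(work, num_procs):
    """Map each actor to the currently least-loaded processor.

    `work` maps a (hypothetical) actor name to its cost per steady-state
    iteration. Returns the mapping and the load of the bottleneck
    processor, which limits the achievable throughput.
    """
    # Min-heap of (current load, processor id).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    mapping = {}
    for actor, cost in work.items():
        load, proc = heapq.heappop(heap)
        mapping[actor] = proc
        heapq.heappush(heap, (load + cost, proc))
    bottleneck = max(load for load, _ in heap)
    return mapping, bottleneck


def optimum_lower_bound(work, num_procs):
    """No mapping can beat the average load or the single heaviest actor."""
    return max(sum(work.values()) / num_procs, max(work.values()))


# Illustrative workload, not taken from the StreamIt benchmarks.
work = {"source": 2.0, "fir": 7.0, "fft": 5.0, "scale": 3.0, "sink": 1.0}
mapping, bottleneck = greedy_assign(work, 2)
bound = optimum_lower_bound(work, 2)

# The greedy bottleneck is never more than twice the lower bound on the optimum.
print(mapping, bottleneck, bound, bottleneck <= 2 * bound)
```

The check in the last line is what an approximation guarantee provides in practice: even when computing the optimal mapping is out of reach, the quality of the returned mapping is bounded relative to it.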


    • Published in

      ACM SIGPLAN Notices, Volume 46, Issue 3
      ASPLOS '11
      March 2011
      407 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1961296

      ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
      March 2011
      432 pages
      ISBN: 9781450302661
      DOI: 10.1145/1950365

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
