Abstract
We present a novel 2-approximation algorithm for deploying stream graphs on multicore computers and a stream graph transformation that eliminates bottlenecks. The key technical insight is a data rate transfer model that enables the computation of a "closed form", i.e., the data rate transfer function of an actor expressed in terms of the arrival rate of the stream program. We formulate a combinatorial optimization problem that uses the closed form to maximize the throughput of the stream program. Although this problem is NP-hard, we present an efficient and effective 2-approximation algorithm that guarantees the quality of its solution: the result is always within a factor of two of the optimum. We also introduce a transformation that uses the closed form to identify and eliminate bottlenecks.
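To make the closed-form idea concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: synchronous-dataflow actors with fixed pop/push rates, a tree-shaped graph (every non-source actor has exactly one predecessor), per-firing costs in microseconds, identical cores and no communication cost. Under these assumptions each actor fires c_a·x times per second when the program receives x items per second, so its CPU load is w_a·x with w_a = cost_a·c_a. Maximizing x while keeping every core within 10^6 µs of work per second is then equivalent to minimizing the largest per-core sum of w_a, and any ρ-approximation for that load yields a throughput of at least 1/ρ of the optimum. For the mapping step we swap in Graham's greedy list scheduling (largest-first order), a textbook (2 - 1/m)-approximation on m cores; it is not the paper's algorithm. The names `Actor`, `rate_coefficients` and `list_schedule`, and the example graph, are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass
class Actor:
    pop: int      # items consumed per firing
    push: int     # items produced per firing on every outgoing edge
    cost: float   # CPU time per firing, in microseconds (assumed unit)


def rate_coefficients(actors, edges, source):
    """Return c[a] such that actor a fires c[a] * x times per second when the
    source receives x items per second (the linear 'closed form').
    Assumes a tree-shaped graph: each non-source actor has one predecessor."""
    succ = {a: [] for a in actors}
    for u, v in edges:
        succ[u].append(v)
    coeff = {source: 1.0 / actors[source].pop}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        out_rate = coeff[u] * actors[u].push  # items/s per unit of x on each out-edge
        for v in succ[u]:
            coeff[v] = out_rate / actors[v].pop
            queue.append(v)
    return coeff


def list_schedule(work, m):
    """Greedy list scheduling onto m identical cores, largest job first.
    Any list order gives a max load <= (2 - 1/m) * OPT; largest-first is tighter."""
    heap = [(0.0, p) for p in range(m)]  # (current load, core id)
    loads = [0.0] * m
    assignment = {}
    for a, w in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)    # least-loaded core
        assignment[a] = p
        loads[p] = load + w
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


# Hypothetical four-actor pipeline.
actors = {
    "src": Actor(pop=1, push=2, cost=1.0),
    "fir": Actor(pop=2, push=1, cost=8.0),
    "dec": Actor(pop=1, push=1, cost=3.0),
    "sink": Actor(pop=1, push=0, cost=1.0),
}
edges = [("src", "fir"), ("fir", "dec"), ("dec", "sink")]

coeff = rate_coefficients(actors, edges, "src")
work = {a: actors[a].cost * coeff[a] for a in actors}  # w_a in µs per unit of x
assignment, loads = list_schedule(work, m=2)
x_max = 1e6 / max(loads)  # largest sustainable arrival rate, items/s
print(assignment, loads, x_max)
```

In this example `fir` alone contributes 8 µs of work per unit of arrival rate, which equals the largest core load, so it is the bottleneck that caps x at 125,000 items per second; no mapping onto two cores can do better without transforming the graph itself.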
We show experimentally that (1) state-of-the-art integer linear programming approaches for orchestrating stream graphs are intractable, or at least impractical, for larger stream graphs and larger numbers of processors, and (2) our 2-approximation algorithm is highly efficient and its results are close to the optimal solution for a standard set of StreamIt benchmark programs.
Orchestration by approximation: mapping stream programs onto multicore architectures. In ASPLOS XVI: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11).