Abstract
Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages target application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. In a stream program, computations are expressed as independent actors that communicate through FIFO data channels. A major challenge with stream programs is load-balancing the actors across the available processing cores. The workload of a stream program is determined by actor execution times and by the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution-time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest themselves as execution-time overhead.
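To make the workload model above concrete, here is a toy sketch in Python. It assumes that each actor has a known execution time, that each channel has a known communication cost, and that the cost of a channel is charged to the producer's core only when producer and consumer sit on different cores. These assumptions and the function names `core_loads` and `best_mapping` are hypothetical and not taken from the paper. The exhaustive search merely stands in for an ILP solver and is only practical for a handful of actors.

```python
from itertools import product


def core_loads(assign, exec_time, channels, num_cores):
    """Per-core workload under an assumed cost model: the execution time of
    every actor mapped to the core, plus the cost of each channel whose
    producer is on that core and whose consumer is on a different core."""
    loads = [0.0] * num_cores
    for actor, core in assign.items():
        loads[core] += exec_time[actor]
    for (src, dst), cost in channels.items():
        if assign[src] != assign[dst]:
            loads[assign[src]] += cost  # assumption: the producer pays
    return loads


def best_mapping(exec_time, channels, num_cores):
    """Try every actor-to-core mapping and keep the first one with the
    smallest maximum core load (makespan). Exponential; toy inputs only."""
    actors = sorted(exec_time)
    best = None
    for cores in product(range(num_cores), repeat=len(actors)):
        assign = dict(zip(actors, cores))
        makespan = max(core_loads(assign, exec_time, channels, num_cores))
        if best is None or makespan < best[0]:
            best = (makespan, assign)
    return best


if __name__ == "__main__":
    # Hypothetical 4-actor pipeline A -> B -> C -> D on 2 cores.
    times = {"A": 4, "B": 3, "C": 3, "D": 4}
    chans = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}
    print(best_mapping(times, chans, 2))
```

For this pipeline, splitting it as {A, B} | {C, D} gives loads of 8 and 7. This is optimal: the actor times alone already sum to 7 per core, and any split that uses both cores adds at least one channel cost to some core. The example shows why execution-time profiling alone is not enough, because the channel costs shift which mapping is best.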
In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.
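As an illustration only, a generic min-makespan ILP for mapping a set of actors A onto a set of cores P, with channels E, could be written as follows. This is a textbook-style sketch, not the formulation from the paper. The symbols are assumptions of the sketch: t_a is the profiled execution time of actor a, c_{uv} is the communication cost of channel (u,v), x_{a,p} = 1 if actor a runs on core p, and y_{uv,p} charges a channel that leaves core p to the producer's core.

```latex
\begin{aligned}
\min\; & T \\
\text{s.t.}\; & \sum_{p \in P} x_{a,p} = 1 && \forall a \in A \\
& \sum_{a \in A} t_a\, x_{a,p} + \sum_{(u,v) \in E} c_{uv}\, y_{uv,p} \le T && \forall p \in P \\
& y_{uv,p} \ge x_{u,p} - x_{v,p} && \forall (u,v) \in E,\ p \in P \\
& x_{a,p} \in \{0,1\}, \quad y_{uv,p} \ge 0 && \forall a, (u,v), p
\end{aligned}
```

The constraint on y_{uv,p} forces it to be at least 1 whenever producer u is on core p and consumer v is not. Otherwise it may be 0. Because the solver can always set each y to that lower bound, the optimal T equals the smallest achievable maximum core load.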
Profile-guided deployment of stream programs on multicores
LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems