ABSTRACT
As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.
- Raza Microelectronics, Inc. http://www.razamicroelectronics.com/products/xlr.htm.]]Google Scholar
- StreamIt Language Specification. http://cag.lcs.mit.edu/streamit/papers/streamit-lang-spec.pdf.]]Google Scholar
- S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing Stream Programs Using Linear State Space Analysis. In CASES, San Francisco, CA, Sept. 2005.]] Google Scholar
Digital Library
- J. Andrews and N. Baker. Xbox 360 System Architecture. IEEE Micro, 26(2), 2006.]] Google Scholar
Digital Library
- S. Bakshi and D.D. Gajski. Partitioning and pipelining for performance-constrained hardware/software systems. IEEE Trans. Very Large Scale Integr. Syst., 7(4):419--432, 1999.]] Google Scholar
Digital Library
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH, 2004.]] Google Scholar
Digital Library
- L.-F. Chao and E.H.-M. Sha. Scheduling Data-Flow Graphs via Retiming and Unfolding. IEEE Trans. on Parallel and Distributed Systems, 08(12), 1997.]] Google Scholar
Digital Library
- K.S. Chatha and R. Vemuri. Hardware-Software partitioning and pipelined scheduling of transformative applications. IEEE Trans. Very Large Scale Integr. Syst., 10(3), 2002.]] Google Scholar
Digital Library
- M.K. Chen, X.F. Li, R. Lian, J.H. Lin, L. Liu, T. Liu, and R. Ju. Shangri-La: Achieving High Performance from Compiled Network Applications While Enabling Ease of Programming. In PLDI, New York, NY, USA, 2005.]] Google Scholar
Digital Library
- C. Consel, H. Hamdi, L. Rveillre, L. Singaravelu, H. Yu, and C. Pu. Spidle: A DSL Approach to Specifying Streaming Applications. In 2nd Int. Conf. on Generative Prog. and Component Engineering, 2003.]] Google Scholar
Digital Library
- M. Drake, H. Hoffman, R. Rabbah, and S. Amarasinghe. MPEG-2 Decoding in a Stream Programming Language. In IPDPS, Rhodes Island, Greece, April 2006.]] Google Scholar
Digital Library
- W. Du, R. Ferreira, and G. Agrawal. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism. In Supercomputing, 2005.]] Google Scholar
Digital Library
- E. and D. Messerschmitt. Pipeline interleaved programmable DSP's: Synchronous data flow programming. IEEE Trans. on Signal Processing, 35(9), 1987.]]Google Scholar
- W. Eatherton. The Push of Network Processing to the Top of the Pyramid. Keynote Address, Symposium on Architectures for Networking and Communications Systems, 2005.]]Google Scholar
- M. Gordon,W. Thies, M. Karczmarek, J. Lin, A.S. Meli, A.A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS, 2002.]] Google Scholar
Digital Library
- J. Gummaraju and M. Rosenblum. Stream Programming on General- Purpose Processors. In MICRO, 2005.]] Google Scholar
Digital Library
- H.P. Hofstee. Power Efficient Processor Architecture and The Cell Processor. HPCA, 00:258--262, 2005.]] Google Scholar
Digital Library
- U.J. Kapasi, S. Rixner, W.J. Dally, B. Khailany, J.H. Ahn, P. Mattson, and J.D. Owens. Programmable stream processors. IEEE Computer, 2003.]] Google Scholar
Digital Library
- M. Karczmarek, W. Thies, and S. Amarasinghe. Phased scheduling of stream programs. In LCTES, San Diego, CA, June 2003.]] Google Scholar
Digital Library
- M.A. Karczmarek. Constrained and Phased Scheduling of Synchronous Data Flow Graphs for the StreamIt Language. Master's thesis, MIT, 2002.]]Google Scholar
- P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21--29, 2005.]] Google Scholar
Digital Library
- Y.-K. Kwok and I. Ahmad. FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 10(2), 1999.]] Google Scholar
Digital Library
- Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 31(4):406--471, 1999.]] Google Scholar
Digital Library
- A.A. Lamb, W. Thies, and S. Amarasinghe. Linear Analysis and Optimization of Stream Programs. In PLDI, San Diego, CA, June 2003.]] Google Scholar
Digital Library
- J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]Google Scholar
- E.A. Lee and D.G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Comput., 36(1):24--35, 1987.]] Google Scholar
Digital Library
- W.R. Mark, R.S. Glanville, K. Akeley, and M.J. Kilgard. Cg: A System for Programming Graphics Hardware in a C-like Language. In SIGGRAPH, 2003.]] Google Scholar
Digital Library
- D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]] Google Scholar
Digital Library
- Michael Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.]] Google Scholar
Digital Library
- G. Ottoni, R. Rangan, A. Stoler, and D.I. August. Automatic Thread Extraction with Decoupled Software Pipelining. In MICRO, 2005.]] Google Scholar
Digital Library
- K. Parhi and D. Messerschmitt. Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding. IEEE Transactions on Computers, 40(2), 1991.]] Google Scholar
Digital Library
- J.L. Pino, S.S. Bhattacharyya, and E.A. Lee. A Hierarchical Multiprocessor Scheduling Framework for Synchronous Dataflow Graphs. Technical Report UCB/ERL M95/36, May 1995.]] Google Scholar
Digital Library
- J.L. Pino and E.A. Lee. Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors. Proc. of the IEEE Conference on Acoustics, Speech, and Signal Processing, 1995.]]Google Scholar
Cross Ref
- S. Seneff. Speech transformation system (spectrum and/or excitation) without pitch extraction. Master's thesis, MIT, 1980.]]Google Scholar
- J. Sermulins, W. Thies, R. Rabbah, and S. Amarasinghe. Cache Aware Optimization of Stream Programs. In LCTES, Chicago, 2005.]] Google Scholar
Digital Library
- R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]Google Scholar
Cross Ref
- M.B. Taylor et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]] Google Scholar
Digital Library
- M.B. Taylor, W. Lee, J. Miller, D. Wentzlaff, et al. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. In ISCA, Munich, Germany, June 2004.]] Google Scholar
Digital Library
- W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, France, 2002.]] Google Scholar
Digital Library
- E.Waingold, M. Taylor, D. Srikrishna, V. Sarkar,W. Lee, et al. Baring It All to Software: Raw Machines. IEEE Computer, 30(9), 1997.]] Google Scholar
Digital Library
- S. wei Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and Computation Transformations for Brook Streaming Applications on Multiprocessors. In CGO, 2006.]] Google Scholar
Digital Library
- D. Zhang, Z.-Z. Li, H. Song, and L. Liu. A Programming Model for an Embedded Media Processing Architecture. In SAMOS, 2005.]] Google Scholar
Digital Library
Index Terms
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Recommendations
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS ConferenceAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS ConferenceAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 2006 ASPLOS ConferenceAs multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (...








Comments