ABSTRACT
Recently, the number of cores on general-purpose processors has been increasing rapidly. Using conventional programming models, it is challenging to exploit these cores effectively for maximal performance. An interesting alternative for programming multiple cores is the stream programming model, which provides a framework for writing programs in a sequential style while greatly simplifying the task of automatic parallelization. It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style.
In this paper, we investigate the potential of the stream programming model to efficiently utilize commodity multicore general-purpose processors (e.g., those from Intel and AMD). Although several stream languages and stream compilers have recently been developed, they typically target special-purpose stream processors. In contrast, we propose a flexible software system, Streamware, which automatically maps stream programs onto a wide variety of general-purpose multicore processor configurations. We leverage an existing compilation framework for stream processors and design a runtime environment that takes as input the output of these stream compilers in the form of machine-independent stream virtual machine code. The runtime environment assigns work to processor cores, taking processor and cache configurations into account, and adapts to workload variations. We evaluate this approach for a few general-purpose scientific applications on real hardware and on a cycle-level simulator set up to showcase scaling and contention issues. The results show that the stream programming model is a good choice for efficiently exploiting modern and future multicore CPUs for an important class of applications.
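As an illustration only, the idea of a runtime that sizes work to the cache and hands it to cores can be sketched in a few lines of Python. This is not Streamware's implementation or API: the names `strip_size` and `run_stream_kernel`, the even split of a shared cache among cores, and the use of a thread pool to stand in for dynamic work assignment are all assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def strip_size(cache_bytes, bytes_per_element, live_streams):
    """Hypothetical helper: elements per strip so all live strips fit in the cache share."""
    return max(1, cache_bytes // (bytes_per_element * live_streams))


def run_stream_kernel(kernel, src, num_cores, cache_bytes, bytes_per_element=8):
    """Apply `kernel` to each element of `src`, one cache-sized strip at a time.

    Assumption for this sketch: the cache is shared and split evenly among cores,
    and each strip keeps two streams live (its input and its output).
    """
    n = strip_size(cache_bytes // num_cores, bytes_per_element, live_streams=2)
    starts = list(range(0, len(src), n))

    def process(start):
        strip = src[start:start + n]             # gather: copy one strip of input
        return start, [kernel(x) for x in strip]  # kernel: compute on the local strip

    out = [None] * len(src)
    # Idle workers pull the next pending strip, so uneven strips are balanced
    # dynamically. (CPython threads illustrate the scheduling pattern only;
    # they do not run CPU-bound kernels in parallel.)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        for start, result in pool.map(process, starts):
            out[start:start + len(result)] = result  # scatter: write results back
    return out, len(starts)


if __name__ == "__main__":
    squares, strips = run_stream_kernel(lambda x: x * x, list(range(10)),
                                        num_cores=2, cache_bytes=64)
    print(squares, strips)
```

The sketch separates the three phases commonly associated with stream programs (gather, kernel, scatter) and derives the strip length from a cache budget rather than hard-coding it, which is the kind of configuration-dependent decision the abstract attributes to the runtime.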
Streamware: programming general-purpose multicore processors using streams. ASPLOS '08.