DOI: 10.1145/1346281.1346319 (ASPLOS conference proceedings)
research-article

Streamware: programming general-purpose multicore processors using streams

Published: 01 March 2008

ABSTRACT

Recently, the number of cores on general-purpose processors has been increasing rapidly. Using conventional programming models, it is challenging to effectively exploit these cores for maximal performance. An interesting alternative for programming multiple cores is the stream programming model, which provides a framework for writing programs in a sequential style while greatly simplifying the task of automatic parallelization. It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style.

In this paper, we investigate the potential to use the stream programming model to efficiently utilize commodity multicore general-purpose processors (e.g., Intel/AMD). Although several stream languages and stream compilers have recently been developed, they typically target special-purpose stream processors. In contrast, we propose a flexible software system, Streamware, which automatically maps stream programs onto a wide variety of general-purpose multicore processor configurations. We leverage an existing compilation framework for stream processors and design a runtime environment that takes as input the output of these stream compilers in the form of machine-independent stream virtual machine code. The runtime environment assigns work to processor cores, taking processor/cache configurations into account, and adapts to workload variations. We evaluate this approach for a few general-purpose scientific applications on real hardware and on a cycle-level simulator set up to showcase scaling and contention issues. The results show that the stream programming model is a good choice for efficiently exploiting modern and future multicore CPUs for an important class of applications.
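
To make the idea of a cache-aware, load-adaptive runtime concrete, here is a minimal sketch in Python. It is not Streamware's implementation, and the abstract does not describe its internals. The function names (`strip_bounds`, `run_kernel`), the parameters, and the shared work queue are all assumptions chosen for illustration. The sketch splits a stream into strips sized to fit a given cache budget and lets worker threads claim strips dynamically, so faster workers naturally take on more strips.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty


def strip_bounds(n_elements, bytes_per_element, cache_bytes):
    """Split [0, n_elements) into strips whose data fits in cache_bytes."""
    strip_len = max(1, cache_bytes // bytes_per_element)
    return [(lo, min(lo + strip_len, n_elements))
            for lo in range(0, n_elements, strip_len)]


def run_kernel(kernel, src, n_workers, cache_bytes, bytes_per_element=8):
    """Apply kernel to each element of src, one cache-sized strip at a time.

    Returns the output list and the number of strips each worker processed.
    """
    work = Queue()
    for bounds in strip_bounds(len(src), bytes_per_element, cache_bytes):
        work.put(bounds)

    def worker():
        finished = []
        while True:
            try:
                lo, hi = work.get_nowait()   # claim the next unprocessed strip
            except Empty:
                return finished              # no work left
            local = src[lo:hi]               # gather the strip
            finished.append((lo, [kernel(x) for x in local]))  # compute

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker) for _ in range(n_workers)]
        per_worker = [f.result() for f in futures]

    # Scatter results back to their positions in the output stream.
    out = [None] * len(src)
    for finished in per_worker:
        for lo, values in finished:
            out[lo:lo + len(values)] = values
    return out, [len(finished) for finished in per_worker]


if __name__ == "__main__":
    squares, counts = run_kernel(lambda x: x * x, list(range(1000)),
                                 n_workers=4, cache_bytes=256)
    print(squares[:5], counts)
```

The sketch only shows the scheduling structure. In CPython, threads do not run pure-Python kernels in parallel, so it should not be read as a performance model.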
