ABSTRACT
In this paper we propose the Merge framework, a general-purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve greater energy and performance efficiency. The Merge framework provides (1) a predicate dispatch-based library system for managing and invoking function variants for multiple architectures; (2) a high-level, library-oriented parallel language based on map-reduce; and (3) a compiler and runtime that implement the map-reduce language pattern by dynamically selecting the best available function implementations for a given input and machine configuration. Using a generic sequencer architecture interface for heterogeneous accelerators, the Merge framework can integrate function variants for specialized accelerators, offering the potential for to-the-metal performance on a wide range of heterogeneous architectures, all transparently to the user. The Merge framework has been prototyped on two systems: a heterogeneous platform consisting of an Intel Core 2 Duo CPU and an 8-core, 32-thread Intel Graphics and Media Accelerator X3000, and a homogeneous 32-way Unisys SMP system with Intel Xeon processors. We implemented a set of benchmarks using the Merge framework and enhanced the library with X3000-specific implementations, achieving speedups of 3.6x to 8.5x using the X3000 and 5.2x to 22x using the 32-way system, relative to the straight C reference implementation on a single IA32 core.
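To make the three components concrete, the following is a minimal sketch of the general idea, not the Merge implementation or its API. It assumes a library that stores several variants of one function, each guarded by a predicate over the input and the machine configuration, and a map-reduce driver that picks a variant per chunk at run time. All names here (`Variant`, `VariantLibrary`, `map_reduce`, `has_accelerator`, `accelerator_sim`) are hypothetical, and the "accelerator" variant is simulated in plain Python.

```python
from dataclasses import dataclass
from functools import reduce
from operator import add
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Variant:
    """One implementation of a library function for a particular target."""
    name: str                                            # hypothetical target label
    predicate: Callable[[Sequence[float], Dict], bool]   # when is this variant legal?
    impl: Callable[[Sequence[float]], float]             # the implementation itself
    priority: int                                        # higher wins among legal variants


class VariantLibrary:
    """Stores function variants and selects one by predicate dispatch."""

    def __init__(self) -> None:
        self._variants: Dict[str, List[Variant]] = {}

    def register(self, func_name: str, variant: Variant) -> None:
        self._variants.setdefault(func_name, []).append(variant)

    def dispatch(self, func_name: str, data: Sequence[float], machine: Dict) -> Variant:
        # Keep only variants whose predicate holds for this input and machine.
        legal = [v for v in self._variants.get(func_name, [])
                 if v.predicate(data, machine)]
        if not legal:
            raise LookupError(f"no applicable variant for {func_name!r}")
        return max(legal, key=lambda v: v.priority)


def map_reduce(lib: VariantLibrary, func_name: str,
               chunks: Sequence[Sequence[float]], machine: Dict,
               combine: Callable[[float, float], float],
               initial: float) -> Tuple[float, List[str]]:
    """Map: run the selected variant on each chunk. Reduce: fold the partials."""
    chosen: List[str] = []
    partials: List[float] = []
    for chunk in chunks:
        variant = lib.dispatch(func_name, chunk, machine)
        chosen.append(variant.name)
        partials.append(variant.impl(chunk))
    return reduce(combine, partials, initial), chosen


def sum_squares(xs: Sequence[float]) -> float:
    return sum(x * x for x in xs)


lib = VariantLibrary()
# Fallback variant: always applicable.
lib.register("sum_squares",
             Variant("cpu_scalar", lambda d, m: True, sum_squares, priority=0))
# Simulated accelerator variant (assumption): only legal when the machine
# reports an accelerator and the chunk is large enough to be worth offloading.
lib.register("sum_squares",
             Variant("accelerator_sim",
                     lambda d, m: m.get("has_accelerator", False) and len(d) >= 4,
                     sum_squares, priority=1))

if __name__ == "__main__":
    chunks = [[1.0, 2.0], [1.0, 2.0, 3.0, 4.0]]
    total, chosen = map_reduce(lib, "sum_squares", chunks,
                               {"has_accelerator": True}, add, 0.0)
    print(total, chosen)
```

In this sketch the caller never names a target. Changing the machine description or the chunk sizes changes which variant runs, while the calling code stays the same, which is the kind of transparency the abstract describes.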
Merge: a programming model for heterogeneous multi-core systems
ASPLOS '08