Abstract
In this paper we present SpiceC, an approach that simplifies the task of parallel programming through a combination of an intuitive computation model and compiler directives. The SpiceC parallel computation model consists of multiple threads, each with a private space for its data; threads share data via a common shared space. Because each thread performs its computation in its private space, threads are isolated from one another, which enables speculative computation. SpiceC provides easy-to-use compiler directives with which programmers can express different forms of parallelism. Developers express high-level constraints on data transfers between spaces, while the tedious task of generating the data-transfer code is performed by the compiler. SpiceC also supports data transfers involving dynamic data structures without help from the developer, and it allows developers to create clusters of data to enable parallel data transfers. SpiceC programs are portable across modern chip-multiprocessor-based machines that may or may not support cache coherence, and we have developed implementations of SpiceC for shared-memory systems both with and without cache coherence. We evaluate our implementation using seven benchmarks, four of which are parallelized speculatively. Our compiler-generated implementations achieve speedups ranging from 2x to 18x on a 24-core system.
SpiceC: scalable parallelism via implicit copying and explicit commit. In PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming.