Abstract
Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing?
This paper examines Kremlin, an automatic tool that, given a serial version of a program, will make recommendations to the user as to what regions (e.g. loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which provides a ranked order of specific regions to the programmer that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine.
We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin's plan against third-party manually parallelized versions. The results show that Kremlin's OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.
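As a rough illustration of the ideas named above, the following Python sketch shows how a per-region parallelism metric and a ranking planner could fit together. The abstract does not define self-parallelism or the planner's ranking rule, so everything here is an assumption made for illustration: the `Region` type, the formula in `self_parallelism` (each child is counted as a serial unit whose cost is its own critical path), the `min_sp` threshold, and the time-saved estimate in `plan`. It is not Kremlin's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical profile record for one region (loop, loop body, or function)."""
    name: str
    work: float            # total serial work in the region, children included
    critical_path: float   # longest dependence chain through the region
    children: list["Region"] = field(default_factory=list)


def total_parallelism(r: Region) -> float:
    # Classic critical path analysis: work divided by critical path length.
    return r.work / r.critical_path


def self_parallelism(r: Region) -> float:
    # Assumed definition: count each child as a serial unit costing its own
    # critical path, so parallelism found inside children is not credited to r.
    child_work = sum(c.work for c in r.children)
    child_cp = sum(c.critical_path for c in r.children)
    self_work = r.work - child_work
    return (child_cp + self_work) / r.critical_path


def walk(r: Region):
    # Visit r and every region nested inside it.
    yield r
    for c in r.children:
        yield from walk(c)


def plan(root: Region, cores: int, min_sp: float = 2.0):
    # Toy planner: estimate the work saved per region name and rank descending.
    # Nested candidates are not discounted against each other in this sketch.
    saved: dict[str, float] = {}
    for r in walk(root):
        sp = self_parallelism(r)
        if sp < min_sp:
            continue  # too little parallelism of its own to be worth the effort
        speedup = min(sp, cores)
        saved[r.name] = saved.get(r.name, 0.0) + r.work * (1 - 1 / speedup)
    return sorted(saved.items(), key=lambda kv: kv[1], reverse=True)


# A loop of 100 independent iterations, each a serial body of 10 work units,
# called from a function that adds 20 units of serial work around the loop.
bodies = [Region("body", work=10, critical_path=10) for _ in range(100)]
loop = Region("loop", work=1000, critical_path=10, children=bodies)
main = Region("main", work=1020, critical_path=30, children=[loop])

print(total_parallelism(main))   # 34.0: main looks parallel under plain CPA
print(self_parallelism(main))    # 1.0: none of that parallelism is main's own
print(self_parallelism(loop))    # 100.0: the loop is where it actually lives
print(plan(main, cores=8))       # [('loop', 875.0)]
```

In this toy profile, plain critical path analysis credits `main` with a parallelism of 34 even though all of it comes from the loop it calls. The assumed self-parallelism metric localizes the opportunity to `loop`, which is the only region the planner recommends.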
Kremlin: rethinking and rebooting gprof for the multicore age
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation