Abstract
We present QuickStep, a novel system for parallelizing sequential programs. Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead designed to generate (potentially nondeterministic) parallel programs that produce acceptably accurate results acceptably often. The freedom to generate parallel programs whose output may differ (within statistical accuracy bounds) from the output of the sequential program enables a dramatic simplification of the compiler, a dramatic increase in the range of applications that it can parallelize, and a significant expansion in the range of parallel programs that it can legally generate.
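The criterion "acceptably accurate results acceptably often" can be read as a statistical test over repeated executions of a candidate parallel program. The sketch below is illustrative only and does not show QuickStep's actual procedure. It rests on several assumptions:

- The accuracy metric is a hypothetical mean relative difference from the sequential output.
- The helper names `distortion`, `hoeffding_trials`, and `accept_parallel_version` are invented for this example.
- The number of runs is fixed in advance by a Hoeffding bound. This choice is also an assumption; a sequential test would be an equally plausible design.

```python
import math
from typing import Callable, Sequence


def distortion(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Mean relative difference between two output vectors.

    Hypothetical accuracy metric for illustration; a zero reference
    component falls back to an absolute difference.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("outputs must be non-empty and of equal length")
    total = 0.0
    for r, c in zip(reference, candidate):
        scale = abs(r) if r != 0 else 1.0
        total += abs(r - c) / scale
    return total / len(reference)


def hoeffding_trials(epsilon: float, delta: float) -> int:
    """Smallest n with 2*exp(-2*n*epsilon**2) <= delta (two-sided Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def accept_parallel_version(
    run_parallel: Callable[[], Sequence[float]],
    reference: Sequence[float],
    bound: float,
    required_rate: float,
    epsilon: float = 0.05,
    delta: float = 0.05,
) -> bool:
    """Accept only if the observed success rate minus the Hoeffding margin
    still meets required_rate. The chance of accepting a version whose true
    success rate is below required_rate is then at most delta."""
    n = hoeffding_trials(epsilon, delta)
    # A run "succeeds" when its output stays within the distortion bound.
    successes = sum(
        1 for _ in range(n) if distortion(reference, run_parallel()) <= bound
    )
    # Lower confidence limit on the true success rate.
    return successes / n - epsilon >= required_rate
```

With the default `epsilon=0.05` and `delta=0.05`, the test performs 738 runs. A run that always reproduces the sequential output therefore supports a lower confidence limit of 0.95. It passes with `required_rate=0.9` but not with `required_rate=0.99`; meeting the stricter target would require a smaller `epsilon` and hence more runs.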
Results from our set of benchmark applications show that QuickStep can automatically generate acceptably accurate and efficient parallel programs: the automatically generated parallel versions of five of our six benchmark applications run between 5.0 and 7.8 times faster on eight cores than the original sequential versions. These applications and parallelizations contain features (such as the use of modern object-oriented programming constructs or desirable parallelizations with infrequent but acceptable data races) that place them inherently beyond the reach of standard approaches.
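Dividing a speedup by the core count gives parallel efficiency. In the arithmetic below, $S$ is the speedup over the sequential version and $p$ is the number of cores. The arithmetic uses only the figures reported above for the five benchmarks that parallelize well.

$$E = \frac{S}{p}, \qquad E_{\min} = \frac{5.0}{8} = 0.625, \qquad E_{\max} = \frac{7.8}{8} = 0.975$$

These five parallel versions therefore achieve between 62.5% and 97.5% of the ideal eightfold speedup.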
Parallelizing Sequential Programs with Statistical Accuracy Tests