
Parallelizing Sequential Programs with Statistical Accuracy Tests

Published: 01 May 2013

Abstract

We present QuickStep, a novel system for parallelizing sequential programs. Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead designed to generate (potentially nondeterministic) parallel programs that produce acceptably accurate results acceptably often. The freedom to generate parallel programs whose output may differ (within statistical accuracy bounds) from the output of the sequential program enables a dramatic simplification of the compiler, a dramatic increase in the range of applications that it can parallelize, and a significant expansion in the range of parallel programs that it can legally generate.
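The idea of accepting parallel output that is "acceptably accurate ... acceptably often" can be illustrated with a minimal sketch. This is not QuickStep's actual test: the distortion metric (mean relative error), the function names, and the `bound` and `min_fraction` parameters are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import Sequence


def distortion(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Mean relative error of a candidate output against the sequential output.

    Hypothetical metric for illustration; the abstract does not define one.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    if not reference:
        raise ValueError("outputs must be non-empty")
    total = 0.0
    for ref, cand in zip(reference, candidate):
        # Fall back to absolute error when the reference value is zero.
        scale = abs(ref) if ref != 0 else 1.0
        total += abs(ref - cand) / scale
    return total / len(reference)


def passes_accuracy_test(
    reference: Sequence[float],
    candidate_runs: Sequence[Sequence[float]],
    bound: float,
    min_fraction: float,
) -> bool:
    """Accept if at least min_fraction of runs have distortion <= bound."""
    if not candidate_runs:
        raise ValueError("need at least one candidate run")
    acceptable = sum(
        1 for run in candidate_runs if distortion(reference, run) <= bound
    )
    return acceptable / len(candidate_runs) >= min_fraction


# Output of the sequential program, and outputs of three runs of a
# (possibly nondeterministic) parallel version. All values are made up.
sequential_output = [100.0, 200.0, 400.0]
parallel_runs = [
    [100.0, 200.0, 400.0],  # identical to the sequential output
    [101.0, 199.0, 400.0],  # small deviation
    [150.0, 200.0, 400.0],  # large deviation
]

accepted = passes_accuracy_test(
    sequential_output, parallel_runs, bound=0.01, min_fraction=0.6
)
```

With these made-up inputs, two of the three runs fall within the 1% bound, so the candidate is accepted at a required fraction of 0.6 and would be rejected at 0.9. A real system would also need a principled way to decide how many runs are enough, which this sketch leaves out.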

Results from our benchmark set of applications show that QuickStep can automatically generate acceptably accurate and efficient parallel programs: the automatically generated parallel versions of five of our six benchmark applications run between 5.0 and 7.8 times faster on eight cores than the original sequential versions. These applications and parallelizations contain features (such as the use of modern object-oriented programming constructs or desirable parallelizations with infrequent but acceptable data races) that place them inherently beyond the reach of standard approaches.

