Unconventional Parallelization of Nondeterministic Applications

Published: 19 March 2018

Abstract

The demand for thread-level parallelism (TLP) on commodity processors is endless as it is essential for gaining performance and saving energy. However, TLP in today's programs is limited by dependences that must be satisfied at run time. We have found that for nondeterministic programs, some of these actual dependences can be satisfied with alternative data that can be generated in parallel, thus boosting the program's TLP. Satisfying these dependences with alternative data nonetheless produces final outputs that match those of the original nondeterministic program. To demonstrate the practicality of our technique, we describe the design, implementation, and evaluation of our compilers, autotuner, profiler, and runtime, which are enabled by our proposed C++ programming language extensions. The resulting system boosts the performance of six well-known nondeterministic and multi-threaded benchmarks by 158.2% (geometric mean) on a 28-core Intel-based platform.

Published in

ACM SIGPLAN Notices, Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3296957

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States