skip to main content
research-article
Open Access

Scaling up Superoptimization

Published:25 March 2016Publication History
Skip Abstract Section

Abstract

Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimizations. We develop LENS, a search algorithm that increases the size of code a superoptimizer can synthesize by rapidly pruning away invalid candidate programs. Pruning is achieved by selectively refining the abstraction under which candidates are considered equivalent, only in the promising part of the candidate space. LENS also uses a bidirectional search strategy to prune the candidate space from both forward and backward directions. These pruning strategies allow LENS to solve twice as many benchmarks as existing enumerative search algorithms, while LENS is about 11-times faster.

Additionally, we increase the effective size of the superoptimized fragments by relaxing the correctness condition using contexts (surrounding code). Finally, we combine LENS with complementary search techniques into a cooperative superoptimizer, which exploits the stochastic search to make random jumps in a large candidate space, and a symbolic (SAT-solver-based) search to synthesize arbitrary constants. While existing superoptimizers consistently solve 9--16 out of 32 benchmarks, the cooperative superoptimizer solves 29 benchmarks. It can synthesize code fragments that are up to 82% faster than code generated by gcc -O3 from WiBench and MiBench.

References

  1. Souper. http://github.com/google/souper. URL http://github.com/google/souper.Google ScholarGoogle Scholar
  2. T. Akiba, K. Imajo, H. Iwami, Y. Iwata, T. Kataoka, N. Takahashi, M. Moskal, and N. Swamy. Calibrating research in program synthesis using 72,000 hours of programmer time. Technical report, MSR, 2013.Google ScholarGoogle Scholar
  3. R. Alur, R. Bodik, E. Dallal, D. Fisman, P. Garg, G. Juniwal, H. Kress-Gazit, P. Madhusudan, M. M. K. Martin, M. Raghothaman, S. Saha, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In SyGus Competition, 2014.Google ScholarGoogle Scholar
  4. ARM. Cortex-A9: Technical Reference Manual, 2012. URL http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/DDI0388I_cortex_a9_r4p1_trm.pdf.Google ScholarGoogle Scholar
  5. S. Bansal and A. Aiken. Automatic generation of peephole superoptimizers. In ASPLOS, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Barthe, J. M. Crespo, S. Gulwani, C. Kunz, and M. Marron. From relational verification to simd loop synthesis. In PPoPP, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Bungo. The use of compiler optimizations for embedded systems software. Crossroads, 15 (1): 8--15, Sept. 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Duller, D. Towner, G. Panesar, A. Gray, and W. Robbins. picoarray technology: the tool's story. In Design, Automation and Test in Europe, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Galenson, P. Reames, R. Bodik, B. Hartmann, and K. Sen. Codehint: Dynamic and interactive synthesis of code snippets. In ICSE, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. V. Govindaraju, C.-H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim. Dyser: Unifying functionality and parallelism specialization for energy-efficient computing. Micro, IEEE, Sept 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. Granlund and R. Kenner. Eliminating branches using a superoptimizer and the gnu c compiler. In PLDI, 1992.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. GreenArrays. Product Brief: GreenArrays GA144, 2010. URL http://www.greenarraychips.com/home/documents/greg/PB001--100503-GA144--1--10.pdf.Google ScholarGoogle Scholar
  13. S. Gulwani, S. Jha, A. Tiwari, and R. Venkatesan. Synthesis of loop-free programs. In PLDI, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. Mudge, R. Brown, and T. Austin. Mibench: a free, commercially representative embedded benchmark suite. In IEEE International Symposium on Workload Characterization, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  15. Intel. Reducing Data Center Energy Consumption. Technical report, 2008.Google ScholarGoogle Scholar
  16. M. Kandemir, N. Vijaykrishnan, and M. Irwin. Compiler optimizations for low power systems. In Power Aware Computing, Series in Computer Science. Springer US, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  17. N. P. Lopes, D. Menendez, S. Nagarakatte, and J. Regehr. Provably correct peephole optimizations with alive. In PLDI, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Massalin. Superoptimizer: a look at the smallest program. In ASPLOS, 1987.Google ScholarGoogle Scholar
  19. P. Merolla, J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. Modha. A digital neurosynaptic core using embedded crossbar memory with 45pj per spike in 45nm. In Custom Integrated Circuits Conference (CICC), 2011 IEEE, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  20. P. M. Phothilimthana, T. Jelvis, R. Shah, N. Totla, S. Chasins, and R. Bodik. Chlorophyll: Synthesis-aided compiler for low-power spatial architectures. In PLDI, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. M. Phothilimthana, A. Thakur, R. Bodik, and D. Dhurjati. Greenthumb: Superoptimizer construction framework. In Proceedings of International Conference on Compiler Construction, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. W. Qadeer, R. Hameed, O. Shacham, P. Venkatesan, C. Kozyrakis, and M. A. Horowitz. Convolution engine: Balancing efficiency and flexibility in specialized computing. In ISCA, 2013.Google ScholarGoogle Scholar
  23. A. Reynolds, M. Deters, V. Kuncak, C. Tinelli, and C. Barrett. Counterexample-guided quantifier instantiation for synthesis in smt. In CAV, 2015.Google ScholarGoogle Scholar
  24. E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. In ASPLOS, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. Schkufza, R. Sharma, and A. Aiken. Stochastic optimization of floating-point programs with tunable precision. In PLDI, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. R. Sharma. Personal communication, June 2015.Google ScholarGoogle Scholar
  27. A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat. Combinatorial sketching for finite programs. In ASPLOS, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. The Linley Group. Processor watch: Getting way out of box. http://www.linleygroup.com/newsletters/newsletter_detail.php?num=5038, 2013. Accessed: 2014--11--13.Google ScholarGoogle Scholar
  30. E. Torlak and R. Bodik. Growing solver-aided languages with Rosette. In Onward!, 2013.Google ScholarGoogle Scholar
  31. E. Torlak and R. Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In PLDI, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and R. Alur. Transit: Specifying protocols with concolic snippets. In PLDI, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. H. S. Warren. Hacker's Delight. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. H. S. Warren. A hacker's assistant. Oct. 2008. URL http://www.hackersdelight.org/aha/aha.pdf.Google ScholarGoogle Scholar
  35. Wikipedia. List of arm microarchitectures. http://en.wikipedia.org/wiki/List_of_ARM_microarchitectures, 2014. Accessed: 2014--11--13.Google ScholarGoogle Scholar
  36. C. Zhang. Dynamically Reconfigurable Architectures for Real-time Baseband Processing. PhD thesis, Lund University, 2014.Google ScholarGoogle Scholar
  37. Q. Zheng, Y. Chen, R. Dreslinski, C. Chakrabarti, A. Anastasopoulos, S. Mahlke, and T. Mudge. Wibench: An open source kernel suite for benchmarking wireless systems. In Workload Characterization (IISWC), 2013 IEEE International Symposium on, 2013.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Scaling up Superoptimization

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 51, Issue 4
        ASPLOS '16
        April 2016
        774 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2954679
        • Editor:
        • Andy Gill
        Issue’s Table of Contents
        • cover image ACM Conferences
          ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
          March 2016
          824 pages
          ISBN:9781450340915
          DOI:10.1145/2872362
          • General Chair:
          • Tom Conte,
          • Program Chair:
          • Yuanyuan Zhou

        Copyright © 2016 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 March 2016

        Check for updates

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!