Abstract
In this article, we introduce a parallelizing approximation-discovery framework, PANDORA, for automatically discovering application- and architecture-specialized approximations of provided code. PANDORA complements existing compilers and runtime optimizers by generating approximations with a range of Pareto-optimal tradeoffs between performance and error, which enables adaptation to different inputs, different user preferences, and different runtime conditions (e.g., battery life). We demonstrate that PANDORA can create parallel approximations of inherently sequential code by discovering alternative implementations that eliminate loop-carried dependencies. For a variety of functions with loop-carried dependencies, PANDORA generates approximations that achieve speedups ranging from 2.3x to 81x, with acceptable error for many usage scenarios. We also demonstrate PANDORA’s architecture-specialized approximations via FPGA experiments, and highlight PANDORA’s discovery capabilities by removing loop-carried dependencies from a recurrence relation with no known closed-form solution.
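As a minimal sketch of what removing a loop-carried dependency through approximation can look like, consider the prefix sums of the harmonic series. This example is hand-written for illustration and is not output of PANDORA; the function names and the choice of the asymptotic formula H_k ≈ ln k + γ + 1/(2k) are assumptions made here. The `pareto_front` helper likewise only illustrates the notion of keeping candidates that are not dominated in both runtime and error.

```python
import math

# Euler-Mascheroni constant, used by the asymptotic approximation below.
EULER_GAMMA = 0.5772156649015329


def harmonic_sequential(n):
    """Exact prefix sums H_1..H_n. Each iteration needs the previous
    value of acc, so the loop has a loop-carried dependency."""
    out = []
    acc = 0.0
    for k in range(1, n + 1):
        acc += 1.0 / k
        out.append(acc)
    return out


def harmonic_approx(n):
    """Approximate H_k independently for every k (illustrative choice:
    ln k + gamma + 1/(2k)). No iteration depends on another, so the
    elements could be computed in parallel."""
    return [math.log(k) + EULER_GAMMA + 1.0 / (2 * k) for k in range(1, n + 1)]


def max_abs_error(exact, approx):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(e - a) for e, a in zip(exact, approx))


def pareto_front(points):
    """Keep the (runtime, error) pairs not dominated by another pair.
    A pair is dominated if some different pair is no worse in both
    runtime and error."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

The approximation trades a small error, largest for small k, for iterations that no longer depend on one another. Whether that error is acceptable depends on the usage scenario, which is why a set of tradeoff points is more useful than a single approximation.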
PANDORA: An Architecture-Independent Parallelizing Approximation-Discovery Framework