Abstract
A daunting challenge in program performance autotuning is input sensitivity: the best autotuned configuration may vary across input sets. This paper presents a novel two-level input learning algorithm that tackles this challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, the complex influence of deep input features, the high cost of feature extraction, and the variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
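The abstract names the ingredients of the approach (two-level input clustering, feature selection, classifier construction) without listing its steps. The following is a minimal sketch of one plausible reading, not the paper's implementation. It assumes that level 1 clusters inputs in feature space and tunes one "landmark" configuration per cluster, and that level 2 regroups inputs by whichever landmark performs best and fits a classifier on those labels. The toy cost model, the two configurations, the feature pair (log2 of size, sortedness), and the helper names `cost`, `autotune`, `kmeans`, `train`, and `predict` are all hypothetical.

```python
import math

CONFIGS = ("insertion", "merge")  # hypothetical algorithmic choices

def cost(config, feats):
    """Toy cost model standing in for a real timed run (assumption)."""
    log_size, sortedness = feats
    n = 2.0 ** log_size
    if config == "insertion":
        return n + n * n * (1.0 - sortedness)
    return 2.0 * n * math.log2(n + 1.0)

def autotune(feats):
    """Stand-in for an expensive autotuner: cheapest config for one input."""
    return min(CONFIGS, key=lambda c: cost(c, feats))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic, evenly spaced initial centers."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def train(inputs, k=3):
    # Level 1: cluster in feature space, tune once per cluster centroid.
    landmarks = sorted({autotune(c) for c in kmeans(inputs, k)})
    # Level 2: regroup inputs by the landmark that performs best on each.
    labels = [min(landmarks, key=lambda c: cost(c, f)) for f in inputs]
    # Classifier: a one-feature threshold stump. Choosing which feature to
    # split on is a crude stand-in for feature selection.
    best = None
    for feat in range(len(inputs[0])):
        for thr in sorted({f[feat] for f in inputs}):
            for lo in landmarks:
                for hi in landmarks:
                    errors = sum(
                        (lo if f[feat] <= thr else hi) != y
                        for f, y in zip(inputs, labels)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, feat, thr, lo, hi)
    return landmarks, labels, best

def predict(model, feats):
    """Pick a landmark configuration for a new input from its features."""
    _, feat, thr, lo, hi = model
    return lo if feats[feat] <= thr else hi

# Synthetic training inputs: (log2 of size, sortedness in [0, 1]).
TRAIN = [(ls, s) for ls in (3.0, 6.0, 9.0, 12.0) for s in (0.0, 0.5, 1.0)]
landmarks, labels, model = train(TRAIN)
print(landmarks, model, predict(model, (10.0, 1.0)))
```

In this sketch the level-2 labels come from measured (here, modeled) performance rather than from feature-space proximity, which is the point of a second level: inputs that look alike in feature space need not share a best configuration. A real system would replace `cost` with timed runs and the stump with a stronger classifier.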
Autotuning algorithmic choice for input sensitivity
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation