Abstract
Data structure selection is one of the most critical aspects of developing effective applications. By analyzing how data structures behave and interact with the rest of the application on the underlying architecture, tools can suggest alternative data structures better suited to the program input on which the application runs. Developers can then optimize their data structure usage so that the application is tailored to the underlying architecture and to a particular program input.
This paper presents the design and evaluation of Brainy, a new program analysis tool that automatically selects the best data structure for a given program and its input on a specific microarchitecture. The data structure's interface functions are instrumented to dynamically monitor how the data structure interacts with the application for a given input. The instrumentation records traces of various runtime characteristics, including underlying architecture-specific events. These traces are analyzed and fed into an offline model, constructed using machine learning, to select the best data structure. That is, Brainy exploits runtime feedback from data structures to model the conditions under which an application runs, and selects the best data structure for a given application/input/architecture combination based on the constructed model. The empirical evaluation shows that this technique is highly accurate across several real-world applications with various program input sets on two different state-of-the-art microarchitectures. Brainy achieved average performance improvements of 27% and 33% on the two microarchitectures, respectively.
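The instrument-then-select pipeline described above can be illustrated with a minimal Python sketch. It is not Brainy's implementation: the class `InstrumentedList`, the feature names, the thresholds, and the candidate names in `suggest` are all hypothetical. The hand-written rule only stands in for the trained model, and the sketch records software-visible counts only, not the architecture-specific events the abstract mentions.

```python
from collections import Counter


class InstrumentedList:
    """Hypothetical list wrapper whose interface functions record their usage."""

    def __init__(self):
        self._items = []
        self.trace = Counter()  # per-run counts of interface calls and costs

    def append(self, value):
        self.trace["append"] += 1
        self._items.append(value)

    def insert_front(self, value):
        self.trace["insert_front"] += 1
        self._items.insert(0, value)

    def contains(self, value):
        self.trace["contains"] += 1
        # Record how many elements the linear search had to touch.
        for touched, item in enumerate(self._items, start=1):
            if item == value:
                self.trace["elements_touched"] += touched
                return True
        self.trace["elements_touched"] += len(self._items)
        return False


def features(trace):
    """Turn a raw trace into normalized features (names are assumptions)."""
    calls = trace["append"] + trace["insert_front"] + trace["contains"]
    if calls == 0:
        return {"search_ratio": 0.0, "front_insert_ratio": 0.0, "avg_touched": 0.0}
    searches = trace["contains"]
    return {
        "search_ratio": searches / calls,
        "front_insert_ratio": trace["insert_front"] / calls,
        "avg_touched": trace["elements_touched"] / searches if searches else 0.0,
    }


def suggest(feat):
    """Hand-written stand-in for a trained model; thresholds are made up."""
    if feat["search_ratio"] > 0.5 and feat["avg_touched"] > 8:
        return "hash_set"
    if feat["front_insert_ratio"] > 0.5:
        return "deque"
    return "vector"


# Search-heavy workload: 100 appends followed by 200 membership tests.
ds = InstrumentedList()
for i in range(100):
    ds.append(i)
for i in range(200):
    ds.contains(i % 100)
print(suggest(features(ds.trace)))  # prints: hash_set
```

In this toy run, two thirds of the calls are searches and each one touches about 50 elements on average, so the rule suggests a hash-based container. A learned model would replace the fixed thresholds with a decision trained on many such traces.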
Brainy: effective selection of data structures
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation