research-article

Brainy: effective selection of data structures

Published: 04 June 2011

Abstract

Data structure selection is one of the most critical aspects of developing effective applications. By analyzing how data structures behave and interact with the rest of the application on the underlying architecture, tools can suggest alternative data structures better suited to the program input on which the application runs. Developers can then optimize their data structure usage so that the application is tuned to the underlying architecture and to a particular program input.

This paper presents the design and evaluation of Brainy, a new program analysis tool that automatically selects the best data structure for a given program and its input on a specific microarchitecture. The data structure's interface functions are instrumented to dynamically monitor how the data structure interacts with the application for a given input. The instrumentation records traces of various runtime characteristics, including architecture-specific events. These traces are analyzed and fed into an offline model, constructed using machine learning, to select the best data structure. That is, Brainy exploits runtime feedback from data structures to model the conditions under which an application runs, and selects the best data structure for a given application/input/architecture combination based on the constructed model. The empirical evaluation shows that this technique is highly accurate across several real-world applications with various program input sets on two different state-of-the-art microarchitectures. Brainy achieved average performance improvements of 27% and 33% on the two microarchitectures, respectively.
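
The instrumentation idea described above can be illustrated with a minimal C++ sketch. This is not Brainy's implementation: the wrapper `InstrumentedSeq`, the `Trace` fields, and the hand-written rule in `suggest` (which stands in for the offline machine-learning model) are all hypothetical names and thresholds chosen for illustration, and architecture-specific events are omitted entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trace of software-level features. The paper's traces also
// include architecture-specific events, which this sketch does not model.
struct Trace {
    std::size_t push_back_calls = 0;
    std::size_t insert_front_calls = 0;
    std::size_t find_calls = 0;
    std::size_t max_size = 0;
};

// Hypothetical wrapper that instruments a few interface functions of a
// vector-backed sequence and records how the application uses them.
template <typename T>
class InstrumentedSeq {
public:
    void push_back(const T& v) {
        ++trace_.push_back_calls;
        data_.push_back(v);
        note_size();
    }
    void insert_front(const T& v) {
        ++trace_.insert_front_calls;
        data_.insert(data_.begin(), v);
        note_size();
    }
    bool contains(const T& v) {
        ++trace_.find_calls;
        return std::find(data_.begin(), data_.end(), v) != data_.end();
    }
    const Trace& trace() const { return trace_; }

private:
    void note_size() {
        trace_.max_size = std::max(trace_.max_size, data_.size());
    }
    std::vector<T> data_;
    Trace trace_;
};

// Stand-in for the learned model: a hand-written rule with assumed
// thresholds, mapping a trace to a suggested container name.
inline std::string suggest(const Trace& t) {
    if (t.find_calls > t.push_back_calls + t.insert_front_calls) return "set";
    if (t.insert_front_calls > t.push_back_calls) return "deque";
    return "vector";
}

// Example run: a lookup-heavy workload (4 appends, 10 lookups).
inline std::string example_run() {
    InstrumentedSeq<int> s;
    for (int i = 0; i < 4; ++i) s.push_back(i);
    for (int i = 0; i < 10; ++i) s.contains(i);
    return suggest(s.trace());
}
```

In this sketch the lookup-heavy run leads `suggest` to propose a set-like container. A learned model would replace the fixed rule so that the decision can also depend on the input and on the microarchitecture.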



      • Published in

        ACM SIGPLAN Notices, Volume 46, Issue 6
        PLDI '11
        June 2011
        652 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1993316

        PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2011
        668 pages
        ISBN: 9781450306638
        DOI: 10.1145/1993498
        General Chair: Mary Hall
        Program Chair: David Padua

        Copyright © 2011 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
