research-article

Semi-sparse flow-sensitive pointer analysis

Published: 21 January 2009

Abstract

Pointer analysis is a prerequisite for many program analyses, and the effectiveness of these analyses depends on the precision of the pointer information they receive. Two major axes of pointer analysis precision are flow-sensitivity and context-sensitivity, and while there has been significant recent progress regarding scalable context-sensitive pointer analysis, relatively little progress has been made in improving the scalability of flow-sensitive pointer analysis.

This paper presents a new interprocedural, flow-sensitive pointer analysis algorithm that combines two ideas, semi-sparse analysis and a novel use of BDDs, that arise from a careful understanding of the unique challenges that face flow-sensitive pointer analysis. We evaluate our algorithm on 12 C benchmarks ranging from 11K to 474K lines of code. Our fastest algorithm is on average 197x faster and uses 4.6x less memory than the state of the art, and it can analyze programs that are an order of magnitude larger than the previous state of the art.

        Reviews

        Charles Robert Morgan

        This paper describes techniques for dramatically improving the performance of flow-sensitive, context-insensitive pointer analysis. It is a combination of improved engineering, careful data structure decisions, and new algorithm optimizations. The algorithm is used to analyze 12 large C programs. Two of the programs (Ghostscript and GDB) require more resources than are available, but the other ten programs, including a version of GCC, are analyzed, showing improvements of two orders of magnitude over previous algorithms.

        The engineering involves the insight that variables can be divided into three categories: variables that have nothing to do with pointers, variables whose address is never taken, and variables whose address is taken. The first set of variables can be ignored for pointer analysis. The second set of variables is efficiently analyzed using static single assignment form. The third set of variables uses def-use information that is not in static single assignment form.

        The algorithm is organized to decrease memory and computational requirements. The analysis is a worklist algorithm, where the list is organized so that predecessor nodes are analyzed before successors, thus increasing the quality of the points-to information available when analyzing each node. Information is pruned from points-to information when it is no longer relevant, such as when returning from a procedure.

        The paper analyzes two representations for points-to information: bit vectors and binary decision diagrams (BDDs). The paper concludes that BDDs are more efficient in time and space; however, the paper must address the issue that previous uses of BDDs could not handle strong updates, that is, situations where a store operation removes all previous information about stores that use that variable as a pointer. The paper develops two techniques for determining when two variables have the same points-to information, allowing shared data structures.

        This is a paper worth studying. It seems to be part of a PhD thesis. This algorithm provides significant improvements in the computation of points-to information. It will probably be even more effective for a strongly typed language such as Java or C#. The techniques are not yet strong enough to handle all systems programs, but Hardekopf and Lin hint at further progress that may make their analysis possible.

        Online Computing Reviews Service
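The worklist organization and the strong updates mentioned in the review can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain Python sets rather than BDDs, it does not use SSA form, and it is intraprocedural. The mini-IR (`addr`, `copy`, `load`, `store`, `nop`) and the names `transfer`, `join`, `reverse_postorder`, and `analyze` are hypothetical, introduced only for illustration. The strong-update test is also simplified: it assumes every name denotes a single concrete location.

```python
import heapq

# Hypothetical mini-IR, one statement per control-flow-graph node:
#   ("addr",  p, x)  p = &x      ("copy",  p, q)  p = q
#   ("load",  p, q)  p = *q      ("store", p, q)  *p = q
#   ("nop",)         no pointer effect
EMPTY = frozenset()

def transfer(stmt, state):
    """Apply one statement to a points-to state (dict: name -> frozenset)."""
    out = dict(state)
    kind = stmt[0]
    if kind == "addr":
        _, p, x = stmt
        out[p] = frozenset([x])                 # p is overwritten
    elif kind == "copy":
        _, p, q = stmt
        out[p] = state.get(q, EMPTY)
    elif kind == "load":
        _, p, q = stmt
        targets = state.get(q, EMPTY)
        out[p] = EMPTY.union(*(state.get(t, EMPTY) for t in targets))
    elif kind == "store":
        _, p, q = stmt
        targets = state.get(p, EMPTY)
        value = state.get(q, EMPTY)
        if len(targets) == 1:                   # one target: strong update
            (t,) = targets
            out[t] = value                      # old contents are killed
        else:                                   # several targets: weak update
            for t in targets:
                out[t] = state.get(t, EMPTY) | value
    return out

def join(a, b):
    """Pointwise union of two points-to states."""
    return {v: a.get(v, EMPTY) | b.get(v, EMPTY) for v in a.keys() | b.keys()}

def reverse_postorder(succs, entry):
    """Rank nodes so that predecessors tend to come before successors."""
    seen, order = set(), []
    def dfs(n):
        seen.add(n)
        for s in succs.get(n, []):
            if s not in seen:
                dfs(s)
        order.append(n)
    dfs(entry)
    order.reverse()
    return {n: i for i, n in enumerate(order)}

def analyze(stmts, succs, entry):
    """Flow-sensitive analysis; returns the points-to state after each node."""
    rank = reverse_postorder(succs, entry)
    in_state = {n: {} for n in stmts}
    out_state = {n: None for n in stmts}        # None means "not visited yet"
    heap, queued = [(rank[entry], entry)], {entry}
    while heap:
        _, n = heapq.heappop(heap)              # lowest rank first
        queued.discard(n)
        new_out = transfer(stmts[n], in_state[n])
        if new_out == out_state[n]:
            continue                            # nothing changed at this node
        out_state[n] = new_out
        for s in succs.get(n, []):
            merged = join(in_state[s], new_out)
            if merged != in_state[s] or out_state[s] is None:
                in_state[s] = merged
                if s not in queued:
                    queued.add(s)
                    heapq.heappush(heap, (rank[s], s))
    return out_state

# Branching example: p = &a; q = &c; on one path p = &b; then *p = q.
demo_stmts = {0: ("addr", "p", "a"), 1: ("addr", "q", "c"),
              2: ("addr", "p", "b"), 3: ("store", "p", "q")}
demo_succs = {0: [1], 1: [2, 3], 2: [3]}
result = analyze(demo_stmts, demo_succs, 0)
# At node 3, p may point to a or b, so the store is a weak update.
```

Because node 2 is ranked ahead of node 3, the store at node 3 is evaluated only after both incoming paths have contributed their information, which is the benefit of the predecessor-first ordering. On a straight-line program, two successive stores through a pointer with a single target leave only the second value, which is the strong-update behavior.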

        • Published in

          ACM SIGPLAN Notices, Volume 44, Issue 1
          POPL '09
          January 2009
          453 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/1594834

          • POPL '09: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
            January 2009
            464 pages
            ISBN: 9781605583792
            DOI: 10.1145/1480881

          Copyright © 2009 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
