skip to main content
research-article
Public Access

Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code

Authors Info & Claims
Published:04 April 2017Publication History
Skip Abstract Section

Abstract

There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most of the existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite the presence of many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations and poor scalability. In this paper, we revisit the scalability problem of interprocedural static analysis from a "Big Data" perspective. That is, we turn sophisticated code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. We develop Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.

We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations. Moreover, we show that these analyses can be used to augment the existing checkers; these augmented checkers uncovered 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.

References

  1. The findbugs Java static checker. http://findbugs.sourceforge.net/, 2015.Google ScholarGoogle Scholar
  2. The Coverity code checker. http://www.coverity.com/, 2016.Google ScholarGoogle Scholar
  3. The GrammaTech CodeSonar static checker, 2016.Google ScholarGoogle Scholar
  4. The HP Fortify static checker, 2016.Google ScholarGoogle Scholar
  5. The KlocWork static checker, 2016.Google ScholarGoogle Scholar
  6. The LLVMLinux project. http://llvm.linuxfoundation.org/, 2016.Google ScholarGoogle Scholar
  7. The LogicBlox Datalog engine. http://www.logicblox.com/, 2016.Google ScholarGoogle Scholar
  8. Personal communication with John Criswell, 2016.Google ScholarGoogle Scholar
  9. A. Aiken, S. Bugrara, I. Dillig, T. Dillig, B. Hackett, and P. Hawkins. An overview of the Saturn project. In PASTE, pages 43--48, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Alur. Marrying words and trees. In PODS, pages 233--242, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Alur, M. Benedikt, K. Etessami, P. Godefroid, T. Reps, and M. Yannakakis. Analysis of recursive state machines. ACM Trans. Program. Lang. Syst., 27(4):786--818, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Alur and P. Madhusudan. Visibly pushdown languages. In STOC, pages 202--211, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte. Min-max heaps and generalized priority queues. Commun. ACM, 29(10):996--1000, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Ball, B. Cook, V. Levin, and S. K. Rajamani. SLAM and static driver verifier: Technology transfer of formal methods inside microsoft. In IFM, pages 1--20, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  15. T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. Automatic predicate abstraction of c programs. In PLDI, pages 203--213, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. O. Bastani, S. Anand, and A. Aiken. Specification inference using context-free language reachability. In POPL, pages 553--566, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Bravenboer and Y. Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In OOPSLA, pages 243--262, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. F. Brown, A. Notzli, and D. Engler. How to build static checking systems using orders of magnitude less code. In ASPLOS, pages 143--157, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Bugrara and A. Aiken. Verifying the safety of user pointer dereferences. In IEEE S&P, pages 325--338, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, pages 209--224, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically generating inputs of death. In CCS, pages 322--335, 2006.Google ScholarGoogle Scholar
  22. R. Chen, X. Ding, P. Wang, H. Chen, B. Zang, and H. Guan. Computation and communication efficient graph processing with distributed immutable view. In HPDC, pages 215--226, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Chen, J. Shi, Y. Chen, and H. Chen. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In EuroSys, pages 1:1--1:15, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An empirical study of operating systems errors. In SOSP, pages 73--88, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. DeLine and M. F\"ahndrich. Enforcing high-level protocols in low-level software. In PLDI, pages 59--69, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. Dor, S. Adams, M. Das, and Z. Yang. Software validation via scalable path-sensitive value flow analysis. In ISSTA, pages 12--22, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Engler. Making finite verification of raw C code easier than writing a test case. In RV. Invited talk.Google ScholarGoogle Scholar
  28. D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In OSDI, pages 1--1, 2000.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: A general approach to inferring errors in systems code. In SOSP, pages 57--72, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay. Effective typestate verification in the presence of aliasing. In ISSTA, pages 133--144, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. S. Foster, M. F\"ahndrich, and A. Aiken. A theory of type qualifiers. In PLDI, pages 192--203, 1999.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In OSDI, pages 599--613, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In OSDI, pages 599--613, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analyses. In PLDI, pages 69--82, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC. In KDD, pages 77--85, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. Hind. Pointer analysis: Haven't we solved this problem yet? In PASTE, pages 54--61, 2001.Google ScholarGoogle Scholar
  38. S. Horwitz, T. Reps, and M. Sagiv. Demand interprocedural dataflow analysis. In FSE, pages 104--115, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. G. F. Italiano. Amortized efficiency of a path retrieval data structure. Theor. Comput. Sci., 48(2--3):273--281, 1986.Google ScholarGoogle Scholar
  40. R. Johnson and D. Wagner. Finding user/kernel pointer bugs with type inference. In USENIX Security, pages 9--9, 2004.Google ScholarGoogle Scholar
  41. G. Kastrinis and Y. Smaragdakis. Hybrid context-sensitivity for points-to analysis. In PLDI, pages 423--434, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. J. Kodumal and A. Aiken. The set constraint/CFL reachability connection in practice. In PLDI, pages 207--218, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Kodumal and A. Aiken. Regularly annotated set constraints. In PLDI, pages 331--341, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, pages 31--46, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. M. S. Lam, S. Guo, and J. Seo. SociaLite: Datalog extensions for efficient social network analysis. In ICDE, pages 278--289, 2013.Google ScholarGoogle Scholar
  46. B. W. Lampson. Hints for computer system design. In SOSP, pages 33--48, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In PLDI, pages 278--289, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In OSDI, pages 20--20, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Z. Li and Y. Zhou. PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. In FSE, pages 306--315, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Z. Lin, M. Kahng, K. M. Sabrin, D. H. P. Chau, H. Lee, , and U. Kang. MMap: Fast billion-scale graph computation on a pc via memory mapping. In BigData, pages 159--164, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  51. Y. Liu and A. Milanova. Static analysis for inference of explicit information flow. In PASTE, pages 50--56, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, and G. Inc. Pregel: A system for large-scale graph processing. In SIGMOD, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. D. Melski and T. Reps. Interconvertibility of a class of set constraints and context-free-language reachability. Theoretical Computer Science, 248:29--98, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, pages 439--455. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer. CCured: Type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst., 27(3):477--526, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In SOSP, pages 456--471, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Y. Padioleau, J. Lawall, R. R. Hansen, and G. Muller. Documenting and automating collateral evolutions in linux device drivers. In EuroSys, pages 247--260, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. N. Palix, G. Thomas, S. Saha, C. Calvès, J. Lawall, and G. Muller. Faults in linux: Ten years later. In ASPLOS, pages 305--318, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. M. Pundir, L. M. Leslie, I. Gupta, and R. H. Campbell. Zorro: Zero-cost reactive failure recovery in distributed graph processing. In SoCC, pages 195--208, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. J. Rehof and M. F\"ahndrich. Type-based flow analysis: From polymorphic subtyping to CFL-reachability. In POPL, pages 54--66, 2001.Google ScholarGoogle Scholar
  62. T. Reps. Solving demand versions of interprocedural analysis problems. In CC, pages 389--403, 1994. Google ScholarGoogle ScholarCross RefCross Ref
  63. T. Reps. Shape analysis as a generalized path problem. In PEPM, pages 1--11, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. T. Reps. Program analysis via graph reachability. Information and Software Technology, 40(11--12):701--726, 1998.Google ScholarGoogle Scholar
  65. T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In POPL, pages 49--61, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. T. Reps, S. Horwitz, M. Sagiv, and G. Rosay. Speeding up slicing. In FSE, pages 11--20, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. L. Roditty and U. Zwick. A fully dynamic reachability algorithm for directed graphs with an almost linear update time. In STOC, pages 184--191, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: Scale-out graph processing from secondary storage. In SOSP, pages 410--424, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric graph processing using streaming partitions. In SOSP, pages 472--488, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. C. Rubio-González, H. S. Gunawi, B. Liblit, R. H. Arpaci-Dusseau, and A. C. Arpaci-Dusseau. Error propagation analysis for file systems. In PLDI, pages 270--280, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. C. Rubio-González and B. Liblit. Defective error/pointer interactions in the linux kernel. In ISSTA, pages 111--121, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. M. Sagiv, T. Reps, and S. Horwitz. Precise interprocedural dataflow analysis with applications to constant propagation. Theoretical Computer Science, 167(1--2):131--170, 1996.Google ScholarGoogle Scholar
  73. M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. In S. Muchnick and N. Jones, editors, Program Flow Analysis: Theory and Applications, pages 189--234. Prentice Hall, 1981.Google ScholarGoogle Scholar
  74. A. Shkapsky, M. Yang, M. Interlandi, H. Chiu, T. Condie, and C. Zaniolo. Big data analytics with datalog queries on spark. In SIGMOD, pages 1135--1149, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. J. Shun and G. E. Blelloch. Ligra: A lightweight graph processing framework for shared memory. In PPoPP, pages 135--146, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Y. Smaragdakis, M. Bravenboer, and O. Lhoták. Pick your contexts well: Understanding object-sensitivity. In POPL, pages 17--30, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Y. Smaragdakis, G. Kastrinis, and G. Balatsouras. Introspective analysis: Context-sensitivity, across the board. In PLDI, pages 485--495, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. M. Sridharan and R. Bodik. Refinement-based context-sensitive points-to analysis for Java. In PLDI, pages 387--400, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. M. Sridharan, D. Gopan, L. Shan, and R. Bodik. Demand-driven points-to analysis for Java. In OOPSLA, pages 59--76, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. H. Tang, X. Wang, L. Zhang, B. Xie, L. Zhang, and H. Mei. Summary-based context-sensitive data-dependence analysis in presence of callbacks. In POPL, pages 83--95, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. K. Vora, R. Gupta, and G. Xu. Synergistic analysis of evolving graphs. ACM Trans. Archit. Code Optim., 13(4):32:1--32:27, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. K. Vora, R. Gupta, and G. Xu. Kickstarter: Fast and accurate computations on streaming graphs via trimmed approximations. In ASPLOS, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. K. Vora, S. C. Koduru, and R. Gupta. ASPIRE: Exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based dsm. In OOPSLA, pages 861--878, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. K. Vora, G. Xu, and R. Gupta. Load the edges you need: A generic I/O optimization for disk-based graph processing. In USENIX ATC, pages 507--522, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In CIDR, 2013.Google ScholarGoogle Scholar
  86. J. Wang, M. Balazinska, and D. Halperin. Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines. PVLDB, 8(12):1542--1553, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. K. Wang, G. Xu, Z. Su, and Y. D. Liu. GraphQ: Graph query processing with abstraction refinement\textemdashprogrammable and budget-aware analytical queries over very large graphs on a single PC. In USENIX ATC, pages 387--401, 2015.Google ScholarGoogle Scholar
  88. X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In SOSP, pages 260--275, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. C. Weiss, C. Rubio-González, and B. Liblit. Database-backed program analysis for scalable error propagation. In ICSE, pages 586--597, 2015. Google ScholarGoogle ScholarCross RefCross Ref
  90. J. Whaley and M. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In PLDI, pages 131--144, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. G. Xu, A. Rountev, and M. Sridharan. Scaling CFL-reachability-based points-to analysis using context-sensitive must-not-alias analysis. In ECOOP, pages 98--122, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. D. Yan, G. Xu, and A. Rountev. Demand-driven context-sensitive alias analysis for Java. In ISSTA, pages 155--165, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. J. Yang, C. Sar, and D. Engler. EXPLODE: A lightweight, general system for finding serious storage system errors. In OSDI, pages 10--10, 2006.Google ScholarGoogle Scholar
  94. M. Yannakakis. Graph-theoretic methods in database theory. In PODS, pages 230--242, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. D. M. Yellin. Speeding up dynamic transitive closure for bounded degree graphs. Acta Inf., 30(4):369--384, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. S. Yong, S. Horwitz, and T. Reps. Pointer analysis for programs with structures and casting. In PLDI, pages 91--103, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Q. Zhang, M. R. Lyu, H. Yuan, and Z. Su. Fast algorithms for Dyck-CFL-reachability with applications to alias analysis. In PLDI, pages 435--446, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. Q. Zhang and Z. Su. Context-sensitive data dependence analysis via linear conjunctive language reachability. In POPL, pages 344--358, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Q. Zhang, X. Xiao, C. Zhang, H. Yuan, and Z. Su. Efficient subcubic alias analysis for C. In OOPSLA, pages 829--845, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. D. Zheng, D. Mhembere, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay. FlashGraph: processing billion-node graphs on an array of commodity ssds. In FAST, pages 45--58, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. X. Zheng and R. Rugina. Demand-driven alias analysis for C. In POPL, pages 197--208, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. X. Zhu, W. Han, and W. Chen. GridGraph: Large scale graph processing on a single machine using 2-level hierarchical partitioning. In USENIX ATC, pages 375--386, 2015.Google ScholarGoogle Scholar

Index Terms

  1. Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 52, Issue 4
        ASPLOS '17
        April 2017
        811 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/3093336
        Issue’s Table of Contents
        • cover image ACM Conferences
          ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
          April 2017
          856 pages
          ISBN:9781450344654
          DOI:10.1145/3037697

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 April 2017

        Check for updates

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!