research-article

SuperC: parsing all of C by taming the preprocessor

Published: 11 June 2012
Abstract

C tools, such as source browsers, bug finders, and automated refactorings, need to process two languages: C itself and the preprocessor. The latter improves expressivity through file includes, macros, and static conditionals. But it operates only on tokens, making it hard to even parse both languages. This paper presents a complete, performant solution to this problem. First, a configuration-preserving preprocessor resolves includes and macros yet leaves static conditionals intact, thus preserving a program's variability. To ensure completeness, we analyze all interactions between preprocessor features and identify techniques for correctly handling them. Second, a configuration-preserving parser generates a well-formed AST with static choice nodes for conditionals. It forks new subparsers when encountering static conditionals and merges them again after the conditionals. To ensure performance, we present a simple algorithm for table-driven Fork-Merge LR parsing and four novel optimizations. We demonstrate the effectiveness of our approach on the x86 Linux kernel.
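
As a rough illustration of what a "static choice node" could look like, the toy sketch below keeps `#ifdef`/`#else`/`#endif` blocks as explicit nodes in a tree instead of resolving them up front. It is not the Fork-Merge LR algorithm described in the paper and does not parse C at all; it only groups source lines. The names `Choice`, `build_tree`, and `project`, as well as the sample macro `CONFIG_64BIT`, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    """Hypothetical static choice node: one branch per side of an #ifdef."""
    macro: str
    then: list = field(default_factory=list)
    otherwise: list = field(default_factory=list)


def build_tree(lines):
    """Group source lines into a tree, keeping conditionals as Choice nodes."""
    root = []
    stack = [root]        # lists currently being filled, innermost last
    open_choices = []     # Choice nodes whose #endif has not been seen yet
    for raw in lines:
        line = raw.strip()
        if line.startswith("#ifdef"):
            node = Choice(line.split()[1])
            stack[-1].append(node)
            open_choices.append(node)
            stack.append(node.then)
        elif line.startswith("#else"):
            stack.pop()
            stack.append(open_choices[-1].otherwise)
        elif line.startswith("#endif"):
            stack.pop()
            open_choices.pop()
        elif line:
            stack[-1].append(line)
    if open_choices:
        raise ValueError("unterminated #ifdef")
    return root


def project(tree, defined):
    """Flatten the tree for one configuration, given the set of defined macros."""
    out = []
    for node in tree:
        if isinstance(node, Choice):
            branch = node.then if node.macro in defined else node.otherwise
            out.extend(project(branch, defined))
        else:
            out.append(node)
    return out


source = """
#ifdef CONFIG_64BIT
typedef long word_t;
#else
typedef int word_t;
#endif
word_t counter;
""".splitlines()

tree = build_tree(source)
```

Calling `project(tree, {"CONFIG_64BIT"})` selects the `long` variant, and `project(tree, set())` selects the `int` variant. The point of the sketch is that both variants remain available in a single tree, which is the property the abstract calls preserving a program's variability.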

    • Published in

      ACM SIGPLAN Notices, Volume 47, Issue 6
      PLDI '12
      June 2012
      534 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2345156

      PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2012
      572 pages
      ISBN: 9781450312059
      DOI: 10.1145/2254064

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
