RE2C: a more versatile scanner generator

Published: 01 March 1993

Abstract

It is usually claimed that lexical analysis routines are still coded by hand, despite the widespread availability of scanner generators, for efficiency reasons. While efficiency is a consideration, there exist freely available scanner generators such as GLA [Gray 1988] that can generate scanners that are faster than most hand-coded ones. However, most generated scanners are tailored for a particular environment, and retargeting these scanners to other environments, if possible, is usually complex enough to make a hand-coded scanner more appealing. In this paper we describe RE2C, a scanner generator that not only generates scanners that are faster (and usually smaller) than those produced by any other scanner generator known to the authors, including GLA, but that also adapt easily to any environment.

References

  1. AHO, A. V., SETHI, R., AND ULLMAN, J. D. 1988. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass.
  2. BERNSTEIN, R. L. 1985. Producing good code for the case statement. Softw. Pract. Exper. 15, 10 (Oct.), 1021-1024.
  3. DEREMER, F., AND PENNELLO, T. 1982. Efficient computation of LALR(1) look-ahead sets. ACM Trans. Program. Lang. Syst. 4, 4 (Oct.), 615-649.
  4. ELLIS, M., AND STROUSTRUP, B. 1990. The Annotated C++ Reference Manual. Addison-Wesley, Reading, Mass.
  5. FRASER, C. W., AND HANSON, D. R. 1991. A retargetable compiler for ANSI C. SIGPLAN Not. (ACM) 26, 10 (Oct.), 29-43.
  6. GAREY, M. R., AND JOHNSON, D. S. 1991. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, Calif.
  7. GRAY, R. W. 1988. γ-GLA--A generator for lexical analyzers that programmers can use. In USENIX Conference Proceedings (June). USENIX Association, Berkeley, Calif., 147-160.
  8. GRAY, R. W., HEURING, V. P., LEVI, S. P., SLOANE, A. M., AND WAITE, W. M. 1992. Eli: A complete, flexible compiler construction system. Commun. ACM 35, 2 (Feb.), 121-131.
  9. GROSCH, J. 1989. Efficient generation of lexical analysers. Softw. Pract. Exper. 19, 11, 1089-1103.
  10. HARRISON, M. A. 1978. Introduction to Formal Language Theory. Addison-Wesley, Reading, Mass.
  11. HENNESSY, J. L., AND MENDELSOHN, N. 1982. Compilation of the Pascal case statement. Softw. Pract. Exper. 12, 9 (Sept.), 879-882.
  12. HORSPOOL, R. N., AND WHITNEY, M. 1990. Even faster LR parsing. Softw. Pract. Exper. 20, 6, 515-535.
  13. JACOBSON, V. 1987. Tuning UNIX Lex or it's NOT true what they say about Lex. In USENIX Conference Proceedings (Washington, D.C.). USENIX Association, Berkeley, Calif., 163-164. (Abstract only.)
  14. KERNIGHAN, B. W., AND RITCHIE, D. M. 1988. The C Programming Language. 2nd ed. Prentice-Hall, Englewood Cliffs, N.J.
  15. LESK, M. E. 1975. LEX--A lexical analyzer generator. Comput. Sci. Tech. Rep. 39, Bell Telephone Laboratories, Murray Hill, N.J.
  16. PAXSON, V. 1988. flex--Man pages. In flex-2.3.7.tar.Z. (Available for anonymous ftp from ftp.uu.net in /packages/gnu.)
  17. PENNELLO, T. J. 1986. Very fast LR parsing. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction (July). ACM, New York. SIGPLAN Not. 21, 7 (July).
  18. SALE, A. 1981. The implementation of case statements in Pascal. Softw. Pract. Exper. 11, 9 (Sept.), 929-942.

      Reviews

      Manuel E. Bermudez

      RE2C is a tool for generating scanners from regular expressions. The authors argue that the reason developers tend to hand-code scanners is inadequate performance by machine-generated scanners. RE2C is a good solution to this problem because it concentrates on efficient deterministic finite automaton (DFA) recognition of tokens; leaves efficient buffering issues to the user; and provides few bells and whistles, such as end-of-input pseudo-tokens. The authors claim that these limitations make RE2C tailorable to virtually any system without compromising performance. Well-presented statistics substantiate the authors' case. The paper is well written and technically accurate. RE2C indeed seems to perform in a manner comparable to hand-coded scanners. Thus, it is an option that is difficult to ignore for anyone seriously considering alternatives for developing a scanner. The relevance of the entire work is questionable, however; with scanners occupying only a small fraction of overall compilation time, and with processor speed continuing to increase, the authors' achievement loses some of its luster. In addition, while many developers do prefer hand-coded scanners because of efficiency, many others may well prefer them because they need to tweak the scanner to handle difficult constructs that otherwise do not fit the DFA model cleanly. Still, for anyone who is dissatisfied with the performance of machine-generated scanners, this paper presents an excellent alternative.
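
To make the review's description concrete, the following is a minimal hand-written sketch in C of a directly coded DFA scanner in which the caller owns the input buffer. It is an illustration only, not output from RE2C and not code from the paper. The token kinds, the `scan` function, and the choice of a NUL byte as the end-of-input sentinel are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not taken from the paper. */
enum token { TOK_END, TOK_IDENT, TOK_NUMBER, TOK_ERROR };

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

static int is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/*
 * Recognize one token starting at *cursor and advance *cursor past it.
 * Each DFA state is a label and each transition is a goto, so there is
 * no transition table to index at run time.
 *
 * Assumption: the caller supplies a NUL-terminated buffer. The NUL acts
 * as a sentinel, so the scanner does no buffering or refilling itself.
 */
enum token scan(const char **cursor)
{
    const char *p;
    enum token result;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

start:
    if (*p == ' ' || *p == '\t' || *p == '\n') { p++; goto start; }
    if (is_letter(*p)) { p++; goto in_ident; }
    if (is_digit(*p))  { p++; goto in_number; }
    if (*p == '\0')    { result = TOK_END; goto done; } /* sentinel stays unconsumed */
    p++;                                                /* skip one unknown character */
    result = TOK_ERROR;
    goto done;

in_ident:
    if (is_letter(*p) || is_digit(*p)) { p++; goto in_ident; }
    result = TOK_IDENT;
    goto done;

in_number:
    if (is_digit(*p)) { p++; goto in_number; }
    result = TOK_NUMBER;

done:
    *cursor = p;
    return result;
}
```

Because the scanner only reads through a pointer that the caller passes in, the caller is free to back that pointer with a memory-mapped file, a fixed array, or any other buffering scheme, which is the kind of separation the review attributes to RE2C.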

      • Published in

        ACM Letters on Programming Languages and Systems  Volume 2, Issue 1-4
        March–Dec. 1993
        241 pages
        ISSN:1057-4514
        EISSN:1557-7384
        DOI:10.1145/176454

        Copyright © 1993 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 March 1993
        Published in LOPLAS Volume 2, Issue 1-4
