POPL Conference Proceedings (ACM Conferences)
DOI: 10.1145/1111037.1111039

The next 700 data description languages

Published: 11 January 2006

ABSTRACT

In the spirit of Landin, we present a calculus of dependent types to serve as the semantic foundation for a family of languages called data description languages. Such languages, which include PADS, DataScript, and PacketTypes, are designed to facilitate programming with ad hoc data, i.e., data not in well-behaved relational or XML formats. In the calculus, each type describes the physical layout and semantic properties of a data source. In the semantics, we interpret types simultaneously as the in-memory representation of the data described and as parsers for the data source. The parsing functions are robust: they automatically detect and record errors in the data stream without halting parsing. We show that the parsers are type-correct, returning data whose type matches the simple-type interpretation of the specification. We also prove that the parsers are "error-correct," accurately reporting the number of physical and semantic errors that occur in the returned data. We use the calculus to describe the features of various data description languages, and we discuss how we have used the calculus to improve PADS.
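To make the idea of "types as parsers" concrete, here is a minimal Python sketch. It assumes a hypothetical toy fragment with four type constructors (`Int`, `Lit`, `Sigma` for a dependent pair, `Where` for a constraint) and a simplified parse descriptor `PD` that carries only an error count, a message, and child descriptors. None of these names come from the paper or from PADS; this is not the paper's calculus or PADS-generated code. It only illustrates how one description can yield both a representation and a descriptor, with physical and semantic errors counted rather than raised.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class PD:
    """Hypothetical parse descriptor: error count, message, sub-descriptors."""
    nerr: int = 0
    msg: str = "ok"
    children: List["PD"] = field(default_factory=list)


# Hypothetical type constructors (a toy fragment, not the paper's calculus).
@dataclass
class Int:
    """Base type: a non-empty run of ASCII decimal digits."""


@dataclass
class Lit:
    """Literal text that must appear verbatim in the input."""
    text: str


@dataclass
class Sigma:
    """Dependent pair: the second type is computed from the first value."""
    first: Any
    second: Callable[[Any], Any]


@dataclass
class Where:
    """Constraint type {x : ty | pred(x)}; a failed predicate is a semantic error."""
    ty: Any
    pred: Callable[[Any], bool]


def parse(ty: Any, data: str, pos: int = 0) -> Tuple[int, Any, PD]:
    """Interpret a type as a parser returning (new position, representation, PD).

    Errors are recorded in the descriptor instead of raised, so parsing always
    continues and always yields a representation (a default value on failure).
    """
    if isinstance(ty, Int):
        end = pos
        while end < len(data) and data[end] in "0123456789":
            end += 1
        if end == pos:  # physical error: no digits here, so use default 0
            return pos, 0, PD(1, "expected integer")
        return end, int(data[pos:end]), PD()
    if isinstance(ty, Lit):
        if data.startswith(ty.text, pos):
            return pos + len(ty.text), ty.text, PD()
        # physical error: record it, consume nothing, keep going
        return pos, ty.text, PD(1, f"expected {ty.text!r}")
    if isinstance(ty, Sigma):
        pos, v1, pd1 = parse(ty.first, data, pos)
        pos, v2, pd2 = parse(ty.second(v1), data, pos)  # second type depends on v1
        nerr = pd1.nerr + pd2.nerr
        return pos, (v1, v2), PD(nerr, "ok" if nerr == 0 else "nested errors", [pd1, pd2])
    if isinstance(ty, Where):
        pos, v, pd = parse(ty.ty, data, pos)
        if ty.pred(v):
            return pos, v, PD(pd.nerr, pd.msg, [pd])
        # semantic error: keep the parsed value, count one more error
        return pos, v, PD(pd.nerr + 1, "constraint violated", [pd])
    raise TypeError(f"unknown type: {ty!r}")


# Hypothetical description: a count n, a colon, then a value that must not exceed n.
bounded = Sigma(Int(), lambda n: Sigma(Lit(":"), lambda _: Where(Int(), lambda m: m <= n)))

for text in ["10:7", "10:42", "10:"]:
    _, rep, pd = parse(bounded, text)
    print(text, rep, pd.nerr)
# 10:7 (10, (':', 7)) 0
# 10:42 (10, (':', 42)) 1
# 10: (10, (':', 0)) 1
```

In this toy, every call returns a representation whose shape follows the description, even when the input is wrong, while `pd.nerr` tallies the errors alongside the data. A practical implementation would also need a recovery strategy (for example, skipping to a resynchronization point after a physical error), which this sketch deliberately omits.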

References

  1. G. Back. DataScript: A specification and scripting language for binary data. In GPCE, volume 2487, pages 66--77. LNCS, 2002.
  2. A. Birman and J. D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1), Aug. 1973.
  3. W. Burge. Recursive Programming Techniques. Addison Wesley, 1975.
  4. D. Eger. Bit level types. www-2.cs.cmu.edu/~eger/.
  5. M. F. Fernández, J. Siméon, B. Choi, A. Marian, and G. Sur. Implementing XQuery 1.0: The Galax experience. In VLDB, pages 1077--1080. ACM Press, 2003.
  6. K. Fisher and R. Gruber. PADS: A domain specific language for processing ad hoc data. In PLDI, pages 295--304. ACM Press, 2005.
  7. B. Ford. Packrat parsing: Simple, powerful, lazy, linear time. In ICFP, pages 36--47. ACM Press, Oct. 2002.
  8. B. Ford. Parsing expression grammars: A recognition-based syntactic foundation. In POPL, pages 111--122. ACM Press, Jan. 2004.
  9. Gene Ontology Project. www.geneontology.org.
  10. R. Grimm. Practical packrat parsing. Technical Report TR2004-854, New York University, Mar. 2004.
  11. P. Gustafsson and K. Sagonas. Adaptive pattern matching on binary data. In ESOP, pages 124--139. Springer, Mar. 2004.
  12. R. Harper. Programming Languages: Theory and Practice. Unpublished, 2005. www-2.cs.cmu.edu/~rwh/.
  13. G. Hutton and E. Meijer. Monadic parsing in Haskell. JFP, 8(4):437--444, July 1998.
  14. A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A minimal core calculus for Java and GJ. In OOPSLA, pages 132--146. ACM Press, 1999.
  15. B. Krishnamurthy and J. Rexford. Web Protocols and Practice. Addison Wesley, 2001.
  16. P. J. Landin. The next 700 programming languages. CACM, 9(3):157--166, Mar. 1966.
  17. P. McCann and S. Chandra. PacketTypes: Abstract specification of network protocol messages. In SIGCOMM, pages 321--333. ACM Press, Aug. 2000.
  18. Tree formats. Workshop on molecular evolution. workshop.molecularevolution.org/resources/fileformats/tree_formats.php.
  19. T. J. Parr and R. W. Quong. ANTLR: A predicated-LL(k) parser generator. Software: Practice and Experience, 25(7):789--810, July 1995.
  20. C. Wikström and T. Rogvall. Protocol programming in Erlang using binaries. In Erlang/OTP User Conference, Oct. 1999.
