ABSTRACT
In the spirit of Landin, we present a calculus of dependent types to serve as the semantic foundation for a family of languages called data description languages. Such languages, which include PADS, DataScript, and PacketTypes, are designed to facilitate programming with ad hoc data, i.e., data not in well-behaved relational or XML formats. In the calculus, each type describes the physical layout and semantic properties of a data source. In the semantics, we interpret types simultaneously as the in-memory representation of the data described and as parsers for the data source. The parsing functions are robust, automatically detecting and recording errors in the data stream without halting parsing. We show the parsers are type-correct, returning data whose type matches the simple-type interpretation of the specification. We also prove the parsers are "error-correct," accurately reporting the number of physical and semantic errors that occur in the returned data. We use the calculus to describe the features of various data description languages, and we discuss how we have used the calculus to improve PADS.
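To make the idea of "types as parsers that record errors without halting" concrete, here is a minimal sketch in Python. It is not the calculus from the paper: the description constructors `Int`, `Lit`, `Seq`, and `Where`, the `parse` function, and the choice to return a plain error count are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical description constructors (illustrative only, not the paper's syntax).
@dataclass(frozen=True)
class Int:
    """A non-empty run of ASCII digits."""


@dataclass(frozen=True)
class Lit:
    """An exact literal string."""
    text: str


@dataclass(frozen=True)
class Seq:
    """Several descriptions laid out one after another."""
    items: tuple


@dataclass(frozen=True)
class Where:
    """A description plus a semantic constraint on the parsed value."""
    inner: Any
    pred: Callable[[Any], bool]


def parse(desc, data: str, pos: int = 0):
    """Return (representation, error_count, next_position).

    Malformed input is counted as an error rather than raised,
    so parsing always continues with the rest of the description.
    """
    if isinstance(desc, Int):
        end = pos
        while end < len(data) and data[end] in "0123456789":
            end += 1
        if end == pos:
            return None, 1, pos  # physical error: no digits here
        return int(data[pos:end]), 0, end
    if isinstance(desc, Lit):
        if data.startswith(desc.text, pos):
            return desc.text, 0, pos + len(desc.text)
        return None, 1, pos  # physical error: literal missing
    if isinstance(desc, Seq):
        reps, errors = [], 0
        for item in desc.items:
            rep, n, pos = parse(item, data, pos)
            reps.append(rep)
            errors += n
        return tuple(reps), errors, pos
    if isinstance(desc, Where):
        rep, n, pos = parse(desc.inner, data, pos)
        if n == 0 and not desc.pred(rep):
            n += 1  # semantic error: value parsed but violates the constraint
        return rep, n, pos
    raise TypeError(f"unknown description: {desc!r}")


# Example description: "<int>:<int below 65536>"
port = Where(Int(), lambda n: n < 65536)
record = Seq((Int(), Lit(":"), port))
```

In this sketch, `parse(record, "10:99999")` still returns the parsed values together with an error count of 1 for the violated constraint, while `parse(record, "10-80")` reports two physical errors and leaves the missing fields as `None`. A fuller treatment would return a structured parse descriptor rather than a bare count.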
- G. Back. DataScript: A specification and scripting language for binary data. In GPCE, volume 2487, pages 66--77. LNCS, 2002.
- A. Birman and J. D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1), Aug. 1973.
- W. Burge. Recursive Programming Techniques. Addison Wesley, 1975.
- D. Eger. Bit level types. www-2.cs.cmu.edu/~eger/.
- M. F. Fernández, J. Siméon, B. Choi, A. Marian, and G. Sur. Implementing XQuery 1.0: The Galax experience. In VLDB, pages 1077--1080. ACM Press, 2003.
- K. Fisher and R. Gruber. PADS: A domain specific language for processing ad hoc data. In PLDI, pages 295--304. ACM Press, 2005.
- B. Ford. Packrat parsing: Simple, powerful, lazy, linear time. In ICFP, pages 36--47. ACM Press, Oct. 2002.
- B. Ford. Parsing expression grammars: A recognition-based syntactic foundation. In POPL, pages 111--122. ACM Press, Jan. 2004.
- Gene Ontology Project. www.geneontology.org.
- R. Grimm. Practical packrat parsing. Technical Report TR2004-854, New York University, Mar. 2004.
- P. Gustafsson and K. Sagonas. Adaptive pattern matching on binary data. In ESOP, pages 124--139. Springer, Mar. 2004.
- R. Harper. Programming Languages: Theory and Practice. Unpublished, 2005. www-2.cs.cmu.edu/~rwh/.
- G. Hutton and E. Meijer. Monadic parsing in Haskell. JFP, 8(4):437--444, July 1998.
- A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A minimal core calculus for Java and GJ. In OOPSLA, pages 132--146. ACM Press, 1999.
- B. Krishnamurthy and J. Rexford. Web Protocols and Practice. Addison Wesley, 2001.
- P. J. Landin. The next 700 programming languages. CACM, 9(3):157--166, Mar. 1966.
- P. McCann and S. Chandra. PacketTypes: Abstract specification of network protocol messages. In SIGCOMM, pages 321--333. ACM Press, Aug. 2000.
- Tree formats. Workshop on molecular evolution. workshop.molecularevolution.org/resources/fileformats/tree_formats.php.
- T. J. Parr and R. W. Quong. ANTLR: A predicated-LL(k) parser generator. Software Practice and Experience, 25(7):789--810, July 1995.
- C. Wikström and T. Rogvall. Protocol programming in Erlang using binaries. In Erlang/OTP User Conference, Oct. 1999.
The next 700 data description languages. Proceedings of the 2006 POPL Conference.