Abstract
We present a calculus for processing semistructured data that spans the differing application areas of several novel query languages broadly categorized as "NoSQL". The calculus lets users define their own operators, capturing a wider range of data-processing capabilities, whilst providing a typing precision so far typical only of primitive, hard-coded operators. The type inference algorithm is based on semantic type checking, and the resulting type information is both precise and flexible enough to handle structured as well as semistructured data. We illustrate the calculus by encoding a large fragment of Jaql, including operations and iterators over JSON, embedded SQL expressions, and co-grouping, and we show how the encoding directly yields a typing discipline for Jaql as is, that is, without adding any type definitions or type annotations to the code.
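To make the claim about types that are "both precise and flexible" more concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the paper's calculus and not Jaql. The type language (atoms, closed records, arrays, finite unions), the names `infer`, `subtype`, `has_field` and `has_field_type`, and the typing rule given for the operator are all assumptions made for this example. Types are read as sets of JSON values, so subtyping is meant as set inclusion. Unions let a heterogeneous array keep one precise shape per kind of record, and the user-defined operator `has_field` receives an output type computed from its input type without any annotation. The `subtype` check here is a simplified structural approximation. Semantic subtyping instead decides inclusion between the set-theoretic interpretations themselves.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, Union

# Hypothetical mini type language for JSON values (NOT the paper's calculus):
# atoms, closed records, arrays, and finite unions of these.

@dataclass(frozen=True)
class Atom:
    name: str  # "null", "bool", "int", "float", or "str"

@dataclass(frozen=True)
class Record:
    fields: tuple  # (label, Type) pairs sorted by label; records are closed

@dataclass(frozen=True)
class Array:
    item: Type  # type shared by every element

@dataclass(frozen=True)
class Or:
    members: frozenset  # union of non-union types; empty set = empty type

Type = Union[Atom, Record, Array, Or]


def union(types: Iterable[Type]) -> Type:
    """Flatten nested unions, drop duplicates, and unwrap singletons."""
    flat: set = set()
    for t in types:
        flat |= t.members if isinstance(t, Or) else {t}
    return next(iter(flat)) if len(flat) == 1 else Or(frozenset(flat))


def infer(v: Any) -> Type:
    """Infer the most precise type of a JSON value in this mini language."""
    if v is None:
        return Atom("null")
    if isinstance(v, bool):  # test bool before int: bool subclasses int
        return Atom("bool")
    if isinstance(v, int):
        return Atom("int")
    if isinstance(v, float):
        return Atom("float")
    if isinstance(v, str):
        return Atom("str")
    if isinstance(v, list):
        return Array(union(infer(x) for x in v))
    if isinstance(v, dict):
        pairs = ((k, infer(x)) for k, x in v.items())
        return Record(tuple(sorted(pairs, key=lambda kv: kv[0])))
    raise TypeError(f"not a JSON value: {v!r}")


def subtype(s: Type, t: Type) -> bool:
    """Sound but incomplete structural check that s <: t (set inclusion)."""
    if isinstance(s, Or):
        return all(subtype(m, t) for m in s.members)
    if isinstance(t, Or):
        return any(subtype(s, m) for m in t.members)
    if isinstance(s, Atom) and isinstance(t, Atom):
        return s.name == t.name
    if isinstance(s, Array) and isinstance(t, Array):
        return subtype(s.item, t.item)
    if isinstance(s, Record) and isinstance(t, Record):
        sf, tf = dict(s.fields), dict(t.fields)
        return sf.keys() == tf.keys() and all(subtype(sf[k], tf[k]) for k in sf)
    return False


def has_field(values: list, label: str) -> list:
    """Hypothetical user-defined operator: keep the records that have `label`."""
    return [v for v in values if isinstance(v, dict) and label in v]


def has_field_type(t: Type, label: str) -> Type:
    """Output type of has_field: keep only the record shapes that have `label`."""
    if not isinstance(t, Array):
        raise TypeError("has_field expects an array")
    members = t.item.members if isinstance(t.item, Or) else {t.item}
    return Array(union(m for m in members
                       if isinstance(m, Record) and label in dict(m.fields)))
```

For example, with `data = [{"a": 1}, {"a": "x", "b": True}, {"b": None}]`, `has_field_type(infer(data), "b")` is an array type whose element type is the union of exactly the two record shapes that carry `"b"`, which is also what `infer` returns for the filtered result `has_field(data, "b")`.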
Static and dynamic semantics of NoSQL languages
POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages