ABSTRACT
The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a non-standard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately.
As a side result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
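To make problem (1) concrete, the following minimal sketch decides whether some path from x to y matches an expression under the alternative semantics, in which paths may be arbitrary and may revisit nodes. It is not the construction from the paper and it does not implement the W3C semantics. It assumes the expression has already been compiled to an NFA, and it searches the product of the graph and the NFA, which is why this variant stays in polynomial time. The names `path_exists`, `EDGES`, and `NFA_AB_STAR`, and the encoding of the graph and the automaton, are illustrative assumptions.

```python
from collections import deque


def path_exists(edges, nfa, start_state, accepting, x, y):
    """Return True if some path from x to y spells a word the NFA accepts.

    edges: iterable of (source, label, target) triples (hypothetical encoding).
    nfa:   dict mapping (state, label) to a set of successor states.
    Paths are arbitrary: nodes and edges may repeat (the alternative semantics).
    """
    # Build an adjacency list: node -> list of (label, target).
    adj = {}
    for source, label, target in edges:
        adj.setdefault(source, []).append((label, target))

    # Breadth-first search over (graph node, NFA state) pairs.
    seen = {(x, start_state)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if node == y and state in accepting:
            return True
        for label, next_node in adj.get(node, []):
            for next_state in nfa.get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False


# Hand-built NFA for the expression a b*: 0 --a--> 1, 1 --b--> 1, accepting {1}.
NFA_AB_STAR = {(0, "a"): {1}, (1, "b"): {1}}

# A small edge-labeled graph with a b-labeled self-loop on node u.
EDGES = [("x", "a", "u"), ("u", "b", "u"), ("u", "b", "y"), ("y", "c", "x")]

print(path_exists(EDGES, NFA_AB_STAR, 0, {1}, "x", "y"))
print(path_exists(EDGES, NFA_AB_STAR, 0, {1}, "y", "x"))
```

The search visits each (node, state) pair at most once, so its running time is bounded by the product of the graph size and the automaton size. Counting matching paths, as in problem (2), is a different matter: in this example the self-loop on u already yields infinitely many matching paths from x to y under arbitrary-path semantics.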
Index Terms
The complexity of evaluating path expressions in SPARQL