Abstract
Existential blank nodes greatly complicate a number of fundamental operations on Resource Description Framework (RDF) graphs. In particular, the problems of determining if two RDF graphs have the same structure modulo blank node labels (i.e., if they are isomorphic), or determining if two RDF graphs have the same meaning under simple semantics (i.e., if they are simple-equivalent), have no known polynomial-time algorithms. In this article, we propose methods that can produce two canonical forms of an RDF graph. The first canonical form preserves isomorphism such that any two isomorphic RDF graphs will produce the same canonical form; this iso-canonical form is produced by modifying the well-known canonical labelling algorithm Nauty for application to RDF graphs. The second canonical form additionally preserves simple-equivalence such that any two simple-equivalent RDF graphs will produce the same canonical form; this equi-canonical form is produced by, in a preliminary step, leaning the RDF graph, and then computing the iso-canonical form. These algorithms have a number of practical applications, such as identifying isomorphic or equivalent RDF graphs in a large collection without requiring pairwise comparison, computing checksums or signing RDF graphs, applying consistent Skolemisation schemes where blank nodes are mapped in a canonical manner to Internationalised Resource Identifiers (IRIs), and so forth. Likewise, a variety of algorithms can be simplified by presupposing RDF graphs in one of these canonical forms. Both algorithms require an exponential number of steps in the worst case; in our evaluation we demonstrate that there indeed exist difficult synthetic cases, but we also provide results over 9.9 million RDF graphs that suggest such cases occur infrequently in the real world, and that both canonical forms can be efficiently computed in all but a handful of such cases.
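To make the two canonical forms concrete, the following is a minimal brute-force sketch in Python. It is not the Nauty-based method the abstract describes: it simply tries every relabelling of blank nodes (for the iso-canonical form) and every blank node mapping (for leaning), so it is only usable on very small graphs. The representation of triples as tuples of strings, the convention that blank nodes start with `_:`, and the names `iso_canonical`, `lean` and `equi_canonical` are all assumptions made for illustration.

```python
from itertools import permutations, product


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def iso_canonical(graph):
    """Relabel blank nodes so that isomorphic graphs give the same result.

    Brute force: tries every bijection from the blank nodes onto the labels
    _:b0 .. _:b(n-1) and keeps the lexicographically smallest sorted
    tuple of triples.
    """
    triples = set(graph)
    blanks = sorted({t for tr in triples for t in tr if is_blank(t)})
    labels = [f"_:b{i}" for i in range(len(blanks))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(blanks, perm))
        candidate = tuple(sorted(
            tuple(mapping.get(t, t) for t in tr) for tr in triples
        ))
        if best is None or candidate < best:
            best = candidate
    return best


def lean(graph):
    """Remove redundant triples until the graph is lean.

    Brute force: looks for a mapping of blank nodes to terms of the graph
    whose image is a proper subset of the graph, and repeats until none exists.
    """
    triples = set(graph)
    while True:
        blanks = sorted({t for tr in triples for t in tr if is_blank(t)})
        terms = sorted({t for tr in triples for t in tr})
        for image in product(terms, repeat=len(blanks)):
            mapping = dict(zip(blanks, image))
            mapped = {tuple(mapping.get(t, t) for t in tr) for tr in triples}
            if mapped < triples:  # proper subset: some triple was redundant
                triples = mapped
                break
        else:
            return triples  # no shrinking mapping exists, so the graph is lean


def equi_canonical(graph):
    # Lean first, then compute the iso-canonical form of the lean graph.
    return iso_canonical(lean(graph))


# Same structure, different blank node labels.
g1 = [("_:x", "ex:knows", "_:y"), ("_:y", "ex:name", '"Bob"')]
g2 = [("_:a", "ex:name", '"Bob"'), ("_:b", "ex:knows", "_:a")]

# g3 contains a redundant triple on _:x; g4 is its lean counterpart.
g3 = [("_:x", "ex:p", "ex:a"), ("_:y", "ex:p", "ex:a"), ("_:y", "ex:q", "ex:b")]
g4 = [("_:z", "ex:p", "ex:a"), ("_:z", "ex:q", "ex:b")]
```

In this sketch, `g1` and `g2` share an iso-canonical form, whereas `g3` and `g4` differ under `iso_canonical` but coincide under `equi_canonical`. Because the output is a deterministic tuple, it could be serialised and hashed to compare graphs in a collection without pairwise isomorphism checks.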
Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes