
Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes

Published: 25 July 2017
Abstract

Existential blank nodes greatly complicate a number of fundamental operations on Resource Description Framework (RDF) graphs. In particular, the problems of determining if two RDF graphs have the same structure modulo blank node labels (i.e., if they are isomorphic), or determining if two RDF graphs have the same meaning under simple semantics (i.e., if they are simple-equivalent), have no known polynomial-time algorithms. In this article, we propose methods that can produce two canonical forms of an RDF graph. The first canonical form preserves isomorphism such that any two isomorphic RDF graphs will produce the same canonical form; this iso-canonical form is produced by modifying the well-known canonical labelling algorithm Nauty for application to RDF graphs. The second canonical form additionally preserves simple-equivalence such that any two simple-equivalent RDF graphs will produce the same canonical form; this equi-canonical form is produced by, in a preliminary step, leaning the RDF graph, and then computing the iso-canonical form. These algorithms have a number of practical applications, such as for identifying isomorphic or equivalent RDF graphs in a large collection without requiring pairwise comparison, for computing checksums or signing RDF graphs, for applying consistent Skolemisation schemes where blank nodes are mapped in a canonical manner to Internationalised Resource Identifiers (IRIs), and so forth. Likewise, a variety of algorithms can be simplified by presupposing RDF graphs in one of these canonical forms. Both algorithms require exponential steps in the worst case; in our evaluation we demonstrate that there indeed exist difficult synthetic cases, but we also provide results over 9.9 million RDF graphs that suggest such cases occur infrequently in the real world, and that both canonical forms can be efficiently computed in all but a handful of such cases.
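To make the two canonical forms concrete, the following is a minimal brute-force sketch in Python. It is not the Nauty-based algorithm of the article: it simply tries every relabelling of blank nodes, so it is only usable on very small graphs. The function names (`blank_nodes`, `relabel`, `iso_canonical`, `lean`, `equi_canonical`), the representation of a graph as a set of string triples, and the convention that terms starting with `_:` are blank nodes are all assumptions made for illustration.

```python
from itertools import permutations, product


def blank_nodes(graph):
    """Sorted list of blank-node terms (assumed to start with '_:')."""
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def relabel(graph, mapping):
    """Apply a term mapping to every triple; unmapped terms are kept."""
    return frozenset(tuple(mapping.get(t, t) for t in triple) for triple in graph)


def iso_canonical(graph):
    """Brute-force iso-canonical form: try every assignment of the labels
    _:c0, _:c1, ... to the blank nodes and keep the smallest sorted triple list.
    Isomorphic graphs yield the same candidates, hence the same minimum."""
    bnodes = blank_nodes(graph)
    labels = [f"_:c{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):
        candidate = tuple(sorted(relabel(graph, dict(zip(bnodes, perm)))))
        if best is None or candidate < best:
            best = candidate
    return best


def lean(graph):
    """Brute-force leaning: while some mapping of blank nodes to terms of the
    graph sends it onto a proper subgraph of itself, replace it by that subgraph."""
    graph = frozenset(graph)
    bnodes = blank_nodes(graph)
    terms = sorted({t for triple in graph for t in triple})
    for image in product(terms, repeat=len(bnodes)):
        mapped = relabel(graph, dict(zip(bnodes, image)))
        if mapped < graph:  # proper subset: a redundant triple was removed
            return lean(mapped)
    return graph


def equi_canonical(graph):
    """Lean the graph first, then compute the iso-canonical form."""
    return iso_canonical(lean(graph))


# Two graphs that differ only in blank node labels.
g1 = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "ex:c")}
g2 = {("_:y", "ex:p", "_:x"), ("_:x", "ex:p", "ex:c")}

# A non-lean graph: the blank-node triple is implied by the ground triple.
g3 = {("ex:a", "ex:p", "_:b"), ("ex:a", "ex:p", "ex:c")}
g3_lean = {("ex:a", "ex:p", "ex:c")}
```

In this sketch, `iso_canonical(g1)` and `iso_canonical(g2)` coincide because the graphs are isomorphic, while `g3` and `g3_lean` only coincide under `equi_canonical`, since they are simple-equivalent but not isomorphic. Both functions enumerate an exponential number of candidates, which is exactly the cost that a practical algorithm tries to avoid on real-world inputs.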

