research-article

Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web

Published: 23 February 2023

Abstract

Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines that mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2].

In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.

REFERENCES

  1. [1] Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. R2RML: RDB to RDF Mapping Language. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/r2rml/.
  2. [2] The W3C SPARQL Working Group (Eds.). 2013. SPARQL 1.1 Overview. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
  3. [3] Enrico Daga, Luigi Asprino, and Justin Dowdy. 2022. SPARQL-Anything/sparql.anything. GitHub.
  4. [4] Arenas-Guerrero Julián, Scrocca Mario, Iglesias-Molina Ana, Toledo Jhon, Pozo-Gilo Luis, Dona Daniel, Corcho Oscar, and Chaves-Fraga David. 2021. Knowledge graph construction: An ETL system-based overview. In Proceedings of the Knowledge Graph Construction Workshop (ESWC’21).
  5. [5] Atkin Michael, Deely Thomas, and Scharffe François. 2021. Knowledge Graph Benchmarking Report 2021.
  6. [6] Baker Collin F., Fillmore Charles J., and Lowe John B. 1998. The Berkeley FrameNet project. In Proceedings of the International Conference on Computational Linguistics. 86–90.
  7. [7] Banarescu Laura, Bonial Claire, Cai Shu, Georgescu Madalina, Griffitt Kira, Hermjakob Ulf, Knight Kevin, Koehn Philipp, Palmer Martha, and Schneider Nathan. 2012. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1533–1544.
  8. [8] Bereta Konstantina, Papadakis George, and Koubarakis Manolis. 2020. OBDA for the web: Creating virtual RDF graphs on top of web data sources. arXiv preprint arXiv:2005.11264 (2020).
  9. [9] Botoeva Elena, Calvanese Diego, Cogrel Benjamin, Corman Julien, and Xiao Guohui. 2019. Ontology-based data access—Beyond relational sources. Intelligenza Artificiale 13, 1 (2019), 21–36.
  10. [10] Bruni Luis Emilio, Daga Enrico, Damiano Rossana, Diaz Lily, Kuflik Tsvi, Lieto Antonio, Gangemi Aldo, Mulholland Paul, Peroni Silvio, Pescarin Sofia, and Wecker Alan. 2020. Towards advanced interfaces for citizen curation. (Sept. 2020). Retrieved from http://oro.open.ac.uk/72524/.
  11. [11] Calvanese Diego, Cogrel Benjamin, Komla-Ebri Sarah, Kontchakov Roman, Lanti Davide, Rezk Martin, Rodriguez-Muro Mariano, and Xiao Guohui. 2017. Ontop: Answering SPARQL queries over relational databases. Semant. Web 8, 3 (2017), 471–487.
  12. [12] Calvanese Diego, Giacomo Giuseppe De, Lembo Domenico, Lenzerini Maurizio, and Rosati Riccardo. 2007. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reason. 39, 3 (2007), 385–429.
  13. [13] Calvanese Diego, Giacomo Giuseppe De, Lenzerini Maurizio, and Vardi Moshe Y. 2012. Query processing under GLAV mappings for relational and graph databases. Proc. VLDB Endow. 6, 2 (2012), 61–72.
  14. [14] Chang Kerry Shih-Ping and Myers Brad A. 2016. Using and exploring hierarchical data in spreadsheets. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 2497–2507.
  15. [15] Chaves-Fraga David, Ruckhaus Edna, Priyatna Freddy, Vidal Maria-Esther, and Corcho Oscar. 2021. Enhancing virtual ontology based access over tabular data with Morph-CSV. Semant. Web 12, 6 (2021), 869–902.
  16. [16] Chen Xinlei, Fang Hao, Lin Tsung-Yi, Vedantam Ramakrishna, Gupta Saurabh, Dollár Piotr, and Zitnick C. Lawrence. 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015).
  17. [17] Chiatti Agnese, Motta Enrico, and Daga Enrico. 2020. Towards a framework for visual intelligence in service robotics: Epistemic requirements and gap analysis. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning. Retrieved from http://oro.open.ac.uk/72318/.
  18. [18] Cyganiak Richard, Wood David, and Lanthaler Markus. 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
  19. [19] Daga Enrico, Asprino Luigi, Damiano Rossana, Daquino Marilena, Agudo Belen Diaz, Gangemi Aldo, Kuflik Tsvi, Lieto Antonio, Maguire Mark, Marras Anna Maria et al. 2022. Integrating citizen experiences in cultural heritage archives: Requirements, state-of-the-art, and challenges. ACM J. Comput. Cult. Herit. 15, 1 (2022), 1–35.
  20. [20] Daga Enrico, Asprino Luigi, Mulholland Paul, and Gangemi Aldo. 2021. Facade-X: An opinionated approach to SPARQL anything. In Further with Knowledge Graphs, Vol. 53. IOS Press, 58–73. Retrieved from http://oro.open.ac.uk/78973/.
  21. [21] Daga Enrico, d’Aquin Mathieu, Adamou Alessandro, and Brown Stuart. 2016. The open university linked data–data.open.ac.uk. Semant. Web 7, 2 (2016), 183–191.
  22. [22] Daga Enrico, Meroño-Peñuela Albert, and Motta Enrico. 2019. Modelling and querying lists in RDF. A pragmatic study. In Proceedings of the 3rd Workshop on Querying and Benchmarking the Web of Data co-located with 18th International Semantic Web Conference (ISWC’19). CEUR-WS.org.
  23. [23] Daga Enrico, Meroño-Peñuela Albert, and Motta Enrico. 2021. Sequential linked data: The state of affairs. Semant. Web (2021).
  24. [24] Daga Enrico, Panziera Luca, and Pedrinaci Carlos. 2015. A BASILar approach for building web APIs on top of SPARQL endpoints. In Proceedings of the Workshop on Services and Applications over Linked APIs and Data co-located with the Extended Semantic Web Conference. 22–32.
  25. [25] Daquino Marilena, Daga Enrico, d’Aquin Mathieu, Gangemi Aldo, Holland Simon, Laney Robin, Penuela Albert Merono, and Mulholland Paul. 2017. Characterizing the landscape of musical data on the web: State of the art and challenges. In Proceedings of the Workshop on Humanities in the Semantic Web, co-located with the International Symposium on Wearable Computers. Retrieved from http://oro.open.ac.uk/51570/.
  26. [26] Daquino Marilena, Wigham Mari, Daga Enrico, Giagnolini Lucia, and Tomasi Francesca. 2022. CLEF. A linked open data native system for crowdsourcing. arXiv preprint arXiv:2206.08259 (2022).
  27. [27] Dimou Anastasia, Sande Miel Vander, Colpaert Pieter, Verborgh Ruben, Mannens Erik, and Walle Rik Van de. 2014. RML: A generic language for integrated RDF mappings of heterogeneous data. In Proceedings of the Workshop on Linked Data on the Web.
  28. [28] Gamma Erich, Helm Richard, Johnson Ralph, and Vlissides John. 1993. Design patterns: Abstraction and reuse of object-oriented design. In Proceedings of the European Conference on Object-Oriented Programming. Springer, 406–431.
  29. [29] Gangemi Aldo, Alam Mehwish, Asprino Luigi, Presutti Valentina, and Recupero Diego Reforgiato. 2016. Framester: A wide coverage linguistic linked data hub. In Proceedings of the International Conference on Knowledge Engineering and Knowledge Management (EKAW). 239–254.
  30. [30] Gangemi Aldo and Presutti Valentina. 2009. Ontology design patterns. In Handbook on Ontologies. Springer, 221–243.
  31. [31] García-González Herminio, Boneva Iovka, Staworko Sławek, Labra-Gayo José Emilio, and Lovelle Juan Manuel Cueva. 2020. ShExML: Improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Comput. Sci. 6 (2020), e318.
  32. [32] Giacomo Giuseppe De, Lembo Domenico, Lenzerini Maurizio, Poggi Antonella, and Rosati Riccardo. 2018. Using ontologies for semantic data integration. In A Comprehensive Guide through the Italian Database Research over the Last 25 Years (Studies in Big Data, Vol. 31). Springer International Publishing, 187–202.
  33. [33] Halford Graeme S. and Andrews Glenda. 2004. The development of deductive reasoning: How important is complexity? Think. Reason. 10, 2 (2004), 123–145.
  34. [34] Hall Alan Geoffrey. 2019. The Lish: A Data Model for Grid Free Spreadsheets. Ph.D. Dissertation. The Open University.
  35. [35] Haslhofer Bernhard and Isaac Antoine. 2011. data.europeana.eu: The Europeana linked open data pilot. In Proceedings of the Dublin Core Conference (DC). 94–104.
  36. [36] Heyvaert Pieter, Meester Ben De, Dimou Anastasia, and Verborgh Ruben. 2018. Declarative rules for linked data generation at your fingertips! In Proceedings of the European Semantic Web Conference. Springer, 213–217.
  37. [37] Humphrey Eric J., Salamon Justin, Nieto Oriol, Forsyth Jon, Bittner Rachel M., and Bello Juan Pablo. 2014. JAMS: A JSON annotated music specification for reproducible MIR research. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). 591–596.
  38. [38] Iglesias Enrique, Jozashoori Samaneh, Chaves-Fraga David, Collarana Diego, and Vidal Maria-Esther. 2020. SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3039–3046.
  39. [39] Ko Andrew J., Abraham Robin, Beckwith Laura, Blackwell Alan, Burnett Margaret, Erwig Martin, Scaffidi Chris, Lawrance Joseph, Lieberman Henry, Myers Brad et al. 2011. The state of the art in end-user software engineering. ACM Comput. Surv. 43, 3 (2011), 1–44.
  40. [40] Kontchakov Roman, Rezk Martin, Rodriguez-Muro Mariano, Xiao Guohui, and Zakharyaschev Michael. 2014. Answering SPARQL queries over databases under OWL 2 QL entailment regime. In Proceedings of the 13th International Semantic Web Conference (ISWC’14). Springer, 552–567.
  41. [41] Kyzirakos Kostis, Vlachopoulos Ioannis, Savva Dimitrianos, Manegold Stefan, and Koubarakis Manolis. 2014. GeoTriples: A tool for publishing geospatial data as RDF graphs using R2RML mappings. In Proceedings of the Terra Cognita - Semantic Sensor Networks, Joint Proceedings of the International Semantic Web Conference (TC/SSN@ISWC). 33–44.
  42. [42] Lefrançois Maxime, Zimmermann Antoine, and Bakerally Noorani. 2017. A SPARQL extension for generating RDF from heterogeneous formats. In Proceedings of the Extended Semantic Web Conference (ESWC). Springer, 35–50.
  43. [43] Lieberman Henry, Paternò Fabio, Klann Markus, and Wulf Volker. 2006. End-user Development: An emerging paradigm. In End User Development. Springer, 1–8.
  44. [44] Liskov Barbara and Zilles Stephen. 1974. Programming with abstract data types. ACM SIGPLAN Not. 9, 4 (1974), 50–59.
  45. [45] Michel Franck, Djimenou Loïc, Faron-Zucker Catherine, and Montagnat Johan. 2015. Translation of relational and non-relational databases into RDF with xR2RML. In Proceedings of the 11th International Conference on Web Information Systems and Technologies. SciTePress, 443–454.
  46. [46] Michel Franck, Faron-Zucker Catherine, Corby Olivier, and Gandon Fabien. 2019. Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In Proceedings of the International Conference on World Wide Web (WWW). 883–892.
  47. [47] Mora Jose and Corcho Óscar. 2013. Engineering optimisations in query rewriting for OBDA. In Proceedings of the 9th International Conference on Semantic Systems (ISEM’13). ACM, 41–48.
  48. [48] Mulholland Paul, Daga Enrico, Daquino Marilena, Díaz-Kommonen Lily, Gangemi Aldo, Kulfik Tsvi, Wecker Alan J., Maguire Mark, Peroni Silvio, and Pescarin Sofia. 2020. Enabling multiple voices in the museum: Challenges and approaches. Digit. Cult. Societ. 6, 2 (2020), 259–266.
  49. [49] Nuzzolese Andrea Giovanni, Gangemi Aldo, and Presutti Valentina. 2011. Gathering lexical linked data and knowledge patterns from FrameNet. In Proceedings of the International Conference on Knowledge Capture (K-CAP). ACM, 41–48.
  50. [50] Nuzzolese Andrea Giovanni, Gangemi Aldo, Presutti Valentina, and Ciancarini Paolo. 2010. Fine-tuning triplification with Semion. In Proceedings of the Workshop on Knowledge Injection into and Extraction from Linked Data (KIELD). 2–14.
  51. [51] Panko Raymond R. and Aurigemma Salvatore. 2010. Revising the Panko–Halverson taxonomy of spreadsheet errors. Decis. Supp. Syst. 49, 2 (2010), 235–244.
  52. [52] Paternò Fabio and Santoro Carmen. 2019. End-user development for personalizing applications, things, and robots. Int. J. Hum.-comput. Stud. 131 (2019), 120–130.
  53. [53] Poggi Antonella, Lembo Domenico, Calvanese Diego, Giacomo Giuseppe De, Lenzerini Maurizio, and Rosati Riccardo. 2008. Linking data to ontologies. J. Data Semant. 10 (2008), 133–173.
  54. [54] Priyatna Freddy, Corcho Oscar, and Sequeda Juan. 2014. Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph. In Proceedings of the 23rd International Conference on World Wide Web. 479–490.
  55. [55] Prud’hommeaux Eric, Arenas Marcelo, Bertails Alexandre, and Sequeda Juan. 2012. A Direct Mapping of Relational Data to RDF. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/.
  56. [56] Redmon Joseph, Divvala Santosh, Girshick Ross, and Farhadi Ali. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.
  57. [57] Rodriguez-Muro Mariano and Rezk Martin. 2015. Efficient SPARQL-to-SQL with R2RML mappings. J. Web Semant. 33 (2015), 141–169.
  58. [58] Sequeda Juan F., Arenas Marcelo, and Miranker Daniel P. 2014. OBDA: Query rewriting or materialization? In practice, both! In Proceedings of the 13th International Semantic Web Conference (ISWC’14). Springer, 535–551.
  59. [59] Sequeda Juan F. and Miranker Daniel P. 2017. A pay-as-you-go methodology for ontology-based data access. IEEE Int. Comput. 21, 2 (2017), 92–96.
  60. [60] Slepicka Jason, Yin Chengye, Szekely Pedro A., and Knoblock Craig A. 2015. KR2RML: An alternative interpretation of R2RML for heterogenous sources. In Proceedings of the 6th International Workshop on Consuming Linked Data (COLD’15).
  61. [61] Warren Paul and Mulholland Paul. 2018. Using SPARQL-The practitioners’ viewpoint. In Proceedings of International Conference on Knowledge Engineering and Knowledge Management (EKAW). Springer, 485–500.
  62. [62] Warren Paul, Mulholland Paul, Collins Trevor, and Motta Enrico. 2015. Making sense of description logics. In Proceedings of the 11th International Conference on Semantic Systems. 49–56.
  63. [63] Wieringa Roel. 2010. Design science methodology: Principles and practice. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering. 493–494.
  64. [64] Xiao Guohui, Calvanese Diego, Kontchakov Roman, Lembo Domenico, Poggi Antonella, Rosati Riccardo, and Zakharyaschev Michael. 2018. Ontology-based data access: A survey. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).
  65. [65] Xiao Guohui, Kontchakov Roman, Cogrel Benjamin, Calvanese Diego, and Botoeva Elena. 2018. Efficient handling of SPARQL OPTIONAL for OBDA. In Proceedings of the International Joint Conferences on Artificial Intelligence (ISWC). Springer, 354–373.

        Published in

        ACM Transactions on Internet Technology, Volume 23, Issue 1
        February 2023
        564 pages
        ISSN: 1533-5399
        EISSN: 1557-6051
        DOI: 10.1145/3584863
        Editor: Ling Liu

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 February 2023
        • Online AM: 4 November 2022
        • Accepted: 14 July 2022
        • Revised: 24 June 2022
        • Received: 17 January 2022