Abstract
Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines that mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2].
In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.
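The idea of representing a structured, non-RDF document through a single RDF-based meta-model can be illustrated with a minimal Python sketch. This is not the SPARQL Anything implementation and does not reproduce the Facade-X specification: the function name `to_triples`, the `XYZ` namespace, and the blank-node labels are assumptions introduced here for illustration. Only the `rdf:_1`, `rdf:_2`, ... container membership properties are standard RDF vocabulary.

```python
import json
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XYZ = "http://example.org/data/"  # hypothetical namespace for key-named properties


def to_triples(value, ids=None, triples=None):
    """Flatten a parsed JSON value into (subject, predicate, object) triples.

    Objects and arrays become blank-node containers. Object keys become
    properties in the XYZ namespace; array positions become rdf:_1, rdf:_2, ...
    Returns (node, triples), where node is the container label or, for a
    scalar, the value itself.
    """
    ids = ids if ids is not None else count(1)
    triples = triples if triples is not None else []
    if isinstance(value, dict):
        node = f"_:b{next(ids)}"
        for key, child in value.items():
            child_node, _ = to_triples(child, ids, triples)
            triples.append((node, XYZ + key, child_node))
        return node, triples
    if isinstance(value, list):
        node = f"_:b{next(ids)}"
        for position, child in enumerate(value, start=1):
            child_node, _ = to_triples(child, ids, triples)
            triples.append((node, f"{RDF}_{position}", child_node))
        return node, triples
    # Scalars (strings, numbers, booleans, null) are kept as plain values.
    return value, triples


document = json.loads('{"title": "Facade", "tags": ["rdf", "sparql"]}')
root, triples = to_triples(document)
for triple in triples:
    print(triple)
```

Once a document is exposed as triples of this general shape, it can in principle be inspected and reshaped with ordinary SPARQL graph patterns, which is the kind of workflow the abstract argues for. A real system would also need to distinguish literals from node labels and assign datatypes, which this sketch leaves out.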
References
- [1] Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. R2RML: RDB to RDF Mapping Language. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/r2rml/.
- [2] The W3C SPARQL Working Group (Eds.). 2013. SPARQL 1.1 Overview. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
- [3] Enrico Daga, Luigi Asprino, and Justin Dowdy. 2022. SPARQL-Anything/sparql.anything. GitHub?
- [4] . 2021. Knowledge graph construction: An ETL system-based overview. In Proceedings of the Knowledge Graph Construction Workshop (ESWC’21).
- [5] . 2021. Knowledge Graph Benchmarking Report 2021.
- [6] . 1998. The Berkeley FrameNet project. In Proceedings of the International Conference on Computational Linguistics. 86–90.
- [7] . 2012. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1533–1544.
- [8] . 2020. OBDA for the web: Creating virtual RDF graphs on top of web data sources. arXiv preprint arXiv:2005.11264 (2020).
- [9] . 2019. Ontology-based data access—Beyond relational sources. Intelligenza Artificiale 13, 1 (2019), 21–36.
- [10] . 2020. Towards advanced interfaces for citizen curation. (Sept. 2020). Retrieved from http://oro.open.ac.uk/72524/.
- [11] . 2017. Ontop: Answering SPARQL queries over relational databases. Semant. Web 8, 3 (2017), 471–487.
- [12] . 2007. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reason. 39, 3 (2007), 385–429.
- [13] . 2012. Query processing under GLAV mappings for relational and graph databases. Proc. VLDB Endow. 6, 2 (2012), 61–72.
- [14] . 2016. Using and exploring hierarchical data in spreadsheets. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 2497–2507.
- [15] . 2021. Enhancing virtual ontology based access over tabular data with Morph-CSV. Semant. Web 12, 6 (2021), 869–902.
- [16] . 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015).
- [17] . 2020. Towards a framework for visual intelligence in service robotics: Epistemic requirements and gap analysis. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning. Retrieved from http://oro.open.ac.uk/72318/.
- [18] . 2014. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
- [19] . 2022. Integrating citizen experiences in cultural heritage archives: Requirements, state-of-the-art, and challenges. ACM J. Comput. Cult. Herit. 15, 1 (2022), 1–35.
- [20] . 2021. Facade-X: An opinionated approach to SPARQL anything. In Further with Knowledge Graphs, Vol. 53. IOS Press, 58–73. Retrieved from http://oro.open.ac.uk/78973/.
- [21] . 2016. The open university linked data–data.open.ac.uk. Semant. Web 7, 2 (2016), 183–191.
- [22] . 2019. Modelling and querying lists in RDF. A pragmatic study. In Proceedings of the 3rd Workshop on Querying and Benchmarking the Web of Data co-located with 18th International Semantic Web Conference (ISWC’19). CEUR-WS.org.
- [23] . 2021. Sequential linked data: The state of affairs. Semant. Web (2021).
- [24] . 2015. A BASILar approach for building web APIs on top of SPARQL endpoints. In Proceedings of the Workshop on Services and Applications over Linked APIs and Data co-located with the Extended Semantic Web Conference. 22–32.
- [25] . 2017. Characterizing the landscape of musical data on the web: State of the art and challenges. In Proceedings of the Workshop on Humanities in the Semantic Web, co-located with the International Symposium on Wearable Computers. Retrieved from http://oro.open.ac.uk/51570/.
- [26] . 2022. CLEF. A linked open data native system for crowdsourcing. arXiv preprint arXiv:2206.08259 (2022).
- [27] . 2014. RML: A generic language for integrated RDF mappings of heterogeneous data. In Proceedings of the Workshop on Linked Data on the Web.
- [28] . 1993. Design patterns: Abstraction and reuse of object-oriented design. In Proceedings of the European Conference on Object-Oriented Programming. Springer, 406–431.
- [29] . 2016. Framester: A wide coverage linguistic linked data hub. In Proceedings of the International Conference on Knowledge Engineering and Knowledge Management (EKAW). 239–254.
- [30] . 2009. Ontology design patterns. In Handbook on Ontologies. Springer, 221–243.
- [31] . 2020. ShExML: Improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Comput. Sci. 6 (2020), e318.
- [32] . 2018. Using ontologies for semantic data integration. In A Comprehensive Guide through the Italian Database Research over the Last 25 Years (Studies in Big Data, Vol. 31). Springer International Publishing, 187–202.
- [33] . 2004. The development of deductive reasoning: How important is complexity? Think. Reason. 10, 2 (2004), 123–145.
- [34] . 2019. The Lish: A Data Model for Grid Free Spreadsheets. Ph.D. Dissertation. The Open University.
- [35] . 2011. data.europeana.eu: The Europeana linked open data pilot. In Proceedings of the Dublin Core Conference (DC). 94–104.
- [36] . 2018. Declarative rules for linked data generation at your fingertips! In Proceedings of the European Semantic Web Conference. Springer, 213–217.
- [37] . 2014. JAMS: A JSON annotated music specification for reproducible MIR research. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). 591–596.
- [38] . 2020. SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3039–3046.
- [39] . 2011. The state of the art in end-user software engineering. ACM Comput. Surv. 43, 3 (2011), 1–44.
- [40] . 2014. Answering SPARQL queries over databases under OWL 2 QL entailment regime. In Proceedings of the 13th International Semantic Web Conference (ISWC’14). Springer, 552–567.
- [41] . 2014. GeoTriples: A tool for publishing geospatial data as RDF graphs using R2RML mappings. In Proceedings of the Terra Cognita - Semantic Sensor Networks, Joint Proceedings of the International Semantic Web Conference (TC/SSN@ISWC). 33–44.
- [42] . 2017. A SPARQL extension for generating RDF from heterogeneous formats. In Proceedings of the Extended Semantic Web Conference (ESWC). Springer, 35–50.
- [43] . 2006. End-user Development: An emerging paradigm. In End User Development. Springer, 1–8.
- [44] . 1974. Programming with abstract data types. ACM SIGPLAN Not. 9, 4 (1974), 50–59.
- [45] . 2015. Translation of relational and non-relational databases into RDF with xR2RML. In Proceedings of the 11th International Conference on Web Information Systems and Technologies. SciTePress, 443–454.
- [46] . 2019. Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In Proceedings of the International Conference on World Wide Web (WWW). 883–892.
- [47] . 2013. Engineering optimisations in query rewriting for OBDA. In Proceedings of the 9th International Conference on Semantic Systems (ISEM’13). ACM, 41–48.
- [48] . 2020. Enabling multiple voices in the museum: Challenges and approaches. Digit. Cult. Societ. 6, 2 (2020), 259–266.
- [49] . 2011. Gathering lexical linked data and knowledge patterns from FrameNet. In Proceedings of the International Conference on Knowledge Capture (K-CAP). ACM, 41–48.
- [50] . 2010. Fine-tuning triplification with Semion. In Proceedings of the Workshop on Knowledge Injection into and Extraction from Linked Data (KIELD). 2–14.
- [51] . 2010. Revising the Panko–Halverson taxonomy of spreadsheet errors. Decis. Supp. Syst. 49, 2 (2010), 235–244.
- [52] . 2019. End-user development for personalizing applications, things, and robots. Int. J. Hum.-comput. Stud. 131 (2019), 120–130.
- [53] . 2008. Linking data to ontologies. J. Data Semant. 10 (2008), 133–173.
- [54] . 2014. Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph. In Proceedings of the 23rd International Conference on World Wide Web. 479–490.
- [55] . 2012. A Direct Mapping of Relational Data to RDF. W3C Recommendation. W3C. Retrieved from https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/.
- [56] . 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.
- [57] . 2015. Efficient SPARQL-to-SQL with R2RML mappings. J. Web Semant. 33 (2015), 141–169.
- [58] . 2014. OBDA: Query rewriting or materialization? In practice, both! In Proceedings of the 13th International Semantic Web Conference (ISWC’14). Springer, 535–551.
- [59] . 2017. A pay-as-you-go methodology for ontology-based data access. IEEE Int. Comput. 21, 2 (2017), 92–96.
- [60] . 2015. KR2RML: An alternative interpretation of R2RML for heterogenous sources. In Proceedings of the 6th International Workshop on Consuming Linked Data (COLD’15).
- [61] . 2018. Using SPARQL-The practitioners’ viewpoint. In Proceedings of International Conference on Knowledge Engineering and Knowledge Management (EKAW). Springer, 485–500.
- [62] . 2015. Making sense of description logics. In Proceedings of the 11th International Conference on Semantic Systems. 49–56.
- [63] . 2010. Design science methodology: Principles and practice. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering. 493–494.
- [64] . 2018. Ontology-based data access: A survey. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).
- [65] . 2018. Efficient handling of SPARQL OPTIONAL for OBDA. In Proceedings of the International Joint Conferences on Artificial Intelligence (ISWC). Springer, 354–373.
Index Terms
Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web