ABSTRACT
Unlike traditional data such as relational databases and XML documents, most graphs do not come with a schema of their own. A schema, however, is a concise representation of a graph, and if a "good" schema can be extracted from a graph, it can be used for effective graph data management. In this paper, we focus on Shape Expression Schemas (ShEx) and consider extracting ShEx schemas from RDF/graph data. To balance efficiency against the quality of the extracted schema, our algorithm consists of two schema extraction steps: (i) edge-label-based clustering and (ii) a type-merge method for the target nodes of outgoing edges. We conducted preliminary experiments, and the results suggest that our algorithm can extract ShEx schemas appropriately.
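The two steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no algorithmic detail, so everything below is an assumption made for illustration and not the authors' method: the graph is a list of (subject, label, object) triples, step (i) groups nodes whose sets of outgoing edge labels are exactly equal, and step (ii) merges, in a single pass, the types of all target nodes reached from the same source type over the same label. The function names `cluster_by_edge_labels`, `merge_target_types`, and `extract_schema` are hypothetical.

```python
from collections import defaultdict


def cluster_by_edge_labels(triples):
    """Step (i), simplified: nodes with identical sets of outgoing
    edge labels receive the same provisional type (an assumption)."""
    labels = defaultdict(set)
    nodes = set()
    for s, p, o in triples:
        labels[s].add(p)
        nodes.update((s, o))
    clusters = defaultdict(set)
    for n in nodes:
        # Nodes without outgoing edges fall into the empty-label cluster.
        clusters[frozenset(labels.get(n, ()))].add(n)
    # Sort clusters by their label sets so type ids are deterministic.
    ordered = sorted(clusters.items(), key=lambda kv: sorted(kv[0]))
    return {n: f"t{i}" for i, (_, members) in enumerate(ordered) for n in members}


def merge_target_types(triples, node_type):
    """Step (ii), simplified: for each (source type, label) pair, merge
    the types of all target nodes using union-find. This is a single
    pass, not iterated to a fixpoint (an assumption of this sketch)."""
    parent = {t: t for t in set(node_type.values())}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    first_target = {}
    for s, p, o in triples:
        key = (node_type[s], p)
        if key in first_target:
            parent[find(node_type[o])] = find(first_target[key])
        else:
            first_target[key] = node_type[o]
    return {n: find(t) for n, t in node_type.items()}


def extract_schema(triples):
    """Return the node typing and, per type, its (label, target type) pairs."""
    node_type = merge_target_types(triples, cluster_by_edge_labels(triples))
    schema = defaultdict(set)
    for s, p, o in triples:
        schema[node_type[s]].add((p, node_type[o]))
    return node_type, dict(schema)
```

In this sketch, a node that lacks some edge label starts in its own cluster after step (i), but it is folded into a neighbouring type in step (ii) whenever it is reached over the same label as nodes of that type. This is one plausible reading of how a cheap clustering pass followed by a merge pass could trade efficiency against schema quality.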