DOI: 10.1145/3342558.3345417 (ACM DocEng Conference Proceedings)
short-paper

An Algorithm for Extracting Shape Expression Schemas from Graphs

ABSTRACT

Unlike traditional data such as relational databases and XML documents, most graphs do not have their own schema. However, a schema is a concise representation of a graph, and if we can extract a "good" schema from a graph, we can use it for effective graph data management. In this paper, we focus on Shape Expression Schemas (ShEx) and consider extracting ShEx schemas from RDF/graph data. To balance efficiency against the quality of the extracted schema, our algorithm consists of two schema extraction steps: (i) edge-label-based clustering and (ii) a type-merge method for the target nodes of outgoing edges. We conducted preliminary experiments, whose results suggest that our algorithm can extract ShEx schemas appropriately.
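The abstract names the two steps but does not give their details, so the following is only an illustrative sketch under our own assumptions, not the algorithm of the paper. We assume that step (i) groups nodes whose sets of outgoing edge labels are identical, and we stand in for step (ii) with a much simpler operation: for each node group and edge label, collect the groups of the target nodes. The toy triples, the function names `cluster_by_edge_labels` and `summarize_targets`, the `T_...` type names, and the `Leaf` placeholder are all hypothetical.

```python
from collections import defaultdict

# Toy graph as (subject, edge label, object) triples; the data is made up.
TRIPLES = [
    ("alice", "name", "lit:Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "lit:Bob"),
    ("bob", "knows", "alice"),
    ("paper1", "title", "lit:ShEx"),
    ("paper1", "author", "alice"),
]


def cluster_by_edge_labels(triples):
    """Assumed step (i): group nodes with identical outgoing edge-label sets."""
    labels = defaultdict(set)
    for s, p, _o in triples:
        labels[s].add(p)
    clusters = defaultdict(set)
    for node, preds in labels.items():
        clusters[frozenset(preds)].add(node)
    return dict(clusters)


def summarize_targets(triples, clusters):
    """Simplified stand-in for step (ii): per type and edge label,
    collect the types of the target nodes."""
    # Derive a readable, hypothetical type name from each label set.
    type_of = {}
    for signature, nodes in clusters.items():
        name = "T_" + "_".join(sorted(signature))
        for node in nodes:
            type_of[node] = name
    shapes = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        # Targets with no outgoing edges (e.g. literals) fall back to "Leaf".
        shapes[type_of[s]][p].add(type_of.get(o, "Leaf"))
    return {t: {p: sorted(ts) for p, ts in edges.items()}
            for t, edges in shapes.items()}


if __name__ == "__main__":
    clusters = cluster_by_edge_labels(TRIPLES)
    for type_name, edges in summarize_targets(TRIPLES, clusters).items():
        print(type_name, edges)
```

On the toy data, `alice` and `bob` fall into one group because both have exactly the labels `name` and `knows`, while `paper1` forms its own group. An actual extractor would also need to handle nodes with similar but not identical label sets and to emit ShEx syntax with cardinalities, which this sketch does not attempt.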

