Abstract
Triple-structured open data creates value in many ways, yet reusing datasets remains challenging. Users find it difficult to assess the usefulness of a large dataset containing thousands or millions of triples. To meet this need, existing abstractive methods produce a concise, high-level abstraction of the data. Complementary to that, we adopt an extractive strategy and aim to select an optimum small subset of data from a dataset as a snippet that compactly illustrates the content of the dataset. We formulated this as a combinatorial optimization problem in our previous work. In this article, we design a new algorithm for the problem that is an order of magnitude faster than the previous one while retaining the same approximation ratio. We also develop an anytime algorithm that can generate empirically better solutions when given additional time. To suit datasets that are only partially accessible via online query services (e.g., SPARQL endpoints for RDF data), we adapt our algorithms to trade off snippet quality for feasibility and efficiency in the Web environment. We carry out extensive experiments on real RDF datasets and SPARQL endpoints to evaluate snippet quality and running time. The results demonstrate the effectiveness and practicality of our proposed algorithms.
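To make the extractive idea concrete, the following minimal Python sketch selects a small snippet of triples greedily. It is an illustration only and not the algorithm proposed in the article: the abstract does not state the objective function, so the sketch assumes a simple coverage objective (how many distinct properties and classes the snippet illustrates). The names `schema_elements` and `greedy_snippet`, the `rdf:type` shorthand, and the example data are all hypothetical.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def schema_elements(triple: Triple) -> set[str]:
    """Properties and classes that a triple illustrates (assumed coverage notion)."""
    _subject, predicate, obj = triple
    elements = {predicate}
    if predicate == "rdf:type":
        # A typing triple also illustrates the class it refers to.
        elements.add(obj)
    return elements


def greedy_snippet(triples: list[Triple], k: int) -> list[Triple]:
    """Select at most k triples, each time taking the one with the largest new coverage."""
    covered: set[str] = set()
    snippet: list[Triple] = []
    remaining = list(triples)
    while remaining and len(snippet) < k:
        best = max(remaining, key=lambda t: len(schema_elements(t) - covered))
        if not schema_elements(best) - covered:
            break  # no remaining triple adds anything new
        snippet.append(best)
        covered |= schema_elements(best)
        remaining.remove(best)
    return snippet


# Hypothetical toy dataset.
dataset = [
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:a", "ex:name", "Alice"),
    ("ex:b", "rdf:type", "ex:Person"),
    ("ex:a", "ex:knows", "ex:b"),
]
snippet = greedy_snippet(dataset, k=3)
```

On this toy input the second typing triple is never chosen, because it adds no property or class that the first one has not already covered. A realistic objective would likely also weigh how important each element is and whether the selected triples form a connected subgraph; those aspects are left out of this sketch.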
Fast and Practical Snippet Generation for RDF Datasets