research-article

Fast and Practical Snippet Generation for RDF Datasets

Published: 16 November 2019

Abstract

Triple-structured open data creates value in many ways. However, reusing datasets remains challenging: users find it difficult to assess the usefulness of a large dataset containing thousands or millions of triples. To meet this need, existing abstractive methods produce a concise high-level abstraction of the data. Complementary to that, we adopt an extractive strategy and aim to select an optimal small subset of the data as a snippet that compactly illustrates the content of the dataset. This has been formulated as a combinatorial optimization problem in our previous work. In this article, we design a new algorithm for the problem that is an order of magnitude faster than the previous one but has the same approximation ratio. We also develop an anytime algorithm that can generate empirically better solutions given additional time. To suit datasets that are only partially accessible via online query services (e.g., SPARQL endpoints for RDF data), we adapt our algorithms to trade off snippet quality for feasibility and efficiency in the Web environment. We carry out extensive experiments on real RDF datasets and SPARQL endpoints to evaluate quality and running time. The results demonstrate the effectiveness and practicality of our proposed algorithms.

              • Published in

                ACM Transactions on the Web  Volume 13, Issue 4
                November 2019
                139 pages
                ISSN:1559-1131
                EISSN:1559-114X
                DOI:10.1145/3372405
                Copyright © 2019 ACM

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 16 November 2019
                • Accepted: 1 September 2019
                • Revised: 1 June 2019
                • Received: 1 March 2018

                Qualifiers

                • research-article
                • Research
                • Refereed
