research-article

Top-k User-Defined Vertex Scoring Queries in Edge-Labeled Graph Databases

Published: 27 September 2018

Abstract

We consider identifying highly ranked vertices in large graph databases such as social networks or the Semantic Web where there are edge labels. There are many applications where users express scoring queries against such databases that involve two elements: (i) a set of patterns describing relationships that a vertex of interest to the user must satisfy and (ii) a scoring mechanism in which the user may use properties of the vertex to assign a score to that vertex. We define the concept of a partial pattern map query (partial PM-query), which intuitively allows us to prune partial matchings, and show that finding an optimal partial PM-query is NP-hard. We then propose two algorithms, PScore_LP and PScore_NWST, to find the answer to a scoring (top-k) query. In PScore_LP, the optimal partial PM-query is found using a list-oriented pruning method. PScore_NWST leverages node-weighted Steiner trees to quickly compute slightly sub-optimal solutions. We conduct detailed experiments comparing our algorithms with (i) an algorithm (PScore_Base) that computes all answers to the query, evaluates them according to the scoring method, and chooses the top-k, and (ii) two Semantic Web query processing systems (Jena and GraphDB). Our algorithms show better performance than PScore_Base and the Semantic Web query processing systems—moreover, PScore_NWST outperforms PScore_LP on large queries and on queries with a tree structure.
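The baseline strategy mentioned in the abstract (compute all answers, score them, keep the top-k) can be illustrated with a minimal sketch. This is not the authors' PScore_Base implementation. The triple-based graph representation, the `?variable` pattern syntax, the homomorphism-style matching, and every function and data name below are assumptions made purely for illustration.

```python
# Hypothetical sketch of an "enumerate, score, select" baseline.
# All names and data are illustrative, not taken from the paper.

def bind(term, vertex, binding):
    """Bind a query term to a vertex; terms starting with '?' are variables."""
    if term.startswith("?"):
        # setdefault stores the vertex if the variable is unbound,
        # otherwise returns the existing binding for comparison.
        return binding.setdefault(term, vertex) == vertex
    return term == vertex


def matches(patterns, edges, binding=None):
    """Yield every variable binding satisfying all patterns (naive backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    (subj, label, obj), rest = patterns[0], patterns[1:]
    for src, edge_label, dst in edges:
        if edge_label != label:
            continue
        extended = dict(binding)  # fresh copy so failed attempts leave no trace
        if bind(subj, src, extended) and bind(obj, dst, extended):
            yield from matches(rest, edges, extended)


def top_k_baseline(edges, props, patterns, target, score, k):
    """Enumerate all matches, score each distinct target vertex, return the k best."""
    candidates = {b[target] for b in matches(patterns, edges)}
    # Sort by descending score; break ties by vertex name for determinism.
    ranked = sorted(candidates, key=lambda v: (-score(props.get(v, {})), v))
    return ranked[:k]


# Toy edge-labeled graph: (source, label, destination) triples.
edges = [
    ("alice", "follows", "bob"),
    ("carol", "follows", "bob"),
    ("bob", "follows", "dave"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
    ("dave", "worksAt", "acme"),
]
props = {"alice": {"posts": 10}, "bob": {"posts": 5},
         "carol": {"posts": 25}, "dave": {"posts": 40}}

# (i) patterns the vertex of interest ?x must satisfy,
# (ii) a user-defined score computed from the vertex's properties.
patterns = [("?x", "follows", "bob"), ("?x", "worksAt", "acme")]


def score(vertex_props):
    return vertex_props.get("posts", 0)


print(top_k_baseline(edges, props, patterns, "?x", score, 2))
# prints ['carol', 'alice']
```

Because this sketch materializes every full match before scoring, its cost grows with the total number of answers. The abstract's partial PM-queries are aimed at avoiding that cost by pruning partial matchings early.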

Published in

ACM Transactions on the Web, Volume 12, Issue 4
November 2018
215 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/3281744

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 27 September 2018
• Accepted: 1 March 2018
• Revised: 1 October 2017
• Received: 1 May 2016

Qualifiers

• research-article
• Research
• Refereed
