Abstract
We consider the problem of identifying highly ranked vertices in large edge-labeled graph databases, such as social networks or the Semantic Web. In many applications, users pose scoring queries against such databases that involve two elements: (i) a set of patterns describing relationships that a vertex of interest to the user must satisfy, and (ii) a scoring mechanism in which the user may use properties of the vertex to assign a score to that vertex. We define the concept of a partial pattern map query (partial PM-query), which intuitively allows us to prune partial matchings, and show that finding an optimal partial PM-query is NP-hard. We then propose two algorithms, PScore_LP and PScore_NWST, to answer such scoring (top-k) queries. PScore_LP finds the optimal partial PM-query using a list-oriented pruning method, while PScore_NWST leverages node-weighted Steiner trees to quickly compute slightly suboptimal solutions. We conduct detailed experiments comparing our algorithms with (i) an algorithm (PScore_Base) that computes all answers to the query, evaluates them according to the scoring method, and chooses the top-k, and (ii) two Semantic Web query processing systems (Jena and GraphDB). Our algorithms outperform both PScore_Base and the Semantic Web query processing systems; moreover, PScore_NWST outperforms PScore_LP on large queries and on queries with a tree structure.
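The abstract contrasts a baseline (PScore_Base) that computes every answer, scores it, and keeps the top k with algorithms that avoid some of that work. The Python sketch that follows illustrates only this general contrast, under simplifying assumptions. Each pattern is modeled as a chain of edge labels leaving the vertex of interest, whereas the paper's patterns are more general. The class `EdgeLabeledGraph`, the functions `top_k_baseline` and `top_k_score_ordered`, and the example data are all hypothetical. The early-termination variant is a generic score-ordered scan, not PScore_LP or PScore_NWST.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Vertex = Hashable
# Hypothetical simplification: a pattern is a chain of edge labels that must be
# followable, in order, starting from the vertex of interest.
Pattern = Tuple[str, ...]


class EdgeLabeledGraph:
    """Toy in-memory directed graph whose edges carry string labels."""

    def __init__(self) -> None:
        self.out: Dict[Tuple[Vertex, str], Set[Vertex]] = defaultdict(set)
        self.vertices: Set[Vertex] = set()

    def add_edge(self, u: Vertex, label: str, v: Vertex) -> None:
        self.out[(u, label)].add(v)
        self.vertices.update((u, v))

    def matches(self, start: Vertex, pattern: Pattern) -> bool:
        """True if some walk from `start` follows the labels of `pattern` in order."""
        frontier = {start}
        for label in pattern:
            # .get() avoids inserting empty sets into the defaultdict.
            frontier = {w for u in frontier for w in self.out.get((u, label), ())}
            if not frontier:
                return False
        return True


def satisfies(g: EdgeLabeledGraph, v: Vertex, patterns: Sequence[Pattern]) -> bool:
    """A vertex is an answer only if it matches every pattern."""
    return all(g.matches(v, p) for p in patterns)


def top_k_baseline(g: EdgeLabeledGraph, patterns: Sequence[Pattern],
                   score: Callable[[Vertex], float], k: int) -> List[Vertex]:
    """Baseline in the spirit of PScore_Base: find all answers, score them, keep k best."""
    answers = [v for v in g.vertices if satisfies(g, v, patterns)]
    return heapq.nlargest(k, answers, key=score)


def top_k_score_ordered(g: EdgeLabeledGraph, patterns: Sequence[Pattern],
                        score: Callable[[Vertex], float], k: int) -> List[Vertex]:
    """Generic pruning sketch: visit vertices by descending score, stop after k answers.

    Sound only because `score` depends on the vertex alone, not on how it matched.
    """
    result: List[Vertex] = []
    if k <= 0:
        return result
    for v in sorted(g.vertices, key=score, reverse=True):
        if satisfies(g, v, patterns):
            result.append(v)
            if len(result) == k:
                break
    return result


# Hypothetical example data.
g = EdgeLabeledGraph()
for u, label, v in [
    ("alice", "worksAt", "acme"), ("acme", "locatedIn", "paris"),
    ("bob", "worksAt", "initech"), ("initech", "locatedIn", "paris"),
    ("carol", "worksAt", "acme"),
    ("alice", "knows", "bob"), ("bob", "knows", "carol"), ("dave", "knows", "alice"),
]:
    g.add_edge(u, label, v)

patterns = [("worksAt", "locatedIn"), ("knows",)]
followers = {"alice": 120, "bob": 340, "carol": 900, "dave": 50}


def score(v: Vertex) -> float:
    """User-defined score computed from a property of the vertex."""
    return followers.get(v, 0)


print(top_k_baseline(g, patterns, score, 2))       # ['bob', 'alice']
print(top_k_score_ordered(g, patterns, score, 2))  # ['bob', 'alice']
```

Both functions return `['bob', 'alice']` here: `carol` has the highest score but fails the `knows` pattern. The ordered scan stops after checking three vertices, whereas the baseline checks every vertex. This shortcut relies on the score being a function of the vertex's own properties; pruning partial matchings, as the partial PM-query does, is a different and more involved technique.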
Top-k User-Defined Vertex Scoring Queries in Edge-Labeled Graph Databases