Abstract
Knowledge Bases (KBs) are fundamental components of Semantic Web applications because they provide facts and relationships that machines can process automatically. Curated KBs usually use the Resource Description Framework (RDF) as their data representation model. To query the RDF-represented knowledge in curated KBs, Web interfaces are built via SPARQL Endpoints. Currently, querying SPARQL Endpoints suffers from problems such as network instability and latency, which reduce query efficiency. To address these issues, we propose a client-side caching framework, SPARQL Endpoint Caching Framework (SECF), which aims to accelerate the overall querying speed over SPARQL Endpoints. SECF identifies queries that are likely to be issued by leveraging the querying patterns learned from clients’ historical queries, and prefetches and caches these queries. In particular, we develop a distance function based on graph edit distance to measure the similarity of SPARQL queries. We propose a feature modelling method that transforms SPARQL queries into vector representations, which are fed into machine-learning algorithms. A time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement. Extensive experiments performed on real-world queries showcase the effectiveness of our approach, which outperforms the state-of-the-art work in terms of the overall querying speed.
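The abstract names a time-aware, smoothing-based replacement policy but does not give its formula. As a rough illustration of the general idea only, the sketch below scores each cached query with a generic time-decayed exponential smoothing estimate and evicts the entry with the lowest score. It is not the paper's MSES: the class name `SmoothedCache`, the logical clock, the default `alpha`, and the scoring rule `alpha + (1 - alpha)^Δt · previous_score` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: object      # cached query result
    score: float       # smoothed access-frequency estimate
    last_access: int   # logical time of the most recent access


class SmoothedCache:
    """Hypothetical cache that evicts the entry with the lowest smoothed score."""

    def __init__(self, capacity: int, alpha: float = 0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.capacity = capacity
        self.alpha = alpha
        self.clock = 0     # logical clock, advanced on every get/put
        self.entries = {}  # query string -> Entry

    def _decayed(self, entry: Entry) -> float:
        # Score as seen at the current clock, assuming no access since last_access.
        elapsed = self.clock - entry.last_access
        return entry.score * (1 - self.alpha) ** elapsed

    def _touch(self, entry: Entry) -> None:
        # A new access contributes alpha; the older history is decayed.
        entry.score = self.alpha + self._decayed(entry)
        entry.last_access = self.clock

    def get(self, query: str):
        self.clock += 1
        entry = self.entries.get(query)
        if entry is None:
            return None
        self._touch(entry)
        return entry.value

    def put(self, query: str, result) -> None:
        self.clock += 1
        entry = self.entries.get(query)
        if entry is not None:
            entry.value = result
            self._touch(entry)
            return
        if len(self.entries) >= self.capacity:
            # Evict the query whose decayed score is currently the lowest.
            victim = min(self.entries, key=lambda q: self._decayed(self.entries[q]))
            del self.entries[victim]
        self.entries[query] = Entry(result, self.alpha, self.clock)
```

With a capacity of two, inserting queries `q1` and `q2`, reading `q1` again, and then inserting `q3` evicts `q2`, because its score has decayed without being refreshed. A larger `alpha` weights recent accesses more heavily, while a smaller one preserves more of the access history.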
A Learning-Based Framework for Improving Querying on Web Interfaces of Curated Knowledge Bases