ABSTRACT
Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch, with a different philosophy than today's database engines, in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system, which exploits techniques and algorithms accumulated over many years of database research.
In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples have arrived or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order in which they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements by exploiting a novel query component, the basket expression.
We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
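The basket model described in the abstract can be illustrated with a minimal in-memory sketch. It is not the DataCell implementation: the `Basket` class, its method names, and the `min_batch` parameter are hypothetical, a plain Python predicate stands in for a basket expression, and the time-threshold trigger is omitted. The sketch only shows three ideas: tuples wait in a basket until a batch is large enough, a query picks the tuples it wants regardless of arrival order, and a tuple is dropped once every registered query has seen it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


@dataclass
class Basket:
    """Hypothetical stand-in for a basket table shared by several queries."""
    readers: Set[str] = field(default_factory=set)
    rows: List[Row] = field(default_factory=list)
    seen_by: List[Set[str]] = field(default_factory=list)  # parallel to rows

    def register(self, query_name: str) -> None:
        """Declare a continuous query that reads from this basket."""
        self.readers.add(query_name)

    def append(self, row: Row) -> None:
        """Store an incoming tuple upon arrival."""
        self.rows.append(row)
        self.seen_by.append(set())

    def pending(self, query_name: str) -> int:
        """Number of tuples this query has not yet seen."""
        return sum(1 for seen in self.seen_by if query_name not in seen)

    def consume(self, query_name: str,
                predicate: Callable[[Row], bool],
                min_batch: int = 1) -> List[Row]:
        """Run one batch for a query; return the tuples its predicate selects."""
        if self.pending(query_name) < min_batch:
            return []  # batch threshold not reached, leave the basket untouched
        picked: List[Row] = []
        for row, seen in zip(self.rows, self.seen_by):
            if query_name not in seen:
                seen.add(query_name)
                if predicate(row):  # assumed analogue of a basket expression
                    picked.append(row)
        # Drop every tuple that all registered queries have now seen.
        kept = [(r, s) for r, s in zip(self.rows, self.seen_by)
                if not self.readers <= s]
        self.rows = [r for r, _ in kept]
        self.seen_by = [s for _, s in kept]
        return picked


basket = Basket()
basket.register("fast_cars")
basket.register("all_cars")
for speed in (40, 95, 120):
    basket.append({"speed": speed})

fast = basket.consume("fast_cars", lambda r: r["speed"] > 90, min_batch=3)
everything = basket.consume("all_cars", lambda r: True)
```

In this usage, the first call selects the two tuples with speed above 90 but leaves all three tuples in the basket, because `all_cars` has not seen them yet. After the second call the basket is empty.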