ABSTRACT

In recent years, our customers have expressed frustration with the traditional approach of combining disparate products to handle their streaming, transactional, and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics that delivers stream analytics, OLTP, and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., responses within a couple of seconds), even when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC) to communicate their accuracy requirements to SnappyData in an intuitive fashion.
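To make the idea of an accuracy contract over approximate query processing more concrete, the following is a minimal sketch in plain Python. It is not SnappyData's API or its actual HAC mechanism. The function names (`approx_mean`, `answer_with_contract`), the `max_rel_error` parameter, and the choice of a uniform random sample with a normal-approximation confidence interval are all assumptions made for illustration only.

```python
import math
import random
import statistics


def approx_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width). half_width is the half-width of a
    normal-approximation confidence interval (z=1.96 is roughly 95%).
    The finite-population correction is ignored for simplicity.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


def answer_with_contract(values, sample_size, max_rel_error):
    """Hypothetical accuracy contract (an assumption, not the paper's HAC).

    Return the sampled estimate if its estimated relative error is within
    `max_rel_error`; otherwise fall back to scanning all the data.
    """
    estimate, half_width = approx_mean(values, sample_size)
    if estimate != 0 and half_width / abs(estimate) <= max_rel_error:
        return estimate, "approximate"
    return statistics.fmean(values), "exact"


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(100.0, 15.0) for _ in range(50_000)]
    # A loose contract can be served from the sample; a very tight one cannot.
    print(answer_with_contract(data, 2_000, max_rel_error=0.05))
    print(answer_with_contract(data, 2_000, max_rel_error=1e-6))
```

The design choice illustrated here is that the caller states a tolerance rather than a sample size, and the engine decides whether a synopsis is accurate enough or whether it must fall back to the full data.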
SnappyData: A Hybrid Transactional Analytical Store Built On Spark