Abstract
This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages, which follows from the observation that 'no one size fits all'. To address this shift, we propose a polystore architecture designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.