ABSTRACT
As the amount of memory in database systems grows, entire database tables, or even entire databases, can fit in the system's memory, making in-memory database operations more prevalent. This shift from disk-based to in-memory database systems has contributed to a move from row-wise to columnar data storage. Furthermore, common database workloads have grown beyond online transaction processing (OLTP) to include online analytical processing (OLAP) and data mining. These workloads analyze huge datasets that are often irregular and not indexed, making traditional database operations like joins much more expensive.
In this paper, we explore using dedicated hardware to accelerate in-memory database operations. We present hardware to accelerate three primitives: selection, which compacts the selected entries of a single column into a linear column; merge join, which joins two sorted columns by merging them; and sorting a column. Finally, we put these primitives together to accelerate an entire join operation. We implement a prototype of this system using FPGAs and show substantial improvements in both absolute throughput and utilization of memory bandwidth. Using the prototype as a guide, we explore how the hardware resources required by our design change with the desired throughput.
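As a rough illustration of what the three primitives named in the abstract compute, the following is a minimal software sketch in Python. It is a functional reference model only and says nothing about the paper's hardware design; the function names (select_compact, merge_join, sort_merge_join), the boolean selection mask, and the use of integer keys are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def select_compact(column: Sequence[int], keep: Sequence[bool]) -> List[int]:
    """Selection: compact the entries flagged in `keep` into a dense column."""
    return [value for value, flag in zip(column, keep) if flag]


def merge_join(left: Sequence[int], right: Sequence[int]) -> List[Tuple[int, int]]:
    """Equi-join two SORTED key columns by merging.

    Returns (left_index, right_index) pairs for every pair of equal keys,
    including all combinations when a key is duplicated on either side.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            key = left[i]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            # Emit the cross product of the two runs.
            pairs.extend((a, b) for a in range(i, i_end) for b in range(j, j_end))
            i, j = i_end, j_end
    return pairs


def sort_merge_join(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Full join built from the primitives: sort both columns, then merge.

    Returns the matching key values, one per matching pair.
    """
    left_sorted = sorted(left)
    right_sorted = sorted(right)
    return [left_sorted[a] for a, _ in merge_join(left_sorted, right_sorted)]
```

For example, select_compact([5, 3, 8, 1], [True, False, True, False]) keeps only the flagged entries, and sort_merge_join accepts unsorted inputs because it sorts both columns before merging.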
Index Terms
Hardware acceleration of database operations