ABSTRACT
This paper analyses different hardware sorting architectures in order to implement a highly scalable sorter that solves large problems, up to the GB range, at high performance and in linear time complexity. It will be proven that a combination of a FIFO-based merge sorter and a tree-based merge sorter results in the best performance at low cost. Moreover, we will demonstrate how partial run-time reconfiguration can be used either to save almost half the FPGA resources or, alternatively, to improve the speed. Experiments show a sustainable sorting throughput of 2 GB/s for problems fitting into the on-chip FPGA memory and 1 GB/s when using external memory. These values surpass the best published results for large-problem sorting implementations on FPGAs, GPUs, and the Cell processor.
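The two-stage idea named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design described in the paper: the function names (`merge_fifos`, `fifo_merge_sorter`, `tree_merge`, `hybrid_sort`) and the `run_length` parameter are hypothetical, and the abstract gives no details of how the two sorters are actually connected. The model assumes that a FIFO-based merge sorter first turns small blocks into sorted runs and that a binary merge tree then combines those runs into one sorted stream.

```python
from collections import deque


def merge_fifos(a, b):
    """Merge two sorted FIFOs into one sorted FIFO, emitting one element per step."""
    out = deque()
    while a and b:
        # Take the smaller head element; ties go to the first FIFO.
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    out.extend(a)  # at most one of the two FIFOs still holds elements
    out.extend(b)
    return out


def fifo_merge_sorter(block):
    """Sort a block with a cascade of 2-way FIFO merges (run length doubles per stage)."""
    runs = [deque([x]) for x in block]
    while len(runs) > 1:
        merged = [merge_fifos(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run passes through to the next stage
        runs = merged
    return runs[0] if runs else deque()


def tree_merge(runs):
    """Combine sorted runs with a binary merge tree; the root yields the sorted stream."""
    if not runs:
        return deque()
    if len(runs) == 1:
        return runs[0]
    mid = len(runs) // 2
    return merge_fifos(tree_merge(runs[:mid]), tree_merge(runs[mid:]))


def hybrid_sort(data, run_length=4):
    """Assumed combination: FIFO merge sorter builds runs, merge tree combines them."""
    runs = [fifo_merge_sorter(data[i:i + run_length])
            for i in range(0, len(data), run_length)]
    return list(tree_merge(runs))
```

Calling `hybrid_sort(values, run_length=4)` on a list returns a new sorted list. In this model `run_length` stands in for the amount of data the first stage can hold, which is an illustrative choice rather than a figure taken from the paper.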
Index Terms
FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting