DOI: 10.1145/1950413.1950427
research-article

FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting

Published: 27 February 2011

ABSTRACT

This paper analyses different hardware sorting architectures in order to implement a highly scalable sorter that solves large problems, up to the GB range, at high performance and in linear time. It will be shown that combining a FIFO-based merge sorter with a tree-based merge sorter results in the best performance at low cost. Moreover, we will demonstrate how partial run-time reconfiguration can be used either to save almost half the FPGA resources or, alternatively, to improve the speed. Experiments show a sustained sorting throughput of 2 GB/s for problems fitting into the on-chip FPGA memory and 1 GB/s when using external memory. These values surpass the best published results for large-problem sorting implementations on FPGAs, GPUs, and the Cell processor.
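
As a rough software analogy for the combination named in the abstract, the sketch below first builds short sorted runs with cascaded two-way FIFO merges and then combines those runs with a binary tree of two-way merges. It is not the paper's hardware architecture: the function names, the `run_length` parameter, and the ordering of the two stages (FIFO stage first, tree stage second) are assumptions made for illustration, since the abstract does not specify them.

```python
import heapq
from collections import deque


def fifo_merge(a, b):
    """Merge two sorted FIFOs into one sorted FIFO, one element per step."""
    out = deque()
    while a and b:
        # Compare the two heads and emit the smaller one (ties favour `a`).
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    # One input is drained; the rest of the other is already sorted.
    out.extend(a)
    out.extend(b)
    return out


def fifo_merge_stage(values, run_length):
    """Cascade 2-way FIFO merges until the first run holds `run_length` items.

    `run_length` is a hypothetical knob standing in for on-chip FIFO depth.
    """
    runs = [deque([v]) for v in values]
    while len(runs) > 1 and len(runs[0]) < run_length:
        merged = [fifo_merge(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out passes through unchanged
        runs = merged
    return [list(r) for r in runs]


def tree_merge(runs):
    """Merge sorted runs through a binary tree of lazy 2-way merges."""
    if not runs:
        return iter(())
    if len(runs) == 1:
        return iter(runs[0])
    mid = len(runs) // 2
    return heapq.merge(tree_merge(runs[:mid]), tree_merge(runs[mid:]))


def hybrid_sort(values, run_length=4):
    """Sort by building runs with FIFO merges, then tree-merging the runs."""
    return list(tree_merge(fifo_merge_stage(values, run_length)))
```

Calling `hybrid_sort` on a list of integers returns the sorted list. In this sketch the tree stage streams its output lazily, which loosely mirrors a merger that emits one element per step rather than materialising intermediate results.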


    • Published in

      FPGA '11: Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
      February 2011
      300 pages
      ISBN:9781450305549
      DOI:10.1145/1950413

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 125 of 627 submissions, 20%
