Research Article • Public Access

Analysis of Fixed, Reconfigurable, and Hybrid Devices with Computational, Memory, I/O, & Realizable-Utilization Metrics

Authors Info & Claims
Published: 24 September 2016

Abstract

The modern processor landscape is varied and diverse. Developers therefore need a way to quickly and fairly compare devices for use with particular applications. This article expands the authors’ previously published computational-density metrics and presents an analysis of a new generation of device architectures, including CPU, DSP, FPGA, GPU, and hybrid architectures. New memory metrics extend the existing suite to characterize the memory resources of these devices. Finally, a new relational metric, realizable utilization (RU), is introduced; it quantifies the fraction of a device’s computational density that an application achieves in a given implementation. The RU metric provides valuable feedback to application developers and architecture designers by highlighting the upper bound on application-specific optimization and offering a quantifiable measure of theoretical versus realized performance. Overall, the analysis in this article quantifies the performance tradeoffs among the architectures studied, the memory characteristics of different device types, and the efficiency of device architectures.
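Based on the abstract’s definition, RU is the ratio of an application’s achieved performance to the device’s computational-density metric (its theoretical upper bound). A minimal sketch of this ratio is shown below; the function name and the numeric figures are illustrative assumptions, not values from the article.

```python
def realizable_utilization(achieved_ops_per_sec: float,
                           computational_density: float) -> float:
    """RU = achieved performance / device computational density.

    Both arguments must use the same unit (e.g., GFLOPS); the result is
    the fraction of the theoretical upper bound that the implementation
    actually realizes on the device.
    """
    return achieved_ops_per_sec / computational_density

# Hypothetical example: a device with a computational density of
# 500 GFLOPS on which an application kernel sustains 120 GFLOPS.
ru = realizable_utilization(120e9, 500e9)
print(f"RU = {ru:.2%}")  # prints "RU = 24.00%"
```

Because RU is bounded above by 1, a low value flags either an inefficient implementation or a mismatch between the application and the device architecture, which is the feedback loop the abstract describes.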





• Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 1 (March 2017), 206 pages
ISSN: 1936-7406 • EISSN: 1936-7414
DOI: 10.1145/3002131
• Editor:
• Steve Wilton

Copyright © 2016 ACM

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

• Received: 1 March 2015
• Revised: 1 January 2016
• Accepted: 1 January 2016
• Published: 24 September 2016

            Qualifiers

            • research-article
            • Research
            • Refereed
