Abstract
As on-chip transistor counts increase, the computing landscape has shifted to multi-core and many-core devices. Computational accelerators have followed this trend, incorporating both fixed and reconfigurable multi-core and many-core devices. As a growing number of disparate devices enter the market, there is an increasing need for concepts, terminology, and classification techniques to understand the device tradeoffs. Additionally, computational performance, memory performance, and power metrics are needed to compare devices objectively. These metrics will assist application scientists in selecting the appropriate device early in the development cycle. This article presents a hierarchical taxonomy of computing devices, concepts and terminology describing reconfigurability, and computational density and internal memory bandwidth metrics to compare devices.
Characterization of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration