Abstract
In this article, we evaluate the performance scalability of adaptive processors, architectures that dynamically configure an application-specific pipelined datapath and perform data-flow streaming execution. Previous works have examined the basics of (1) a computational model that supports the swap-in/out of a partial datapath, so that virtual hardware is realized by hardware alone, without a host processor or its software; (2) an architecture shown to have minimum pipeline and component requirements; and (3) the characteristics of the execution phase and of a stack shift that realizes the swap-in/out. However, these works did not explore the design space, particularly (1) the clock cycle time of the adaptive processor, which depends on wire delay, primarily the delay of the global communication of requests, acknowledgments, acquirements, releases, and so forth; and (2) a revised control system that handles out-of-order acknowledgment and in-order acquirement, which together guarantee a correct datapath configuration even when the configurations include conditional branches. This article explores the trade-off between scaling the ALU resources and pipelining the wires.
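The combination of out-of-order acknowledgment and in-order acquirement resembles the commit logic of a reorder buffer: acknowledgments may return in any order (for example, because global wires have different delays), but a resource is acquired only after every earlier request has been acknowledged. The following is a minimal Python sketch under that assumption. The class `AcquireQueue` and its methods are hypothetical names chosen for illustration, not the article's control system, and the sketch models only tag ordering, not wire timing or resource release.

```python
from dataclasses import dataclass, field


@dataclass
class AcquireQueue:
    """Hypothetical model of in-order acquirement over out-of-order acks.

    Requests are issued in program order and tagged with a sequence number.
    Acknowledgments may arrive in any order, but a tag is acquired only when
    all earlier tags have already been acknowledged (reorder-buffer style).
    """
    next_tag: int = 0                            # tag for the next issued request
    head: int = 0                                # oldest request not yet acquired
    acked: set = field(default_factory=set)      # acknowledged but not yet acquired
    acquired: list = field(default_factory=list) # acquisition order so far

    def request(self) -> int:
        """Issue a new request in program order and return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def acknowledge(self, tag: int) -> list:
        """Record an acknowledgment; return the tags acquired as a result, in order."""
        if not (self.head <= tag < self.next_tag):
            raise ValueError(f"unknown or already-acquired tag {tag}")
        self.acked.add(tag)
        newly = []
        # Advance the head only across a contiguous run of acknowledged tags.
        while self.head in self.acked:
            self.acked.remove(self.head)
            newly.append(self.head)
            self.head += 1
        self.acquired.extend(newly)
        return newly


if __name__ == "__main__":
    q = AcquireQueue()
    tags = [q.request() for _ in range(4)]  # tags 0, 1, 2, 3
    for t in (2, 0, 1, 3):                  # acknowledgments arrive out of order
        print(t, q.acknowledge(t))
    print(q.acquired)
```

Acknowledging tag 2 first acquires nothing; acknowledging tag 0 acquires `[0]`, and tag 1 then releases `[1, 2]` together, so the final acquisition order is `[0, 1, 2, 3]` regardless of arrival order. Squashing requests on a not-taken branch path is left out to keep the sketch short.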
Index Terms
Performance Scalability of Adaptive Processor Architecture