Performance Scalability of Adaptive Processor Architecture

Published: 11 April 2017

Abstract

In this article, we evaluate the performance scalability of adaptive processors, architectures that dynamically configure an application-specific pipelined datapath and perform data-flow streaming execution. Previous works have examined the basics of the following: (1) a computational model that supports the swap-in/out of a partial datapath, that is, virtual hardware realized in hardware, without a host processor and its software; (2) an architecture with a demonstrated minimum pipeline requirement and minimum component requirement; and (3) the characteristics of the execution phase and of the stack shift that realizes the swap-in/out. However, these works did not explore the design space, particularly with respect to the following: (1) the clock cycle time of the adaptive processor, which must depend on the delay of the wires used primarily for the global communication of requests, acknowledgments, acquirements, releases, and so forth; and (2) a revised control system that can handle out-of-order acknowledgment and in-order acquirement, which together guarantee the correct datapath configuration when the configurations contain a conditional branch. This article explores the trade-off between scaling the ALU resources and pipelining the wires.

    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 2
      Special Section on Field Programmable Logic and Applications 2015 and Regular Papers
      June 2017
      133 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/3068424
      • Editor: Steve Wilton

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 April 2017
      • Accepted: 1 October 2016
      • Revised: 1 September 2016
      • Received: 1 January 2016

      Qualifiers

      • research-article
      • Research
      • Refereed