Design and analysis of adaptive processor

Published: 23 March 2012

Abstract

This paper proposes a new computation model called CACHE (Cache Architecture for Configurable Hardware Engine). The model requires neither a dedicated host processor nor host software to manage reconfiguration; instead, reconfiguration is performed autonomously within a working set of application datapaths. The CACHE model yields several side effects: caching, resource allocation and assignment, placement and routing, and defragmentation are all carried out by the processing array itself together with a special register file called the working-set register file. The model aims to reduce three major workloads: (1) processor and application design, (2) runtime resource management and scheduling, and (3) reconfiguration. To reduce these workloads, the processor architecture differs fundamentally from the traditional computing model and its microprocessor architecture. Three major ideas underlie the computing system: (1) an on-chip working-set model, mainly to control the loading and storing of streams, that is, to control the traffic that introduces overhead; (2) an on-chip deadlock-properties model, mainly to manage resources and to continuously configure the datapaths corresponding to a working-set window; and (3) a cache memory technique that serves both models: its mechanism is equivalent to the working-set window, and its procedure is equivalent to the resource request, acquisition, and release of the deadlock properties. The first model targets streaming applications such as vector and matrix operations and filters, which use coarse-grained operations such as the integer operations of the C language. Its performance relative to DSPs comes from constant throughput across applications of different scales. In addition, an extended model, called the Instant model, which automatically generates an instance of a datapath, outperforms the DSPs.
This paper presents the computation model, the architecture, the low-level design, and an analysis of the basic characteristics of execution.
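
The working-set window mentioned in the abstract can be illustrated in software. The following is a minimal Python sketch of a sliding working-set window in the general sense of the working-set model, not the paper's hardware mechanism. The function name `working_sets`, the datapath identifiers in the trace, and the window size are all hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


def working_sets(trace, window):
    """Yield the working set after each reference in `trace`.

    The working set is taken here to be the set of distinct items
    referenced within the last `window` references (an assumption
    made for this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")

    recent = deque()    # the last `window` references, oldest first
    counts = Counter()  # how often each item occurs inside the window

    for ref in trace:
        recent.append(ref)
        counts[ref] += 1
        if len(recent) > window:
            # The oldest reference slides out of the window.
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]  # item has left the working set
        yield set(counts)


# Hypothetical trace of datapath identifiers requested by an application.
trace = ["add", "mul", "add", "sub", "mul"]
for step, ws in enumerate(working_sets(trace, window=3)):
    print(step, sorted(ws))
```

In this sketch, an item that drops out of the window corresponds loosely to a resource that may be released, and an item that enters it corresponds to a resource that must be requested and acquired.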

