A domain specific interconnect for reconfigurable computing

Published: 12 June 2008

Abstract

Affine Control Loops (ACLs) occur frequently in data- and compute-intensive applications. Implementing ACLs directly on dedicated hardware has the potential for spectacular improvements in area, time, and energy. An important challenge for such direct hardware compilation of ACLs is the interconnection between the different processing elements, which may be non-local as well as dynamic. We propose a generic, reconfigurable interconnection fabric that can realize the data path of any ACL and can be dynamically reconfigured in constant time. We have applied for a patent on this technology.

Published in

ACM SIGPLAN Notices, Volume 43, Issue 7 (LCTES '08), July 2008, 167 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1379023

LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems, June 2008, 180 pages
ISBN: 9781605581040
DOI: 10.1145/1375657

Copyright © 2008 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
