Abstract
Affine Control Loops (ACLs) occur frequently in data- and compute-intensive applications. Implementing ACLs directly in dedicated hardware has the potential for dramatic improvements in area, time, and energy. An important challenge for such direct hardware compilation of ACLs is the interconnection between the different processing elements, which may be non-local as well as dynamic. We propose a generic, reconfigurable interconnection fabric that can realize the datapath of any ACL and can be dynamically reconfigured in constant time. We have filed a patent application for this technology.
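To make the interconnect problem concrete, here is a minimal sketch of a classic ACL: forward substitution for a lower-triangular system. It is written in Python purely for illustration and is not taken from the paper. Its loop bounds (0 <= j < i < n) and its subscripts are all affine in the loop indices. The helper `communication_distances` assumes a hypothetical allocation in which row i runs on processing element (PE) i. Under that assumption, the operand x[j] must travel from PE j to PE i. The transfer distance i - j therefore ranges from 1 to n - 1 and varies from one iteration to the next, so a fixed nearest-neighbor link alone cannot deliver every operand in one hop. The allocation is an assumption for this example, not the fabric proposed in the paper.

```python
from collections import Counter


def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular matrix L.

    This is an affine control loop: the bounds 0 <= j < i < n and the
    subscripts L[i][j], x[j], b[i] are affine in i, j, and n.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):          # affine bound: j < i
            s -= L[i][j] * x[j]     # reads x[j], finalized in row j
        x[i] = s / L[i][i]
    return x


def communication_distances(n):
    """Count transfers by PE distance under a hypothetical allocation.

    Assumption (illustrative only): iteration (i, j) runs on PE i and
    x[j] is produced on PE j, so each read of x[j] travels i - j PEs.
    """
    return Counter(i - j for i in range(n) for j in range(i))


if __name__ == "__main__":
    print(forward_substitution([[2, 0, 0], [1, 1, 0], [0, 3, 1]], [4, 3, 7]))
    print(communication_distances(4))
```

For n = 4 the distances are {1: 3, 2: 2, 3: 1}. PE 3 alone must receive operands from PEs 0, 1, and 2, which is the kind of non-local, iteration-dependent connection pattern the abstract refers to.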

A domain specific interconnect for reconfigurable computing
LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems