Abstract

Specialized execution using spatial architectures provides energy-efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach that suffers from poor compiler/architect productivity and a lack of insight into optimality, and that inhibits migration of techniques between architectures.
Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we express spatial scheduling as a constraint satisfaction problem using Integer Linear Programming (ILP). We observe that architecture primitives and scheduler responsibilities can be related through five abstractions: placement of computation, routing of data, managing event timing, managing resource utilization, and forming the optimization objectives. We encode these responsibilities as 20 general ILP constraints, which are used to create schedulers for the disparate TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using ILP is implementable, practical, and typically matches or outperforms specialized schedulers.
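To make the five abstractions concrete, the following is a minimal sketch on a toy problem, not the paper's 20-constraint formulation. Everything in it is an assumption made for illustration: the four-operation dataflow graph, the 2x2 tile grid, the one-operation-per-tile capacity, the unit operation latency, and the Manhattan-distance routing cost. Exhaustive enumeration stands in for an ILP solver so that the example needs no external library.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): a 4-operation dataflow
# graph scheduled onto a 2x2 grid of tiles.
OPS = ["a", "b", "c", "d"]                    # listed in topological order
EDGES = [("a", "c"), ("b", "c"), ("c", "d")]  # producer -> consumer
TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]
CAPACITY = 1      # assumed: at most one operation per tile
OP_LATENCY = 1    # assumed: every operation takes one cycle


def hops(src, dst):
    """Routing of data: assume cost equals Manhattan distance between tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def finish_times(placement):
    """Event timing: an operation starts once all of its operands arrive."""
    done = {}
    for op in OPS:
        arrivals = [done[p] + hops(placement[p], placement[op])
                    for p, c in EDGES if c == op]
        done[op] = max(arrivals, default=0) + OP_LATENCY
    return done


def schedule():
    """Enumerate every placement and keep the one with the lowest latency."""
    best = None
    for tiles in product(TILES, repeat=len(OPS)):  # placement: one tile per op
        if any(tiles.count(t) > CAPACITY for t in TILES):
            continue                               # resource utilization
        placement = dict(zip(OPS, tiles))
        latency = max(finish_times(placement).values())  # objective
        if best is None or latency < best[0]:
            best = (latency, placement)
    return best
```

Calling `schedule()` returns a `(latency, placement)` pair. In an actual ILP encoding, the enumeration would be replaced by binary placement variables and linear constraints handed to a solver, which is what lets the same declarative description be retargeted across architectures.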
A general constraint-centric scheduling framework for spatial architectures
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation