Abstract
This article presents a heuristic custom instruction (CI) selection algorithm called OPLE, for “Optimization based on Partitioning and Local Exploration,” which combines greedy and optimal optimization methods. The algorithm searches for a near-optimal solution by partitioning the identified CI set, which reduces the search space and ensures that the algorithm succeeds regardless of the size of the identified set. First, the algorithm finds the near-optimal CIs among the candidate CIs of each part. Next, the CIs suggested for the different parts are combined to determine the final selected CI set. To improve the selected set, the solution is evolved by calling the algorithm iteratively. The efficacy of the algorithm is assessed by comparing its performance with that of optimal and nonoptimal methods on a number of benchmarks under different area budgets and I/O constraints. Compared to the nonoptimal solutions, OPLE achieves higher speedups, especially for larger identified candidate sets and/or small area budgets: the speedup improvement is 30% higher on average, with a maximum improvement of 117%. The results also show that in many cases OPLE finds the optimal solution.
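The partition, local exploration, combine, and iterate flow described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal illustration under several assumptions that the abstract does not state: each candidate CI is a hypothetical `(name, area, gain)` tuple with positive area, gains are treated as additive (real CIs may overlap in the dataflow graph), parts are formed by simple slicing, each part is solved by exhaustive search, and the final combination is a greedy pass by gain per unit area. The names `best_subset`, `greedy_combine`, and `partitioned_select` are invented for this example.

```python
from itertools import combinations


def best_subset(cands, budget):
    """Exhaustively find the max-gain subset of one part within the area budget."""
    best, best_gain = (), 0
    for r in range(1, len(cands) + 1):
        for sub in combinations(cands, r):
            area = sum(c[1] for c in sub)
            gain = sum(c[2] for c in sub)
            if area <= budget and gain > best_gain:
                best, best_gain = sub, gain
    return list(best)


def greedy_combine(cands, budget):
    """Greedily pick candidates by gain per unit area until the budget is used."""
    chosen, used = [], 0
    for c in sorted(cands, key=lambda c: c[2] / c[1], reverse=True):
        if used + c[1] <= budget:
            chosen.append(c)
            used += c[1]
    return chosen


def partitioned_select(cands, budget, part_size=3, max_rounds=10):
    """Assumed flow: partition, solve each part exactly, keep survivors, repeat."""
    pool = list(cands)
    for _ in range(max_rounds):
        if len(pool) <= part_size:
            # Small enough to solve exactly in one shot.
            return best_subset(pool, budget)
        parts = [pool[i:i + part_size] for i in range(0, len(pool), part_size)]
        survivors = [c for part in parts for c in best_subset(part, budget)]
        if len(survivors) == len(pool):
            break  # no further reduction; fall back to the greedy combination
        pool = survivors
    return greedy_combine(pool, budget)
```

Keeping each part small is what bounds the cost of the exhaustive step: the search is exponential only in `part_size`, not in the total number of candidates. The sketch offers no optimality guarantee, and the choice of partitioning and combination strategy here is purely illustrative.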
Index Terms
OPLE: A Heuristic Custom Instruction Selection Algorithm Based on Partitioning and Local Exploration of Application Dataflow Graphs