OPLE: A Heuristic Custom Instruction Selection Algorithm Based on Partitioning and Local Exploration of Application Dataflow Graphs

Published: 09 September 2015
Abstract

This article presents a heuristic custom instruction (CI) selection algorithm. The proposed algorithm, called OPLE for “Optimization based on Partitioning and Local Exploration,” combines greedy and optimal optimization methods. It searches for a near-optimal solution by partitioning the identified CI set, which reduces the search space. This partitioning guarantees the success of the algorithm independent of the size of the identified set. First, the algorithm finds the near-optimal CIs among the candidate CIs of each part. Next, the CIs suggested by the different parts are combined to determine the final selected CI set. To improve the selected set, the solution is evolved by calling the algorithm iteratively. The efficacy of the algorithm is assessed by comparing its performance with that of optimal and nonoptimal methods on a number of benchmarks under different area budgets and I/O constraints. The results show that OPLE achieves higher speedups than the nonoptimal solutions, especially for larger identified candidate sets and/or small area budgets. Compared to the nonoptimal techniques, it provides a 30% higher speedup improvement on average, with a maximum improvement of 117%. The results also demonstrate that in many cases OPLE is able to find the optimal solution.
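
The partition, explore, and combine flow described in the abstract can be illustrated with a small sketch. This is not the OPLE implementation: the abstract does not specify how the identified set is partitioned, how each part is explored, or how the iteration works. The sketch assumes fixed-size chunks as parts, an exhaustive search inside each part, a greedy combination by gain per unit area, non-overlapping candidates, and strictly positive areas. The names `Candidate`, `best_subset`, `select`, and `part_size` are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Candidate(NamedTuple):
    """A candidate CI (hypothetical model, not taken from the paper)."""
    name: str
    gain: int  # estimated cycles saved by the CI (assumed metric)
    area: int  # hardware cost in arbitrary units; assumed to be > 0


def best_subset(part, budget):
    """Exhaustively pick the maximum-gain subset of one small part
    whose total area fits within the budget."""
    best, best_gain = (), 0
    for r in range(1, len(part) + 1):
        for subset in combinations(part, r):
            area = sum(c.area for c in subset)
            gain = sum(c.gain for c in subset)
            if area <= budget and gain > best_gain:
                best, best_gain = subset, gain
    return list(best)


def select(candidates, budget, part_size=3):
    """One pass of partition, local exploration, and combination."""
    # Partition: fixed-size chunks stand in for the paper's partitioning,
    # which the abstract does not describe.
    parts = [candidates[i:i + part_size]
             for i in range(0, len(candidates), part_size)]
    # Local exploration: exact search is affordable because parts are small.
    suggested = [c for part in parts for c in best_subset(part, budget)]
    # Combination: greedy choice by gain per unit area under the global budget.
    chosen, used = [], 0
    for c in sorted(suggested, key=lambda c: c.gain / c.area, reverse=True):
        if used + c.area <= budget:
            chosen.append(c)
            used += c.area
    return chosen
```

Keeping each part small bounds the cost of the exhaustive step at 2^part_size subsets per part, so the total work grows linearly with the number of parts rather than exponentially with the size of the whole candidate set. The iterative refinement mentioned in the abstract is left out because its details are not given there.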
