Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs

Abstract

Existing methods place data or code in scratchpad memory (SPM) by relying on heuristics, resorting to integer programming, or mapping the problem to graph coloring. In this article, the SPM allocation problem for arrays is formulated as an interval coloring problem. The key observation is that in many embedded C programs, two arrays can be modeled, with good accuracy, such that either their live ranges do not interfere or one contains the other. As a result, array interference graphs often form a special class of superperfect graphs (known as comparability graphs), and their optimal interval colorings become efficiently solvable. This insight has led to the development of an SPM allocation algorithm that places the arrays of an interference graph in SPM by examining its maximal cliques. If the SPM is no smaller than the clique number of an interference graph, then all arrays in the graph can be placed in SPM optimally. Otherwise, we rely on containment-motivated heuristics to split or spill array live ranges until the resulting graph is optimally colorable. We have implemented our algorithm in SUIF/machSUIF and evaluated it using a set of embedded C benchmarks from MediaBench and MiBench. Compared to a graph-coloring algorithm and an optimal ILP algorithm (when it runs to completion), our algorithm achieves close-to-optimal results and is superior to graph coloring for the benchmarks tested.
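The containment property described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which examines maximal cliques of the interference graph and also splits or spills live ranges. It only assumes a simplified model in which each live range is a closed interval of program points and every pair of ranges is either disjoint or nested. The names `Array`, `contains`, and `allocate`, and the sizes used in the example, are hypothetical. Under these assumptions, stacking each array directly on top of the array that encloses it yields an SPM span equal to the heaviest chain of nested arrays, which is the weighted clique number in this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Array:
    name: str
    size: int   # bytes the array needs in SPM (hypothetical unit)
    start: int  # first program point at which the array is live
    end: int    # last program point at which the array is live (inclusive)


def contains(outer, inner):
    """True if inner's live range lies entirely within outer's."""
    return outer.start <= inner.start and inner.end <= outer.end


def allocate(arrays):
    """Return ({name: offset}, span) for live ranges that are pairwise
    disjoint or nested. Raises ValueError on a partial overlap, which
    this simplified model does not handle."""
    # Sort so that an enclosing range always precedes the ranges nested in it.
    ordered = sorted(arrays, key=lambda a: (a.start, -a.end))
    offsets = {}
    stack = []  # chain of arrays enclosing the current one, outermost first
    span = 0
    for a in ordered:
        # Drop arrays whose live range ended before this one starts.
        while stack and stack[-1].end < a.start:
            stack.pop()
        if stack and not contains(stack[-1], a):
            raise ValueError(f"{a.name} partially overlaps {stack[-1].name}")
        # Place the array directly above its innermost enclosing array.
        if stack:
            parent = stack[-1]
            offsets[a.name] = offsets[parent.name] + parent.size
        else:
            offsets[a.name] = 0
        span = max(span, offsets[a.name] + a.size)
        stack.append(a)
    return offsets, span


# Hypothetical example: B and C are nested in A, D is nested in B,
# and B and C are disjoint, so they can share the same SPM region.
arrays = [
    Array("A", 400, 0, 100),
    Array("B", 200, 10, 40),
    Array("C", 300, 50, 90),
    Array("D", 100, 15, 30),
]
offsets, span = allocate(arrays)
spm_size = 1024  # assumed SPM capacity in bytes
print(offsets, span, span <= spm_size)
```

In this sketch, all arrays fit whenever the SPM capacity is at least `span`. When it is smaller, some live range would have to be split or spilled, which is the case the article handles with its containment-motivated heuristics and which the sketch leaves out.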
