Parallelizing load/stores on dual-bank memory embedded processors

Published: 01 August 2006
Abstract

Many modern embedded processors, such as DSPs, support partitioned memory banks (also called X-Y memory or dual-bank memory) along with parallel load/store instructions to achieve higher code density and performance. To use the parallel load/store instructions effectively, the compiler must partition the memory-resident values and assign each of them to the X or the Y bank. This paper gives a post-register-allocation solution that merges the generated load/store instructions into their parallel counterparts. Simultaneously, our framework allocates values to the X or Y memory bank. We first remove as many load/stores and register-register moves as possible using the iterated-coalescing register allocator of George and Appel [1996]. We then attempt to parallelize the generated load/stores using a multipass approach. The basic phase of our approach attempts to merge load/stores without duplication or web splitting. We model this problem as a graph-coloring problem in which each value is colored either X or Y. We then construct a motion scheduling graph (MSG) based on the range of motion of each load/store instruction. The MSG reflects which instructions could potentially be merged. We propose a notion of pseudofixed boundaries so that load/store movement is less affected by register dependencies. We prove that the coloring problem for the MSG is NP-complete and solve it with two heuristic algorithms of different complexity. We then propose a two-level iterative process that attempts instruction duplication, variable duplication, web splitting, and local conflict elimination to merge the remaining load/stores effectively. Finally, we clean up some multiple-aliased load/stores. To improve performance, we combine profiling information with each stage, coupled with some modifications to the algorithm. We show that our framework parallelizes a large number of load/stores without much growth in the data and code segments.
The average speedup from our optimization pass is roughly 13% when no profile information is available and 17% with profile information. The average code and data segment growth stays within 13%.
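
The bank-assignment idea in the abstract (color each value X or Y so that mergeable load/stores end up in different banks) can be illustrated with a small sketch. This is not either of the paper's two heuristics, and it does not build a real motion scheduling graph. It assumes a hypothetical weighted graph whose edges `(u, v, weight)` say that an access to `u` could be paired with an access to `v` if the two values sit in different banks, and it applies a plain greedy rule. The names `assign_banks` and `merged_weight` are illustrative only.

```python
from collections import defaultdict


def assign_banks(edges):
    """Greedily assign each value to bank 'X' or 'Y'.

    edges: iterable of (u, v, weight); an edge means a load/store of u could
    be merged with one of v, provided u and v are in different banks.
    This is an assumed toy model, not the paper's MSG or its heuristics.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w

    # Visit values with the largest total merge benefit first
    # (ties broken by name so the result is deterministic).
    order = sorted(adj, key=lambda n: (-sum(adj[n].values()), n))

    bank = {}
    for n in order:
        # Weight of already-placed neighbours in each bank.
        in_x = sum(w for m, w in adj[n].items() if bank.get(m) == "X")
        in_y = sum(w for m, w in adj[n].items() if bank.get(m) == "Y")
        # Pick the bank opposite to the heavier neighbour set.
        bank[n] = "Y" if in_x > in_y else "X"
    return bank


def merged_weight(edges, bank):
    """Total weight of pairs that ended up in different banks."""
    return sum(w for u, v, w in edges if bank[u] != bank[v])


# Hypothetical example: four values with pairwise merge benefits.
edges = [("a", "b", 3), ("b", "c", 2), ("a", "c", 1), ("c", "d", 4)]
bank = assign_banks(edges)
print(bank, merged_weight(edges, bank))
```

On this toy graph the values `a`, `b`, and `c` form a triangle, so at least one of their pairs must share a bank; the greedy rule gives up the lightest one (`a`, `c`). A single greedy pass like this carries no optimality guarantee in general, which is consistent with the abstract's point that the coloring problem is NP-complete and is handled heuristically.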

References

  1. Aarts, E. and Korst, K. 1989. Simulated annealing and Boltzmann Machines, Courier Int'l.
  2. Aho, A. V., Sethi, R., and Ullman, J. D. 1986. Compilers Principles, Techniques and Tools, Addison-Wesley, Reading, MA.
  3. Briggs, P., Cooper, K., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems.
  4. Chaitin, G. J., Auslander, M. A., Chandra, A. K., Cocke, J., Hopkins, M. E., and Markstein, P. 1981. Register allocation via coloring. Computer Language, 6, 47-57.
  5. Cho, J., Paek, Y., and Whalley, D. 2002. Register and memory assignment for non-orthogonal architectures via graph coloring and MST algorithms. Proc. of LCTES'02 (June), 130-138.
  6. Cooper, K. D. and Harvey, T. J. 1998. Compiler-controlled memory. In 8th ASPLOS (Oct.).
  7. Cooper, K. D. and McIntosh, N. 1999. Enhanced code compression for embedded RISC processors. Proc. SIGPLAN '1999 Conf. Programming Language Design and Implementation (May), 139-149.
  8. Davidson, J. W. and Jinturkar, S. 1994. Memory access coalescing: A technique for eliminating redundant memory accesses. Proc. SIGPLAN '94 Conf. Programming Language Design and Implementation (June), 186-195.
  9. George, L. and Appel, A. W. 1996. Iterated register coalescing. In Proc. SIGPLAN '96 Conf. Programming Language Design and Implementation.
  10. Gross, J. and Yellen, J. 1999. Graph theory and its applications. CRC Press, Boca Raton, FL.
  11. Knoop, J., Ruthing, O., and Steffen, B. 1992. Lazy code motion. Proc. SIGPLAN '1992 Conf. Programming Language Design and Implementation (July).
  12. Leupers, R. and Kotte, D. 2001. Variable partitioning for dual memory bank DSPs. ICASSP (May).
  13. Mach-SUIF Backend Compiler, 2000. The Machine-SUIF 2.1 compiler documentation set. Harvard University, Sept. http://ececs.harvard.edu/hube/research/machsuif.html.
  14. Papadimitriou, C. H. and Steiglitz, K. 1998. Combinatorial optimization Algorithms and Complexity, Dover Publications, 1998.
  15. Powell, B., Lee, E. A., and Newman, W. C. 1992. Direct synthesis of optimized DSP assembly code from signal flow block diagrams. Proceedings International Conference on Acoustics, Speech, and Signal Processing, 553-556.
  16. Saghir, M. A. R., Chow, P., and Lee, C. G. 1996. Exploiting dual data-memory banks in digital signal processors. Proc. of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 234-243.
  17. Stanford SUIF Compiler Infrastructure, 2000. The SUIF 2 Compiler Documentation Set, Stanford University, Sep. http://suif.stanford.edu/suif/index.html.
  18. Sudarsanam, A. and Malik, S. 2000. Simultaneous reference allocation in code generation for dual data memory bank ASIPs. ACM Trans. on Design Automation of Electronic Systems, Vol. 5, 242-264 (Apr.).
  19. Zhuang, X., Pande, S., and Greenland, J. S., Jr. 2002. A framework for parallelizing load/stores on embedded processors. In Proc. of International Conference on Parallel Architectures and Compilation Techniques, 68-70 (Sep.).
