Abstract
We have devised an algorithm for minimal placement of bank selection instructions in partitioned memory architectures. The algorithm is parameterizable for a chosen metric, such as speed, space, or energy. Bank switching is a technique that increases the code and data memory in microcontrollers without extending the address buses. Given a program in which variables have already been assigned to data banks, we present a novel optimization technique that minimizes the overhead of bank switching through cost-effective placement of bank selection instructions. The placement is controlled by one of several objectives, such as runtime, low power, small code size, or a combination of these. We formulate the minimal placement of bank selection instructions as a discrete optimization problem that is mapped to a Partitioned Boolean Quadratic Programming (PBQP) problem. We implemented the optimization as part of a PIC Microchip backend and evaluated the approach for several optimization objectives. Our benchmark suite comprises programs from MiBench and DSPStone, plus a microcontroller real-time kernel and drivers for microcontroller hardware devices. Our optimization achieved a reduction in program memory space of between 2.7% and 18.2%, and an overall improvement in instruction cycles of between 5.0% and 28.8%. It achieved the minimal solution for all benchmark programs. We also investigated the scalability of our approach toward the requirements of future generations of microcontrollers; this study was conducted as a worst-case analysis on the entire MiBench suite. Our results show that our optimization (1) scales well to larger numbers of memory banks, (2) scales well to the larger problem sizes that will become feasible with future microcontrollers, and (3) achieves minimal placement for more than 72% of all functions from MiBench.
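To make the PBQP mapping more concrete, the following is a minimal sketch of how a bank selection placement problem might be encoded, not the paper's actual formulation, which the abstract does not spell out. It assumes that each control-flow node chooses the bank that is active while it executes, that a node accessing a banked variable is pinned to that bank through an infinite cost on every other choice, and that each control-flow edge charges a switch cost whenever its two endpoints choose different banks. The helper names (`solve_pbqp`, `switch_matrix`, `pinned`), the diamond-shaped control-flow graph, and the edge weights are all hypothetical, and the exhaustive search stands in for a real PBQP solver.

```python
from itertools import product
import math

INF = math.inf
BANKS = 2  # assumed number of data banks


def solve_pbqp(node_costs, edge_costs):
    """Exhaustively minimise the sum of node cost vectors and edge cost matrices."""
    names = list(node_costs)
    best_cost, best = INF, None
    for choice in product(*(range(len(node_costs[n])) for n in names)):
        sel = dict(zip(names, choice))
        cost = sum(node_costs[n][sel[n]] for n in names)
        cost += sum(m[sel[u]][sel[v]] for (u, v), m in edge_costs.items())
        if cost < best_cost:
            best_cost, best = cost, sel
    return best_cost, best


def switch_matrix(weight):
    """Edge cost: zero if both ends use the same bank, else `weight`."""
    return [[0 if i == j else weight for j in range(BANKS)] for i in range(BANKS)]


def pinned(bank):
    """Node cost: the node accesses `bank`, so any other choice is infeasible."""
    return [0 if b == bank else INF for b in range(BANKS)]


FREE = [0] * BANKS  # node touches no banked variable

# Hypothetical diamond CFG: entry -> {a, b} -> exit.
node_costs = {"entry": pinned(0), "a": FREE, "b": pinned(1), "exit": pinned(0)}

# Assumed weights: execution frequency of each edge (a speed-oriented metric).
edge_costs = {
    ("entry", "a"): switch_matrix(9),
    ("entry", "b"): switch_matrix(1),
    ("a", "exit"): switch_matrix(9),
    ("b", "exit"): switch_matrix(1),
}

best_cost, best = solve_pbqp(node_costs, edge_costs)
print(best_cost, best)
```

In this sketch the chosen metric enters only through the edge weights: using execution frequencies approximates a runtime objective, while a constant weight of 1 per edge would approximate a code-size objective. A bank selection instruction would then be placed on every edge whose endpoints end up in different banks.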