Abstract
Embedded systems are often constrained in both code size and execution time because of limited memory and the real-time nature of their applications. A dual instruction set processor, which supports a reduced instruction set (16 bits/instruction) in addition to a full instruction set (32 bits/instruction), offers an opportunity to trade off these two design criteria. Specifically, while the reduced instruction set reduces code size by providing smaller instructions, a program compiled into the reduced instruction set typically runs slower than the same program compiled into the full instruction set. Motivated by this observation, we propose a code generation technique that exploits this tradeoff by selectively using the two instruction sets for different sections of the program. The proposed technique, called selective code transformation, not only provides a mechanism for a flexible tradeoff between a program's code size and its execution time, but also facilitates program optimization toward enhancing its worst-case performance. The results of our experiments show that the proposed technique can be used effectively to fine-tune an application program on a spectrum of code size and execution performance, which in turn enables system-wide optimization of memory space and execution speed across multiple applications.
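To make the tradeoff concrete, the following is a minimal sketch of one way per-section instruction set selection could work. It is not the selection algorithm of the paper, which the abstract does not specify. It assumes each program section has a known size and execution time estimate under both instruction sets, ignores any cost of switching between instruction sets, and uses a simple greedy heuristic. The `Section` type, the function `select_full_sections`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A program section with hypothetical costs under both instruction sets."""
    name: str
    size_reduced: int  # bytes when compiled to the 16-bit instruction set
    size_full: int     # bytes when compiled to the 32-bit instruction set
    time_reduced: int  # estimated cycles in the 16-bit instruction set
    time_full: int     # estimated cycles in the 32-bit instruction set


def select_full_sections(sections, size_budget):
    """Greedily pick sections to compile into the full instruction set.

    Starts with every section in the reduced set (smallest code), then
    promotes sections in order of cycles saved per extra byte while the
    total size stays within size_budget. This is an illustrative
    heuristic, not an optimal or published algorithm.
    """
    total_size = sum(s.size_reduced for s in sections)
    chosen = set()

    def gain_per_byte(s):
        extra = s.size_full - s.size_reduced
        saved = s.time_reduced - s.time_full
        return saved / extra if extra > 0 else float("inf")

    # Only sections that actually run faster in the full set are candidates.
    candidates = [s for s in sections if s.time_full < s.time_reduced]
    for s in sorted(candidates, key=gain_per_byte, reverse=True):
        extra = s.size_full - s.size_reduced
        if total_size + extra <= size_budget:
            chosen.add(s.name)
            total_size += extra

    total_time = sum(
        s.time_full if s.name in chosen else s.time_reduced for s in sections
    )
    return chosen, total_size, total_time


# Hypothetical example data, invented for illustration only.
SECTIONS = [
    Section("loop", size_reduced=100, size_full=160, time_reduced=9000, time_full=6000),
    Section("init", size_reduced=200, size_full=320, time_reduced=1000, time_full=900),
    Section("parse", size_reduced=150, size_full=240, time_reduced=4000, time_full=3100),
]

if __name__ == "__main__":
    for budget in (450, 600, 720):
        print(budget, select_full_sections(SECTIONS, budget))
```

Sweeping `size_budget` from the all-reduced size to the all-full size traces out a spectrum of size and time points. The time estimates could be average-case or worst-case figures, depending on which performance measure is being tuned.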
Index Terms
Selective code transformation for dual instruction set processors