Abstract
Preemptive multitasking is widely used in low-cost and real-time embedded applications because it improves hardware utilization. Its frequent, asynchronous context switches, however, require the task state to be preserved and restored, which results in a large number of memory transfer instructions. As a consequence, task responsiveness and application throughput can deteriorate significantly. To address this problem, we propose a cross-layer customization framework that achieves rapid, low-cost task switches through close cooperation among the compiler, the OS, and the hardware architecture. Information about state liveness, extracted at compile time, is exploited so that only a minimal amount of task state is preserved on task preemption. We introduce two complementary techniques to implement this application-aware state preservation. The first technique uses compiler-generated custom routines that preserve and restore an extremely small live context at judiciously selected points in the application code. The second technique requires more sophisticated hardware support: it employs an OS-controlled register file mapping to achieve a rapid context switch. By remapping a small fraction of the register file in a single clock cycle, a context switch is achieved that, in the majority of cases, requires no memory transfers to preserve or restore the live state. The effect of aggressively replicated register files, where each task is given its own replica, is achieved at a hardware cost of only 25% to 50% additional physical registers. With these mechanisms, task response time improves significantly because the context-switch cost is minimized.
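The idea behind the first technique, saving only the registers that are live at a given switch point, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the register-file size, the bitmask encoding of liveness, and the names `save_live_context`, `restore_live_context`, and `LIVE_AT_SWITCH_POINT` are all hypothetical and chosen only for illustration.

```python
# Toy simulation of liveness-aware context saving. All names and sizes here
# are illustrative assumptions, not taken from the paper.

NUM_REGS = 16  # assumed register-file size


def save_live_context(regs, live_mask):
    """Copy only the registers marked live in live_mask into a compact list.

    The length of the returned list equals the number of stores needed.
    """
    return [regs[i] for i in range(NUM_REGS) if live_mask & (1 << i)]


def restore_live_context(regs, live_mask, saved):
    """Write the saved values back into the registers marked live."""
    values = iter(saved)
    for i in range(NUM_REGS):
        if live_mask & (1 << i):
            regs[i] = next(values)


# Hypothetical switch point where liveness analysis found only r0, r3 and r7 live.
LIVE_AT_SWITCH_POINT = (1 << 0) | (1 << 3) | (1 << 7)

regs = list(range(100, 100 + NUM_REGS))  # register contents of the preempted task
saved = save_live_context(regs, LIVE_AT_SWITCH_POINT)

regs = [0] * NUM_REGS  # another task overwrites every register
restore_live_context(regs, LIVE_AT_SWITCH_POINT, saved)

print(len(saved), "stores instead of", NUM_REGS)
print(regs[0], regs[3], regs[7])
```

In this sketch a full context save would need 16 stores, while the liveness-aware save needs 3. Registers that were dead at the switch point are not restored, which is safe only because the task does not read them before writing them again.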
Cross-layer customization for rapid and low-cost task preemption in multitasked embedded systems