Abstract
Traditionally, runtime management such as CPU sharing and real-time scheduling is provided by the runtime environment (typically an operating system), using hardware support such as timers and interrupts. On network processors, however, stringent performance requirements mean that neither OS nor hardware mechanisms are typically feasible or available. Mapping packet processing tasks onto network processors involves complex trade-offs to maximize parallelism and pipelining. As code stores grow and application requirements become more complex, network processors are being programmed with heterogeneous threads, which may execute code belonging to different tasks on a given micro-engine. In addition, most network applications are streaming applications that are typically processed in a pipelined fashion, so the tasks on different micro-engines are pipelined to maximize throughput. The tasks themselves may have different runtime performance demands.
In this article, we focus on network processors on which the hardware can schedule threads only in a round-robin fashion and no OS assistance is provided. We show that it is very difficult and inefficient for the programmer to meet runtime management constraints by coding them statically. Because a hardware or OS solution is infeasible (even in the near future), we take a compiler approach.
We propose a complete compiler solution that automatically inserts the explicit context switch (ctx) instructions provided on the network processor, so that thread execution is better controlled at runtime to meet the threads’ constraints. We present two approaches that control programs’ runtime behavior, with different applicability and overheads. We show that the compiler approach is feasible and that it opens new application domains that need heterogeneous thread programming. Such approaches would in general become important for multicore processors.
Finally, our experiments show that the runtime constraints are enforced nearly ideally with minimal runtime degradation and small code growth.
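The effect of compiler-inserted ctx instructions under round-robin, non-preemptive scheduling can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the algorithm of the article: the names `insert_ctx` and `simulate`, the fixed instruction budget, and the cost model (every instruction, including ctx, takes one cycle) are hypothetical choices made for illustration.

```python
CTX = "ctx"  # stand-in for the explicit context switch instruction


def insert_ctx(program, budget):
    """Hypothetical pass: never run more than `budget` instructions without a ctx.

    A ctx is not inserted where the program already yields at the next step.
    """
    out, run = [], 0
    for i, ins in enumerate(program):
        out.append(ins)
        if ins == CTX:
            run = 0
            continue
        run += 1
        next_is_ctx = i + 1 < len(program) and program[i + 1] == CTX
        if run == budget and not next_is_ctx:
            out.append(CTX)
            run = 0
    return out


def simulate(programs, cycles):
    """Round-robin without preemption: a thread runs until it executes ctx.

    Each thread loops over its program forever. Returns the number of
    non-ctx (useful) instructions executed by each thread.
    """
    pcs = [0] * len(programs)
    useful = [0] * len(programs)
    current = 0
    for _ in range(cycles):
        prog = programs[current]
        ins = prog[pcs[current]]
        pcs[current] = (pcs[current] + 1) % len(prog)
        if ins == CTX:
            current = (current + 1) % len(programs)  # hand over to next thread
        else:
            useful[current] += 1
    return useful


# Two heterogeneous threads: a long task and a short task, each yielding
# only at the end of its loop body.
long_task = ["op"] * 30 + [CTX]
short_task = ["op"] * 10 + [CTX]

print(simulate([long_task, short_task], 462))   # [330, 110]

balanced = [insert_ctx(long_task, 10), insert_ctx(short_task, 10)]
print(simulate(balanced, 462))                  # [210, 210]
```

In this toy model the long task initially takes three quarters of the useful cycles. After insertion the two threads share the micro-engine equally, at the cost of two extra instructions in the long task and a lower total of useful work (420 instead of 440 instructions in 462 cycles). That mirrors the trade-off between constraint enforcement, runtime degradation, and code growth that the abstract mentions.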
Compiler-Supported Thread Management for Multithreaded Network Processors