Compiler-Supported Thread Management for Multithreaded Network Processors

Published: 01 November 2011
Abstract

Traditionally, runtime management such as CPU sharing and real-time scheduling is provided by the runtime environment (typically an operating system) using hardware support such as timers and interrupts. However, because of the stringent performance requirements on network processors, neither OS nor hardware mechanisms are typically feasible or available. Mapping packet processing tasks onto network processors involves complex trade-offs to maximize parallelism and pipelining. As code stores grow and application requirements become more complex, network processors are being programmed with heterogeneous threads that may execute code belonging to different tasks on a given micro-engine. In addition, most network applications are streaming applications that are processed in a pipelined fashion, so the tasks on different micro-engines are pipelined to maximize throughput. The tasks themselves can have different runtime performance demands.

In this article, we focus on network processors on which the hardware can only schedule threads in a round-robin fashion and no OS assistance is provided. We show that it is very difficult and inefficient for the programmer to meet runtime management constraints by coding them statically. Because a hardware or OS solution is infeasible (even in the near future), we take a compiler-based approach.

We propose a complete compiler solution that automatically inserts the explicit context switch (ctx) instructions provided on the network processor, so that the execution of threads is better controlled at runtime to meet their constraints. Two approaches are presented that control programs’ runtime behavior with different applicability and overheads. We show that the compiler approach is feasible and that it opens up new application domains that need heterogeneous thread programming. Such approaches are likely to become important for multicore processors in general.

Finally, our experiments show that the runtime constraints are enforced nearly ideally with minimal runtime degradation and small code growth.
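The idea of steering CPU share through the placement of ctx instructions can be illustrated with a toy model. The sketch below is not the article's algorithm: it assumes non-preemptive threads, a hardware round-robin scheduler that only switches when the running thread executes ctx, and zero context switch cost. The names `simulate`, `shares`, and `bursts` are hypothetical, introduced only for this illustration.

```python
from itertools import cycle


def simulate(bursts, total_cycles):
    """Toy round-robin model (illustrative assumption, not the paper's method).

    bursts[i] is the number of instructions thread i executes between two
    consecutive ctx instructions. The scheduler switches to the next thread
    only when the running thread yields via ctx.
    Returns the number of cycles each thread executed.
    """
    if any(b <= 0 for b in bursts):
        raise ValueError("every burst length must be positive")
    executed = [0] * len(bursts)
    remaining = total_cycles
    for tid in cycle(range(len(bursts))):
        if remaining <= 0:
            break
        # The thread runs until its next ctx or until the budget is spent.
        run = min(bursts[tid], remaining)
        executed[tid] += run
        remaining -= run
    return executed


def shares(bursts, total_cycles):
    """Fraction of total_cycles consumed by each thread."""
    return [n / total_cycles for n in simulate(bursts, total_cycles)]


if __name__ == "__main__":
    # Thread 0 yields every 30 instructions, thread 1 every 10.
    print(shares([30, 10], 4000))
```

In this model each thread's CPU share is proportional to the distance between its ctx instructions, so a compiler that chooses where to insert ctx can, in principle, bias an otherwise fair round-robin scheduler toward the threads with higher demands.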

