Optimizing software cache performance of packet processing applications

Published: 13 June 2007

Abstract

Network processors (NPs) are widely used in many types of networking equipment because of their high performance and flexibility. Most NPs use a software cache instead of a hardware cache because of chip area, cost, and power constraints. Programmers must therefore take full responsibility for software cache management, which is neither intuitive nor easy for most of them. Without effective use of the software cache, long memory access latency becomes a critical limiting factor for overall application performance. Prior techniques such as hardware multithreading, wide-word accesses, and packet access combination for caching have been applied to help programmers overcome this bottleneck. However, most of them do not sufficiently exploit the characteristics of packet processing applications and often perform intraprocedural optimizations only. As a result, for some applications the binary code generated by those techniques performs worse than hand-tuned assembly. In this paper, we propose an algorithm comprising two techniques, Critical Path Based Analysis (CPBA) and Global Adaptive Localization (GAL), to optimize the software cache performance of packet processing applications. Packet processing applications usually have several hot paths, and CPBA attempts to insert localization instructions according to their execution frequencies. As a further optimization, GAL eliminates some redundant localization instructions through interprocedural analysis and optimization. We apply our algorithm to several representative applications. Experimental results show an average speedup by a factor of 1.974.

    • Published in

      ACM SIGPLAN Notices, Volume 42, Issue 7
      Proceedings of the 2007 LCTES conference
      July 2007
      241 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1273444

      LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
      June 2007
      258 pages
      ISBN: 9781595936325
      DOI: 10.1145/1254766

      Copyright © 2007 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • article
