Abstract
FPGA-based data processing is becoming increasingly relevant in data centers, as the transformation of existing applications into dataflow architectures can bring significant throughput and power benefits. Furthermore, a tighter integration of computing and networking is appealing, as it overcomes traditional bottlenecks between CPUs and network interfaces and dramatically reduces latency.
In this article, we present the design of a novel hash table, a fundamental building block used in many applications, to enable data processing on FPGAs close to the network. We present a fully pipelined design capable of sustaining consistent 10Gbps line-rate processing by deploying a concurrent mechanism to handle hash collisions. We address additional design challenges such as support for a broad range of key sizes without stalling the pipeline through careful matching of lookup time with packet reception time. Finally, the design is based on a scalable architecture that can be easily parameterized to work with different memory types operating at different access speeds and latencies.
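The idea of matching lookup time with packet reception time can be illustrated with simple arithmetic: a longer key takes more clock cycles to arrive, so the table may spend correspondingly more cycles on it without stalling the pipeline. The sketch below is only an illustration under stated assumptions. The 8-byte-per-cycle datapath width and the function names are hypothetical and not taken from the article.

```python
import math

# Assumption (not from the article): the network datapath delivers
# 8 bytes per clock cycle, i.e. a 64-bit bus.
ASSUMED_BUS_BYTES = 8


def reception_cycles(packet_bytes: int, bus_bytes: int = ASSUMED_BUS_BYTES) -> int:
    """Clock cycles needed to stream one packet over the datapath."""
    return math.ceil(packet_bytes / bus_bytes)


def lookup_fits(packet_bytes: int, lookup_interval_cycles: int,
                bus_bytes: int = ASSUMED_BUS_BYTES) -> bool:
    """True if the table is occupied by a request for no longer than
    the time it takes to receive that request's packet."""
    return lookup_interval_cycles <= reception_cycles(packet_bytes, bus_bytes)


if __name__ == "__main__":
    for size in (64, 96, 256):
        print(size, "bytes ->", reception_cycles(size), "cycles available")
```

Under these assumptions, a request whose packet occupies 96 bytes leaves 12 cycles for its lookup; a lookup that occupies the table for 13 cycles would force the pipeline to stall.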
We have tested the proposed hash table in an FPGA-based memcached appliance implementing a main-memory key-value store in hardware. The hash table is used to index 2 million entries in 24GB of external DDR3 DRAM while sustaining 13 million requests per second, the maximum packet rate that can be achieved with UDP packets on a 10Gbps link for this application.
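To put the request rate in context, the relation between link speed, per-packet size on the wire, and packet rate can be sketched as follows. This is a back-of-the-envelope calculation, not the article's own derivation. The function names are hypothetical, and the only external fact used is the standard Ethernet minimum of 64 bytes per frame plus 8 bytes of preamble and 12 bytes of inter-frame gap.

```python
def max_packet_rate(link_bps: float, wire_bytes: float) -> float:
    """Upper bound on packets per second when each packet occupies
    wire_bytes on the link (including all framing overhead)."""
    return link_bps / (wire_bytes * 8)


def wire_bytes_for_rate(link_bps: float, packets_per_second: float) -> float:
    """Bytes each packet may occupy on the link at a given packet rate."""
    return link_bps / (packets_per_second * 8)


# Standard Ethernet: 64-byte minimum frame + 8-byte preamble + 12-byte gap.
ETH_MIN_WIRE_BYTES = 64 + 8 + 12

if __name__ == "__main__":
    link = 10e9  # 10Gbps
    print(f"minimum-size frames: {max_packet_rate(link, ETH_MIN_WIRE_BYTES):,.0f} packets/s")
    print(f"bytes on the wire per request at 13M requests/s: "
          f"{wire_bytes_for_rate(link, 13e6):.2f}")
```

At 10Gbps, 13 million requests per second corresponds to roughly 96 bytes on the wire per request, somewhat above the 84 bytes of a minimum-size Ethernet frame; this is consistent with the request payload and protocol headers, rather than the Ethernet minimum, setting the limit for this application.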
A Hash Table for Line-Rate Data Processing