Abstract
Stream processing is widely used to analyze large volumes of data in applications such as IoT, social networks, retail, and software log analysis. Streams of data are produced continuously and are mined to extract patterns that characterize the data. A class of data-mining algorithms, called generate-and-test, produces a set of candidate patterns that are then evaluated over the data. The main challenges for these algorithms are achieving high throughput, low latency, and reduced power consumption. In this article, we present a novel power-efficient, fast, and versatile hardware architecture that monitors a set of target patterns and maintains their frequency over a stream of data. This architecture can be used to accelerate data-mining algorithms, including itemset and sequence mining.
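To make the notion of maintaining pattern frequencies over a stream concrete, the following is a minimal software sketch of support counting for itemset and sequence patterns. It is only an illustration under stated assumptions, not the hardware architecture described in the article: the function names (`count_itemset_support`, `is_subsequence`, `count_sequence_support`) are hypothetical, and a sequence pattern is assumed here to match when its symbols appear in order, not necessarily contiguously.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Sequence, Tuple


def count_itemset_support(
    stream: Iterable[Iterable[Hashable]],
    patterns: Iterable[FrozenSet[Hashable]],
) -> Dict[FrozenSet[Hashable], int]:
    """Count, for each target itemset, the transactions that contain it."""
    counts = {p: 0 for p in patterns}
    for transaction in stream:
        items = set(transaction)
        for p in counts:
            # An itemset is supported when it is a subset of the transaction.
            if p <= items:
                counts[p] += 1
    return counts


def is_subsequence(pattern: Sequence[Hashable], sequence: Iterable[Hashable]) -> bool:
    """True if the pattern's symbols occur in order (gaps allowed) in the sequence."""
    it = iter(sequence)
    # Each membership test advances the iterator past the matched symbol.
    return all(symbol in it for symbol in pattern)


def count_sequence_support(
    stream: Iterable[Sequence[Hashable]],
    patterns: Iterable[Tuple[Hashable, ...]],
) -> Dict[Tuple[Hashable, ...], int]:
    """Count, for each target sequence pattern, the stream sequences that contain it."""
    counts = {p: 0 for p in patterns}
    for sequence in stream:
        for p in counts:
            if is_subsequence(p, sequence):
                counts[p] += 1
    return counts
```

In this software form every pattern is checked one after another for each incoming transaction; the appeal of a hardware design is that many pattern-detection units can examine the same stream in parallel.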
The massive fine-grain reconfiguration capability of field-programmable gate array (FPGA) technologies is well suited to implementing the large number of pattern-detection units needed for these intensive data-mining applications. We have therefore designed and implemented an IP that features high-density FPGA occupation and a high working frequency. We provide a detailed description of the IP's internal micro-architecture and of its implementation and optimization for the targeted FPGA resources. We validate our architecture by developing a co-designed implementation of the Apriori Frequent Itemset Mining (FIM) algorithm and by performing numerous experiments against existing hardware and software solutions. We demonstrate that FIM hardware acceleration is particularly efficient for large, low-density datasets (i.e., long-tailed datasets). Our IP reaches a data throughput of 250 million items/s and monitors up to 11.6k patterns simultaneously on a prototyping board that consumes 24 W overall in the worst case. Furthermore, our hardware accelerator remains generic and can be integrated into other generate-and-test algorithms.
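For readers unfamiliar with Apriori, the generate-and-test loop can be sketched in a few lines of Python. This is a textbook-style illustration, not the co-designed implementation evaluated in the article: `count_support` is a hypothetical software stand-in for the support-counting step that an accelerator would take over, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Itemset = FrozenSet[Hashable]


def count_support(transactions: List[Iterable[Hashable]],
                  candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """Test step: count how many transactions contain each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        items = frozenset(t)
        for c in candidates:
            if c <= items:
                counts[c] += 1
    return counts


def apriori(transactions: List[Iterable[Hashable]],
            min_support: int) -> Dict[Itemset, int]:
    """Return every itemset whose support is at least min_support."""
    frequent: Dict[Itemset, int] = {}
    # Level 1 candidates: every distinct item as a singleton itemset.
    candidates: Set[Itemset] = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        counts = count_support(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Generate step: join frequent (k-1)-itemsets into k-itemsets ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The host-side logic (candidate generation and pruning) stays small, while `count_support` dominates the run time on large datasets, which is why that step is the natural target for acceleration.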
Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets