Abstract
In this article we describe a Field Programmable Gate Array (FPGA)-based coprocessor architecture for Frequent Itemset Mining (FIM). FIM is a common data mining task used to find frequently occurring subsets amongst a database of sets. It is a non-numerical, data-intensive computation used in machine learning and computational biology. FIM is particularly expensive, in terms of both execution time and memory, when performed on large and/or sparse databases or when applied with a low appearance frequency threshold. As a result, developing increasingly efficient FIM algorithms and mapping them to parallel architectures is an active field of research. Previous attempts to accelerate FIM using FPGAs have relied on performance-limiting strategies such as iterative database loading and runtime logic unit reconfiguration. In this article, we present a novel architecture that implements Eclat, a well-known FIM algorithm. Unlike previous efforts, our technique does not limit the maximum set size as a function of available FPGA logic resources, and our design scales well to multiple FPGAs. In addition to a novel hardware design, we also present a corresponding compression scheme for intermediate results that are stored in on-chip memory. On a four-FPGA board, experimental results show up to 68X speedup compared to a highly optimized software implementation.
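For readers unfamiliar with Eclat, the following is a minimal software sketch of its core idea. It is not the authors' FPGA architecture and does not model their compression scheme. Each item is stored "vertically" as the set of transaction IDs (its tidset) that contain it, and the support of a larger itemset is the size of the intersection of its members' tidsets. The search proceeds depth-first over groups of itemsets that share a common prefix. The function name `eclat`, the toy database, and the absolute-count `min_support` threshold are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical helper: return every itemset whose support (number of
    transactions containing it) is at least min_support, found by
    Eclat-style tidset intersection."""
    # Vertical layout: item -> set of transaction IDs that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Keep frequent single items in a fixed (sorted) order so that each
    # itemset is generated exactly once.
    roots: List[Tuple[str, Set[int]]] = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Depth-first search over the class of itemsets sharing `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with each later sibling; keep only frequent results.
            next_class = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_class.append((other, common))
            if next_class:
                extend(itemset, next_class)

    extend(frozenset(), roots)
    return frequent


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
    result = eclat(db, min_support=2)
    for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
    # Prints:
    # ['a'] 3
    # ['b'] 2
    # ['c'] 3
    # ['a', 'c'] 2
    # ['b', 'c'] 2
```

In this toy run, {a, b, c} is not reported because only one transaction contains all three items. The repeated set intersections are the data-intensive kernel that makes FIM costly on large or sparse databases and at low support thresholds.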
An FPGA-Based Accelerator for Frequent Itemset Mining