research-article

An FPGA-Based Accelerator for Frequent Itemset Mining

Published: 01 May 2013

Abstract

In this article we describe a Field Programmable Gate Array (FPGA)-based coprocessor architecture for Frequent Itemset Mining (FIM). FIM is a common data mining task used to find frequently occurring subsets within a database of sets. It is a nonnumerical, data-intensive computation used in machine learning and computational biology. FIM is particularly expensive, in terms of both execution time and memory, when performed on large and/or sparse databases or when applied with a low appearance-frequency threshold. Because of this, developing increasingly efficient FIM algorithms and mapping them to parallel architectures is an active field. Previous attempts to accelerate FIM using FPGAs have relied on performance-limiting strategies such as iterative database loading and runtime logic unit reconfiguration. In this article, we present a novel architecture to implement Eclat, a well-known FIM algorithm. Unlike previous efforts, our technique does not limit the maximum set size as a function of available FPGA logic resources, and our design scales well to multiple FPGAs. In addition to a novel hardware design, we also present a corresponding compression scheme for intermediate results stored in on-chip memory. On a four-FPGA board, experimental results show up to a 68× speedup compared to a highly optimized software implementation.
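For readers unfamiliar with Eclat, the following is a minimal software sketch of the textbook algorithm, not the hardware design described in the article. Eclat stores the database vertically (each item maps to the set of transaction IDs, or tidset, that contain it) and searches depth-first, computing the support of a larger itemset by intersecting the tidsets of two smaller ones. The function name `eclat`, the `min_support` parameter (an absolute transaction count), and the choice to store tidsets as bit vectors in Python integers are all illustrative assumptions. The bitwise `&` on wide bit vectors is the kind of regular, data-parallel operation that plausibly maps well onto FPGA logic, but the article's actual data layout and compression scheme are not shown here.

```python
from typing import Dict, List, Tuple


def eclat(transactions: List[List[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Textbook Eclat over a vertical (item -> tidset) layout.

    Assumption for illustration: a tidset is a Python int used as a bit
    vector, where bit t is set if transaction t contains the itemset.
    """
    # Build the vertical database: one bit vector per distinct item.
    vertical: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            vertical[item] = vertical.get(item, 0) | (1 << tid)

    def support(bits: int) -> int:
        # Support = number of transactions containing the itemset.
        return bin(bits).count("1")

    # Keep only frequent single items, in a fixed (sorted) order.
    frequent_items = sorted(
        (item, bits) for item, bits in vertical.items() if support(bits) >= min_support
    )

    results: Dict[Tuple[str, ...], int] = {}

    def extend(prefix: Tuple[str, ...], candidates: List[Tuple[str, int]]) -> None:
        # Every candidate here is already frequent: record it, then intersect it
        # with each later candidate that shares the same prefix.
        for i, (item, bits) in enumerate(candidates):
            itemset = prefix + (item,)
            results[itemset] = support(bits)
            next_class: List[Tuple[str, int]] = []
            for other_item, other_bits in candidates[i + 1:]:
                joined = bits & other_bits  # tidset intersection
                if support(joined) >= min_support:
                    next_class.append((other_item, joined))
            if next_class:
                extend(itemset, next_class)

    extend((), frequent_items)
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
    for itemset, count in sorted(eclat(db, min_support=2).items()):
        print(itemset, count)
```

With `min_support=2`, item `d` (support 1) is pruned immediately, and `{a, b}` is never extended because its intersected tidset contains only one transaction; the script prints `('a',) 3`, `('a', 'c') 2`, `('b',) 2`, `('b', 'c') 2`, and `('c',) 3`. Lowering the threshold enlarges the search tree and the number of intersections, which matches the abstract's remark that low thresholds make FIM especially expensive.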

