
Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets

Published: 27 May 2017

Abstract

Stream processing has become extremely popular for analyzing huge volumes of data in a variety of applications, including IoT, social networks, retail, and software log analysis. Streams of data are produced continuously and are mined to extract patterns that characterize the data. A class of data-mining algorithms, called generate-and-test, produces a set of candidate patterns that are then evaluated over the data. The main challenges for these algorithms are achieving high throughput, low latency, and reduced power consumption. In this article, we present a novel power-efficient, fast, and versatile hardware architecture whose objective is to monitor a set of target patterns and maintain their frequency over a stream of data. This accelerator can be used to speed up data-mining algorithms, including itemset and sequence mining.
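As a software-only illustration of what "maintaining the frequency of target patterns over a stream" means, the following minimal sketch counts, for each monitored pattern, the number of transactions that contain it. It is not the accelerator's design: the function names, the representation of itemsets as `frozenset` and sequences as `tuple`, and the toy stream are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Sequence, Tuple


def contains_sequence(transaction: Sequence[Hashable], pattern: Tuple) -> bool:
    """True if `pattern` occurs in `transaction` as an ordered (not necessarily
    contiguous) subsequence."""
    it = iter(transaction)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def count_support(
    stream: Iterable[Sequence[Hashable]],
    itemsets: Iterable[FrozenSet],
    sequences: Iterable[Tuple],
) -> Dict[object, int]:
    """Support counting: one pass over the stream, one counter per pattern."""
    itemsets = list(itemsets)
    sequences = list(sequences)
    counts: Dict[object, int] = {p: 0 for p in itemsets + sequences}
    for transaction in stream:
        items = frozenset(transaction)
        for p in itemsets:          # itemset: order does not matter
            if p <= items:
                counts[p] += 1
        for s in sequences:         # sequence: order matters
            if contains_sequence(transaction, s):
                counts[s] += 1
    return counts


# Hypothetical toy stream of three transactions.
stream = [("a", "b", "c"), ("b", "a"), ("a", "c")]
support = count_support(stream, [frozenset({"a", "b"})], [("a", "b")])
```

In this sketch every pattern is tested against every transaction in nested loops; the abstract's point is that this per-pattern test is the part that benefits from many parallel pattern-detection units.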

The massive fine-grain reconfiguration capability of field-programmable gate array (FPGA) technologies is ideal for implementing the large number of pattern-detection units needed for these intensive data-mining applications. We have thus designed and implemented an IP that features high-density FPGA occupation and a high working frequency. We provide a detailed description of the IP's internal micro-architecture and of its actual implementation and optimization for the targeted FPGA resources. We validate our architecture by developing a co-designed implementation of the Apriori Frequent Itemset Mining (FIM) algorithm and perform numerous experiments against existing hardware and software solutions. We demonstrate that FIM hardware acceleration is particularly efficient for large and low-density datasets (i.e., long-tailed datasets). Our IP reaches a data throughput of 250 million items/s and monitors up to 11.6k patterns simultaneously on a prototyping board that consumes 24 W overall in the worst case. Furthermore, our hardware accelerator remains generic and can be integrated into other generate-and-test algorithms.
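To make the generate-and-test structure of Apriori concrete, here is a minimal pure-software sketch of the level-wise loop. It is an illustration only, not the co-designed implementation described above: the candidate generation is a simplified brute-force variant, and `count_candidates`, `generate_candidates`, `apriori`, and the toy dataset are hypothetical names and data chosen for this example. The test step is isolated in `count_candidates`, which is the kind of function a support-counting accelerator could stand in for.

```python
from itertools import combinations


def count_candidates(transactions, candidates):
    """Test step: count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        items = frozenset(t)
        for c in candidates:
            if c <= items:
                counts[c] += 1
    return counts


def generate_candidates(frequent, k):
    """Generate step: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({i for s in frequent for i in s})
    candidates = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def apriori(transactions, min_support):
    """Level-wise loop: alternate test and generate until no candidates remain."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while candidates:
        counts = count_candidates(transactions, candidates)
        frequent = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in frequent})
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical toy dataset of five transactions.
transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
frequent_itemsets = apriori(transactions, min_support=3)
```

Because the test step is a single function that takes the data and a set of candidates, other generate-and-test algorithms could reuse it with a different generation step.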

