research-article

Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs

Published: 09 September 2019

Abstract

Given the growth in data inputs and application complexity, a single hardware accelerator is often not enough to solve a given problem. In particular, the computational and I/O demands of many machine learning tasks often require a cluster of accelerators to achieve a meaningful performance improvement. In this article, we explore the efficient construction of FPGA clusters, using inference over decision tree ensembles as the target application. The article examines the problem at several levels: (1) a lightweight inter-FPGA communication protocol and routing layer that facilitates communication between the FPGAs, (2) data partitioning and distribution strategies that maximize performance, and (3) an in-depth analysis of how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at significantly higher throughput than existing systems.
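The abstract lists data partitioning and distribution as one of the levels studied but does not spell out the strategies. As an illustration only, and not the authors' implementation, the following Python sketch contrasts two generic ways to split inference over an additive tree ensemble (for example, gradient-boosted trees, where a prediction is the sum of the leaf values reached) across several devices. The first partitions the input tuples and replicates the whole ensemble. The second partitions the trees, sends every tuple to every device, and sums the partial scores. The flattened `Tree` layout and the names `split`, `data_parallel`, and `model_parallel` are hypothetical, and the devices are simulated sequentially in software.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical flattened tree layout (an assumption, not the article's format):
# node i is a leaf when feature[i] == -1; otherwise go to left[i] when
# x[feature[i]] <= threshold[i], else to right[i].
@dataclass
class Tree:
    feature: List[int]
    threshold: List[float]
    left: List[int]
    right: List[int]
    value: List[float]

    def predict(self, x: Sequence[float]) -> float:
        i = 0  # start at the root
        while self.feature[i] != -1:
            i = self.left[i] if x[self.feature[i]] <= self.threshold[i] else self.right[i]
        return self.value[i]


def ensemble_score(trees: Sequence[Tree], x: Sequence[float]) -> float:
    # Additive ensemble: the score is the sum of the leaf values reached.
    return sum(t.predict(x) for t in trees)


def split(items: Sequence, n: int) -> List[Sequence]:
    # Cut items into n contiguous, near-equal chunks (one per simulated device).
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for d in range(n):
        end = start + k + (1 if d < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def data_parallel(trees, tuples, n_devices):
    # Each device holds the full ensemble and scores its own slice of tuples;
    # concatenating the slices in order restores the original tuple order.
    scores = []
    for chunk in split(tuples, n_devices):
        scores.extend(ensemble_score(trees, x) for x in chunk)
    return scores


def model_parallel(trees, tuples, n_devices):
    # Each device holds a subset of trees and scores every tuple; the partial
    # sums are then reduced (on a real cluster this reduction costs inter-device traffic).
    partials = [[ensemble_score(part, x) for x in tuples] for part in split(trees, n_devices)]
    return [sum(per_tuple) for per_tuple in zip(*partials)]


# Usage: two depth-1 trees ("stumps") over 2-feature tuples.
stump_a = Tree([0, -1, -1], [0.5, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, 1.0, -1.0])
stump_b = Tree([1, -1, -1], [2.0, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, 0.25, 0.75])
trees = [stump_a, stump_b, stump_a]
tuples = [[0.0, 1.0], [1.0, 3.0], [0.4, 5.0]]

print(data_parallel(trees, tuples, 2))   # [2.25, -1.25, 2.75]
print(model_parallel(trees, tuples, 2))  # [2.25, -1.25, 2.75]
```

Both strategies produce the same scores. Which one is preferable on a real cluster would plausibly depend on factors such as whether the whole ensemble fits in a single device's on-chip memory and how expensive the aggregation of partial scores is in inter-FPGA traffic.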
