DOI: 10.1145/3447548.3467368
Research article · Public Access

BLOCKSET (Block-Aligned Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment

Published: 14 August 2021

Abstract

We present methods to serialize and deserialize gradient-boosted trees and random forests that optimize inference latency when models are not loaded into memory. This arises when models are larger than memory, but also systematically when models are deployed on low-resource devices in the Internet of Things or run as cloud microservices where resources are allocated on demand. Block-Aligned Serialized Trees (BLOCKSET) introduce the concept of selective access for random forests and gradient-boosted trees, in which only the parts of the model needed for inference are deserialized and loaded into memory. Using principles from external-memory algorithms, we block-align the serialization format in order to minimize the number of I/Os. For gradient-boosted trees, this results in a more than fivefold reduction in inference latency over layouts that do not perform selective access, and a twofold latency reduction over techniques that are selective but do not encode I/O block boundaries in the layout.
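The core mechanism described here, selective access over a block-aligned layout, can be illustrated with a minimal sketch. The fixed-width node record, the 4 KiB block size, and the names serialize_blockaligned and predict_selective below are assumptions for illustration, not the paper's format or API: node records are packed so that none straddles an I/O block boundary, and a root-to-leaf traversal deserializes only the blocks it touches.

```python
import struct

BLOCK = 4096                    # assumed I/O block size (e.g., one SSD page)
NODE = struct.Struct("<ifii")   # feature id, threshold-or-leaf-value, left id, right id
PER_BLOCK = BLOCK // NODE.size  # 16-byte records: 256 per block, none straddling a boundary

def serialize_blockaligned(nodes, path):
    """Write fixed-width node records in node-id order.
    nodes: list of (feature, value, left, right); feature == -1 marks a leaf,
    in which case `value` holds the prediction."""
    with open(path, "wb") as f:
        for n in nodes:
            f.write(NODE.pack(*n))
        # With a record size that did not divide BLOCK, each block would be
        # padded here so that the next record starts on a block boundary.

def predict_selective(path, x):
    """Root-to-leaf traversal that deserializes only the nodes on the path.
    Returns (prediction, number of distinct blocks touched)."""
    blocks_read = set()
    with open(path, "rb") as f:
        nid = 0
        while True:
            blk, slot = divmod(nid, PER_BLOCK)
            blocks_read.add(blk)  # the OS fetches whole blocks, so this counts the I/Os
            f.seek(blk * BLOCK + slot * NODE.size)
            feat, val, left, right = NODE.unpack(f.read(NODE.size))
            if feat == -1:        # leaf: val is the prediction
                return val, len(blocks_read)
            nid = left if x[feat] <= val else right  # val is the split threshold

# Tiny example: root splits on feature 0 at 0.5; both children are leaves.
nodes = [(0, 0.5, 1, 2), (-1, -1.0, 0, 0), (-1, 1.0, 0, 0)]
serialize_blockaligned(nodes, "tree.bin")
print(predict_selective("tree.bin", [0.7]))  # -> (1.0, 1): one block read
```

Because the number of blocks touched is bounded by the length of the traversal path rather than by the model size, a model that never fully resides in memory still pays only a few block reads per prediction, which is the latency effect the abstract quantifies.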

Supplementary Material

MP4 File (blockset_blockaligned_serialized_trees_reducing-meghana_madhyastha-kunal_lillaney-38957942-5mNx.mp4)
This is the presentation video accompanying our KDD 2021 paper titled "BLOCKSET (Block Aligned Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployments"


Cited By

  • (2024) T-Rex (Tree-Rectangles): Reformulating Decision Tree Traversal as Hyperrectangle Enclosure. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1792-1804. https://doi.org/10.1109/ICDE60146.2024.00145. Online publication date: 13 May 2024.
  • (2023) Accelerating Random Forest on Memory-Constrained Devices Through Data Storage Optimization. IEEE Transactions on Computers 72, 6, 1595-1609. https://doi.org/10.1109/TC.2022.3215898. Online publication date: 1 June 2023.



Information

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN:9781450383325
DOI:10.1145/3447548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Funding Sources

  • DARPA

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 86
  • Downloads (last 6 weeks): 10
Reflects downloads up to 23 Sep 2024

