Abstract
Machine learning has enabled many interesting applications and is used extensively in big data systems. The popular approach of training machine learning models in frameworks such as TensorFlow, PyTorch, and Keras requires moving data from database engines to analytical engines, which adds excessive overhead for data scientists and becomes a performance bottleneck for model training. In this demonstration, we present a solution that enables distributed machine learning natively inside database engines. During the demo, the audience will interactively use Python APIs in Jupyter Notebooks to train multiple linear regression models on synthetic regression datasets, and neural network models on vision and sensory datasets, directly inside the Teradata SQL Engine.
Index Terms
(auto-classified) In-database distributed machine learning: demonstration using Teradata SQL engine