research-article

In-database distributed machine learning: demonstration using Teradata SQL engine

Published: 01 August 2019

Abstract

Machine learning has enabled many interesting applications and is used extensively in big data systems. The popular approach, training machine learning models in frameworks such as TensorFlow, PyTorch, and Keras, requires moving data from database engines to analytical engines, which adds excessive overhead for data scientists and becomes a performance bottleneck for model training. In this demonstration, we present a solution that enables distributed machine learning natively inside database engines. During the demo, the audience will interactively use Python APIs in Jupyter Notebooks to train multiple linear regression models on synthetic regression datasets and neural network models on vision and sensory datasets directly inside the Teradata SQL Engine.
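
The abstract does not describe the training algorithm, so the following is only a minimal sketch of one common way to train a linear regression model when the data is spread across several workers: each worker computes gradient sums over its own partition, and a coordinator combines them into one update. It is not the system demonstrated in the paper and uses no Teradata API. All names (`make_dataset`, `partition`, `local_gradient`, `train`) and all parameter values are hypothetical, and the partitions are plain Python lists standing in for per-worker data.

```python
import random

def make_dataset(n_rows, true_weights, true_bias, seed=0):
    """Generate a synthetic regression dataset: y = w.x + b + small noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        y = sum(w * xi for w, xi in zip(true_weights, x)) + true_bias
        rows.append((x, y + rng.gauss(0.0, 0.01)))
    return rows

def partition(rows, n_workers):
    """Split rows round-robin; each list stands in for one worker's local data."""
    return [rows[i::n_workers] for i in range(n_workers)]

def local_gradient(rows, weights, bias):
    """Squared-error gradient sums over one partition, plus its row count."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in rows:
        err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        for j, xi in enumerate(x):
            grad_w[j] += err * xi
        grad_b += err
    return grad_w, grad_b, len(rows)

def train(partitions, n_features, lr=0.5, epochs=200):
    """Full-batch gradient descent driven only by per-partition gradient sums."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Workers return gradient sums and row counts, never the rows themselves.
        results = [local_gradient(p, weights, bias) for p in partitions]
        total = sum(n for _, _, n in results)
        for j in range(n_features):
            weights[j] -= lr * sum(gw[j] for gw, _, _ in results) / total
        bias -= lr * sum(gb for _, gb, _ in results) / total
    return weights, bias

rows = make_dataset(400, true_weights=[2.0, -3.0], true_bias=0.5)
weights, bias = train(partition(rows, n_workers=4), n_features=2)
```

Because the per-partition gradient sums add up to the gradient over the whole dataset, the update in this sketch matches single-node full-batch gradient descent, while only a few numbers per worker cross partition boundaries in each epoch.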

Published in: Proceedings of the VLDB Endowment, Volume 12, Issue 12 (August 2019), 547 pages
Publisher: VLDB Endowment