Declarative Recursive Computation on an RDBMS or, Why You Should Use a Database For Distributed Machine Learning

Dimitrije Jankov, Shangyu Luo, Binhang Yuan, Zhuhua Cai, Jia Zou, Chris Jermaine, Zekai J. Gao

Research output: Contribution to journal › Conference article › peer-review

26 Scopus citations

Abstract

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management system (RDBMS) to make it suitable for distributed learning computations. Changes include adding better support for recursion, and optimization and execution of very large compute plans. We also show that there are key advantages to using an RDBMS as a machine learning platform. In particular, learning based on a database management system allows for trivial scaling to large data sets and especially large models, where different computational units operate on different parts of a model that may be too large to fit into RAM.

Original language: English (US)
Pages (from-to): 822-835
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 7
State: Published - 2018
Externally published: Yes
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: Aug 26 2019 to Aug 30 2019

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
