Database Meets Deep Learning: Challenges and Opportunities

Wei Wang, Meihui Zhang, Gang Chen, Hosagrahar V. Jagadish, Beng Chin Ooi, Kian-Lee Tan
Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, including image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the…

In-Machine-Learning Database: Reimagining Deep Learning with Old-School SQL

  • Len Du
  • Computer Science
  • 2020
This work applies plain old SQL to deep learning, in a sense implementing deep learning algorithms with SQL. It finds a way to express common deep learning operations in SQL, encouraging a different way of thinking and thus potentially novel models.

Deep Learning Through the Lens of Classical SQL

In-database machine learning has been very popular, almost being a cliche. However, can we do it the other way around? In this work, we say “yes” by applying plain old SQL to Deep Learning (DL), in a

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools

This survey performs a broad and thorough investigation of challenges, techniques and tools for scalable DL on distributed infrastructures, and highlights future research trends in DL systems.

Scalable Deep Learning on Distributed Infrastructures

This survey performs a broad and thorough investigation of challenges, techniques and tools for scalable DL on distributed infrastructures, and highlights future research trends in DL systems.

SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis

This paper introduces SINGA-Easy, a new deep learning framework that provides distributed hyper-parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation.

Engineering Challenges of Deep Learning

The challenges identified in this paper can be used to guide future research by the software engineering and DL communities and could enable a large number of companies to start taking advantage of the high potential of the DL technology.

Data Management Challenges for Deep Learning

A case study approach is employed to explore the data management issues faced by practitioners across various domains when they use real-world data for training and deploying deep learning models.

A Survey on Deep Reinforcement Learning for Data Processing and Analytics

This work provides a comprehensive review of recent works focusing on utilizing DRL to improve data processing and analytics, and presents an introduction to key concepts, theories, and methods in DRL.

Improving Data Analytics with Fast and Adaptive Regularization

This paper proposes a general adaptive regularization method based on Gaussian Mixture to learn the best regularization function according to the observed parameters, and develops an effective update algorithm which integrates Expectation Maximization with Stochastic Gradient Descent.

SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

This paper proposes SLIDE (Sub-LInear Deep learning Engine), which uniquely blends smart randomized algorithms with multi-core parallelism and workload optimization and, using just a CPU, outperforms an optimized implementation of TensorFlow (TF) on the best available GPU.



Deep Learning at Scale and at Ease

This article designs a distributed deep learning platform called SINGA, which has an intuitive programming model based on the common layer abstraction of deep learning models, and shows that it outperforms many other state-of-the-art deep learning systems.

SINGA: Putting Deep Learning in the Hands of Multimedia Users

This paper designs a distributed deep learning platform called SINGA, which has an intuitive programming model and good scalability. Experience with developing and training deep learning models for real-life multimedia applications in SINGA shows that the platform is both usable and scalable.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

SINGA: A Distributed Deep Learning Platform

This paper presents SINGA, a distributed deep learning system for training big models over large datasets, which supports a variety of popular deep learning models and provides different neural net partitioning schemes for training large models.

Deep learning in neural networks: An overview

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

This work builds a highly scalable deep learning training system for dense GPU clusters with three main contributions: a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy, an optimization approach for extremely large mini-batch sizes that can train CNN models on the ImageNet dataset without losing accuracy, and highly optimized all-reduce algorithms.

Effective deep learning-based multi-modal retrieval

This paper proposes a general learning objective that effectively captures both intramodal and intermodal semantic relationships of data from heterogeneous sources, and proposes two learning algorithms to realize it: an unsupervised approach that uses stacked auto-encoders and requires minimal prior knowledge of the training data, and a supervised approach that uses a deep convolutional neural network and a neural language model.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization, enabling visual recognition models to be trained on internet-scale data with high efficiency.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes, employing a recently developed regularization method called "dropout" that proved to be very effective.

Large Scale Distributed Deep Networks

This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.