Scalable stacking and learning for building deep architectures

@article{Deng2012ScalableSA,
  title={Scalable stacking and learning for building deep architectures},
  author={Li Deng and Dong Yu and John C. Platt},
  journal={2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2012},
  pages={2133-2136}
}
  • L. Deng, Dong Yu, John C. Platt
  • Published 1 March 2012
  • Computer Science
  • 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Deep Neural Networks (DNNs) have shown remarkable success in pattern recognition tasks. […] The DSN provides a method of stacking simple processing modules in building deep architectures, with a convex learning problem in each module. Additional fine-tuning further improves the DSN, while introducing minor non-convexity. Full learning in the DSN is batch-mode, making it amenable to parallel training over many machines and thus scalable over the potentially huge size of the training data…
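
To make the stacking-plus-convex-learning idea concrete, here is a minimal NumPy sketch of a DSN-style stack based only on the abstract above: the lower-layer weights W of each module are taken as given (randomly initialized here, whereas the paper initializes and fine-tunes them), and the upper-layer weights U are obtained in closed form from a ridge-regularized least-squares problem, which is the convex part. Function names, the regularizer `lam`, and all sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a DSN-style stack (illustrative, not the authors' code).
# Assumptions: random lower-layer weights W, ridge term lam, sigmoid hidden units.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_block(Z, T, n_hidden, lam=1e-3, rng=None):
    """One module: fixed lower weights W, upper weights U solved in closed form.

    Z: (n_samples, n_inputs) module input, T: (n_samples, n_classes) targets.
    Given W, learning U is a convex (ridge least-squares) problem.
    """
    rng = np.random.default_rng(rng)
    W = rng.normal(scale=0.1, size=(Z.shape[1], n_hidden))    # paper fine-tunes W
    H = sigmoid(Z @ W)                                        # hidden representation
    U = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return W, U

def fit_dsn(X, T, n_blocks=3, n_hidden=200):
    """Stack modules; each sees the raw input plus all lower modules' predictions."""
    blocks, Z = [], X
    for _ in range(n_blocks):
        W, U = fit_block(Z, T, n_hidden)
        Y = sigmoid(Z @ W) @ U            # this module's predictions
        blocks.append((W, U))
        Z = np.hstack([Z, Y])             # widen the input for the next module
    return blocks

def dsn_predict(blocks, X):
    Z, Y = X, None
    for W, U in blocks:
        Y = sigmoid(Z @ W) @ U
        Z = np.hstack([Z, Y])
    return Y
```

Because each module's U has a closed-form batch solution, no stochastic gradient passes over the data are needed for the convex part; this is the property the abstract ties to batch-mode, cluster-friendly training.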

Citations

Parallel Training for Deep Stacking Networks

TLDR
This paper presents the first parallel implementation of the DSN training algorithm, and shows the tradeoff between the time/memory saving via training parallelism and the associated cost arising from inter-CPU communication.
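
One way to see why this training parallelizes is sketched below under my own assumptions (not the implementation of either paper): the closed-form solution for a module's upper weights depends on the data only through the sums H^T H and H^T T, so each worker can accumulate these statistics on its own data shard and a single reduction recovers the full-batch answer; the trade-off then hinges on how often such matrices must be exchanged between machines.

```python
# Hedged illustration (my sketch, not the parallel implementation from this
# paper): the full-batch ridge solution for the upper weights U only needs the
# per-shard sums H^T H and H^T T, so workers can compute them independently.
import numpy as np

def shard_stats(H_shard, T_shard):
    # Sufficient statistics computed locally on one worker's data shard.
    return H_shard.T @ H_shard, H_shard.T @ T_shard

def solve_upper_weights(per_shard_stats, lam=1e-3):
    # Reduce (sum) the shard statistics, then solve the convex problem once.
    A = sum(s[0] for s in per_shard_stats)
    B = sum(s[1] for s in per_shard_stats)
    return np.linalg.solve(A + lam * np.eye(A.shape[0]), B)
```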

Stacking-based deep neural network: Deep analytic network on convolutional spectral histogram features

  • C. Low, A. Teoh
  • Computer Science
    2017 IEEE International Conference on Image Processing (ICIP)
  • 2017
TLDR
The empirical results show that DAN improves on the SH baseline performance given a sufficiently deep stack, and reveal the contributions of some key DNN constituents, specifically the rectified linear unit, fine-tuning, and normalization.

Visual Representation and Classification by Learning Group Sparse Deep Stacking Network

TLDR
A group sparse DSN (GS-DSN) is constructed by stacking the group sparse SNNM modules and achieves the state-of-the-art performance (99.1%) on 15-Scene.

Tensor Deep Stacking Networks

TLDR
A sufficient depth of the T-DSN, a symmetry in the two-hidden-layer structure in each T-DSN block, the model parameter learning algorithm, and a softmax layer on top of the T-DSN are shown to have all contributed to the low error rates observed in the experiments for all three tasks.

Parallel Training of Deep Networks with Local Updates

TLDR
This paper investigates how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.

Deep stacking networks for information retrieval

TLDR
A desirable monotonic correlation between NDCG and classification rate is demonstrated over a wide range of IR quality, while the weaker correlation and flatter relationship in the high-IR-quality region suggest the need for developing new learning objectives and optimization methods.

Partitioning Large Scale Deep Belief Networks Using Dropout

TLDR
This work considers a well-known machine learning model, deep belief networks (DBNs), and proposes an approach that uses computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processing units (GPUs).

On Distributed Deep Network for Processing Large-Scale Sets of Complex Data

TLDR
A Bagging-Down SGD algorithm is developed that introduces a parameter server on top of several model replicas and separates the parameter updating from the training computation to accelerate the whole system.

A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition

We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden
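
As a reading aid for the bilinear mapping mentioned in this (truncated) snippet, here is a minimal sketch of how one T-DSN-style block could map two hidden representations to an output through a third-order tensor; layer sizes, initialization, and naming are my assumptions, and the actual model also stacks such blocks and estimates the weights in batch mode.

```python
# Hedged sketch of a bilinear (tensor) output mapping for one T-DSN-style block.
# Illustrative only; weights here are placeholders, not learned as in the paper.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tdsn_block_predict(x, W1, W2, U):
    """x: (n_in,); W1: (n_in, h1); W2: (n_in, h2); U: (h1, h2, n_classes).

    Two parallel hidden layers are combined bilinearly:
    y_k = h1^T U[:, :, k] h2 for each output unit k.
    """
    h1 = sigmoid(x @ W1)
    h2 = sigmoid(x @ W2)
    return np.einsum("i,ijk,j->k", h1, U, h2)
```

Given the two hidden layers, the output is still linear in the tensor weights (flatten the outer product of h1 and h2), which is presumably what lets the upper-layer learning remain a convex problem as in the DSN.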

Stacking-Based Deep Neural Network: Deep Analytic Network for Pattern Classification

TLDR
DAN/K-DAN outperform the present S-DNNs and also the BP-trained DNNs, including the multilayer perceptron, deep belief network, etc., without data augmentation applied, and are trainable using only a CPU even for small-scale training sets.
...

References

SHOWING 1-10 OF 16 REFERENCES

Deep Convex Net: A Scalable Architecture for Speech Pattern Classification

TLDR
Results on both MNIST and TIMIT tasks evaluated thus far demonstrate superior performance of DCN over the DBN (Deep Belief Network) counterpart that forms the basis of the DNN, reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy in both tasks.

A deep architecture with bilinear modeling of hidden representations: Applications to phonetic recognition

We develop and describe a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), where multiple blocks are stacked one on top of another and where a bilinear mapping from hidden

Learning Deep Architectures for AI

TLDR
The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks, are discussed.

Towards deeper understanding: Deep convex networks for semantic utterance classification

TLDR
The DCN-based method produces higher SUC accuracy than the Boosting-based discriminative classifier with word trigrams, and experimental results obtained on a domain classification task for spoken language understanding demonstrate the effectiveness of DCNs.

Investigation of full-sequence training of deep belief networks for speech recognition

TLDR
It is shown that the DBNs learned using the sequence-based training criterion outperform those with the frame-based criterion using both three-layer and six-layer models, but the optimization procedure for the deeper DBN is more difficult for the former criterion.

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture is presented that trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition

TLDR
Deep neural networks are employed to improve detection accuracy over conventional shallow MLPs (multi-layer perceptrons) with one hidden layer, opening the door to a new family of flexible speech recognition system designs for both top-down and bottom-up, lattice-based search strategies and knowledge integration.

The Cascade-Correlation Learning Architecture

TLDR
The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.

Deep Boltzmann Machines

TLDR
A new learning algorithm for Boltzmann machines that contain many layers of hidden variables that is made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass.

Gradient-based learning applied to document recognition

TLDR
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and Convolutional neural networks are shown to outperform all other techniques.