• Corpus ID: 12219023

Unsupervised feature learning for audio classification using convolutional deep belief networks

@inproceedings{Lee2009UnsupervisedFL,
  title={Unsupervised feature learning for audio classification using convolutional deep belief networks},
  author={Honglak Lee and Peter T. Pham and Yan Largman and Andrew Y. Ng},
  booktitle={NIPS},
  year={2009}
}
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. […] We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.
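The paper's core building block is the convolutional deep belief network: convolutional restricted Boltzmann machines (CRBMs) stacked on spectrogram input. Below is a minimal, illustrative sketch of one 1-D CRBM layer trained with a single step of contrastive divergence (CD-1), assuming a whitened spectrogram as input; probabilistic max-pooling, sparsity regularization, and stacking are omitted, and all sizes and hyper-parameters are assumptions rather than the authors' settings.

import numpy as np

rng = np.random.default_rng(0)

C, T = 40, 200   # spectrogram channels x time frames (assumed sizes)
K, w = 16, 6     # number of convolutional filters, filter width in frames
lr = 0.01        # learning rate (assumption)

W = 0.01 * rng.standard_normal((K, C, w))  # time-convolutional filters
b = np.zeros(K)                            # hidden-group biases
vb = np.zeros(C)                           # visible (channel) biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v):
    # valid correlation along time, summed over frequency channels
    act = np.stack([
        sum(np.correlate(v[ch], W[k, ch], mode="valid") for ch in range(C)) + b[k]
        for k in range(K)
    ])
    return sigmoid(act)  # shape (K, T - w + 1)

def reconstruct(h):
    # mean of Gaussian visible units: full convolution back to length T
    return np.stack([
        sum(np.convolve(h[k], W[k, ch], mode="full") for k in range(K)) + vb[ch]
        for ch in range(C)
    ])

def cd1_step(v0):
    h0 = hidden_probs(v0)
    v1 = reconstruct((rng.random(h0.shape) < h0).astype(float))  # sampled hiddens
    h1 = hidden_probs(v1)
    for k in range(K):
        for ch in range(C):
            pos = np.correlate(v0[ch], h0[k], mode="valid")  # positive statistics
            neg = np.correlate(v1[ch], h1[k], mode="valid")  # negative statistics
            W[k, ch] += lr * (pos - neg) / T
    b[:] += lr * (h0.sum(axis=1) - h1.sum(axis=1)) / T
    vb[:] += lr * (v0.sum(axis=1) - v1.sum(axis=1)) / T

v = rng.standard_normal((C, T))  # stand-in for one whitened spectrogram
for _ in range(5):
    cd1_step(v)
print(hidden_probs(v).shape)     # (16, 195): one feature map per filter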

Citations

Speaker recognition with hybrid features from a deep belief network
TLDR
This paper studies the use of features from different levels of a deep belief network for quantizing audio data into vectors of audio word counts, and shows that count vectors generated from a mixture of DBN features at different layers outperform MFCC features.
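As a rough illustration of the "audio word count" idea in this entry, the sketch below quantizes frame-level features (random stand-ins for DBN activations) with k-means and represents each clip as a histogram of codeword counts; the codebook size and clip shapes are assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# ten clips of frame-level features (stand-ins for DBN activations)
clips = [rng.standard_normal((int(rng.integers(80, 120)), 32)) for _ in range(10)]

codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(clips))  # learn the "audio words" from all frames

def audio_word_counts(frames):
    words = codebook.predict(frames)          # assign each frame to a codeword
    return np.bincount(words, minlength=64)   # clip-level count vector

X = np.stack([audio_word_counts(f) for f in clips])
print(X.shape)  # (10, 64): one histogram per clip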
Deep Convolutional Neural Networks for Music Classification from Raw PCM-Encoded Audio
TLDR
Whereas previous work in feature learning for music information retrieval has relied on intermediate representations of audio, such as discrete Fourier transforms and spectrograms, this system learns features and classifies audio directly from PCM-encoded audio samples.
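A minimal sketch (PyTorch assumed) of the raw-PCM setup this entry describes: strided 1-D convolutions applied directly to waveform samples rather than to a DFT or spectrogram front end. Layer sizes and the class count are illustrative assumptions.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=64, stride=16),  # filterbank learned from samples
    nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                      # pool feature maps over time
    nn.Flatten(),
    nn.Linear(64, 10),                            # e.g. 10 genres (assumption)
)

waveform = torch.randn(4, 1, 16000)  # a batch of 1-second clips at 16 kHz
logits = model(waveform)
print(logits.shape)                  # torch.Size([4, 10])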
Training Neural Audio Classifiers with Few Data
  • Jordi Pons, J. Serrà, X. Serra
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
TLDR
Results indicate that transfer learning is a powerful strategy in such scenarios, but prototypical networks show promising results when no external or validation data are available.
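The prototypical-network idea mentioned here fits in a few lines: each class is summarized by the mean embedding of its few support examples, and a query is assigned to the nearest prototype. The NumPy sketch below uses random embeddings as stand-ins for a learned encoder.

import numpy as np

rng = np.random.default_rng(0)
n_classes, shots, dim = 5, 3, 16

# few labeled support clips per class, already embedded (random stand-ins)
support = rng.standard_normal((n_classes, shots, dim))
prototypes = support.mean(axis=1)        # one mean embedding per class

query = rng.standard_normal(dim)         # an embedded query clip
dists = np.linalg.norm(prototypes - query, axis=1)
print("predicted class:", int(np.argmin(dists)))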
Convolutional Data: Towards Deep Audio Learning from Big Data (Abstract)
TLDR
Convolutional Deep Belief Networks have been shown capable of learning high-level features from audio spectrograms, and when applied to music genre classification, the first-layer features performed best overall.
A Deep Bag-of-Features Model for Music Auto-Tagging
TLDR
This paper presents a two-stage learning model to effectively predict multiple labels from music audio, and achieves high performance on MagnaTagATune, a widely used dataset for music auto-tagging.
Web Classification Using Deep Belief Networks
TLDR
This paper applies deep belief networks to web data and evaluates the algorithm on various classification experiments by comparing its performance with that of the SVM classification algorithm.
Learning Features from Music Audio with Deep Belief Networks
TLDR
This work presents a system that automatically extracts task-relevant features from audio by training a Deep Belief Network on Discrete Fourier Transforms of the audio, and applies it to genre recognition.
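For the front end this entry describes, a sketch of windowed magnitude DFT frames is shown below (the Deep Belief Network that consumes them is omitted); frame length, hop size, and the Hann window are assumptions.

import numpy as np

def dft_frames(signal, frame=1024, hop=512):
    n = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop : i * hop + frame] for i in range(n)])
    window = np.hanning(frame)
    return np.abs(np.fft.rfft(frames * window, axis=1))  # magnitude spectra

audio = np.random.default_rng(0).standard_normal(22050)  # 1 s at 22.05 kHz
print(dft_frames(audio).shape)  # (42, 513): frames x frequency bins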
End-to-end learning for music audio
  • S. Dieleman, B. Schrauwen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
Although convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
Acoustic scene classification using sparse feature learning and event-based pooling
TLDR
The results show that learned features outperform MFCCs, event-based pooling achieves higher accuracy than uniform pooling and, furthermore, a combination of the two methods performs even better than either one used alone.
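To make the pooling contrast concrete, the sketch below compares uniform (mean-over-all-frames) pooling with a simple event-based variant that averages only the strongest frames per feature; using a top-q fraction as the event criterion is an illustrative assumption, not the paper's exact rule.

import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 32))  # frame-level feature activations

uniform = frames.mean(axis=0)            # uniform pooling: every frame counts

def event_pool(F, q=0.1):
    k = max(1, int(q * F.shape[0]))
    top = np.sort(F, axis=0)[-k:]        # the k strongest frames per feature
    return top.mean(axis=0)

print(uniform.shape, event_pool(frames).shape)  # (32,) (32,)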
Unsupervised feature learning on monaural DOA estimation using convolutional deep belief networks
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. Additionally, in the field of sound …

References

SHOWING 1-10 OF 21 REFERENCES
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
TLDR
This paper presents the convolutional deep belief network, a hierarchical generative model that scales to realistic image sizes, is translation-invariant, and supports efficient bottom-up and top-down probabilistic inference.
Self-taught learning: transfer learning from unlabeled data
TLDR
An approach to self-taught learning that uses sparse coding to construct higher-level features from unlabeled data, forming a succinct input representation that significantly improves classification performance.
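A hedged sketch of the self-taught learning recipe summarized above: learn a sparse-coding dictionary on unlabeled data, then use the resulting sparse codes as features for a classifier trained on a smaller labeled set. The dataset sizes and the logistic-regression classifier are assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
unlabeled = rng.standard_normal((500, 20))    # plentiful unlabeled examples
labeled = rng.standard_normal((60, 20))       # scarce labeled examples
labels = rng.integers(0, 2, size=60)

coder = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20,
                           transform_algorithm="lasso_lars", random_state=0)
coder.fit(unlabeled)                 # bases are learned without any labels

features = coder.transform(labeled)  # sparse codes as higher-level features
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.score(features, labels))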
Shift-Invariance Sparse Coding for Audio Classification
TLDR
This paper presents an efficient algorithm for learning shift-invariant sparse coding (SISC) bases, and shows that SISC's learned high-level representations of speech and music provide useful features for classification tasks within those domains.
Sparse deep belief net model for visual area V2
TLDR
An unsupervised learning model is presented that faithfully mimics certain properties of visual area V2; the encoding of more complex "corner" features matches well with the results of Ito & Komatsu's study of biological V2 responses, suggesting that this sparse variant of deep belief networks holds promise for modeling higher-order features.
A Fast Learning Algorithm for Deep Belief Nets
TLDR
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
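The greedy, one-layer-at-a-time procedure described here can be sketched compactly: train an RBM with CD-1, feed its hidden activations upward as the next layer's data, and repeat. Sizes, learning rate, and epoch count below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=20):
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(data.shape[1])
    for _ in range(epochs):
        h0 = sigmoid(data @ W + b)
        sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(sample @ W.T + c)               # CD-1 reconstruction
        h1 = sigmoid(v1 @ W + b)
        W += lr * (data.T @ h0 - v1.T @ h1) / len(data)
        b += lr * (h0 - h1).mean(axis=0)
        c += lr * (data - v1).mean(axis=0)
    return W, b

X = (rng.random((256, 64)) < 0.3).astype(float)  # binary stand-in data
reps = X
for n_hidden in (32, 16):            # two layers, trained greedily in turn
    W, b = train_rbm(reps, n_hidden)
    reps = sigmoid(reps @ W + b)     # hidden activations feed the next layer
print(reps.shape)                    # (256, 16)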
Stacks of convolutional Restricted Boltzmann Machines for shift-invariant feature learning
TLDR
This work develops the convolutional RBM (C-RBM), a variant of the RBM in which weights are shared to respect the spatial structure of images, and shows that it learns a set of features that can generate the images of a specific object class.
Efficient Learning of Sparse Representations with an Energy-Based Model
TLDR
A novel unsupervised method for learning sparse, overcomplete features using a linear encoder and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector.
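The encoder-decoder structure this entry describes is easy to show in miniature: a linear encoder, a sparsifying non-linearity (a soft threshold here, as an assumption), and a linear decoder, scored by reconstruction error plus a sparsity penalty; the training loop that minimizes this energy is omitted.

import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 40                            # input dim, overcomplete code dim
W_enc = 0.1 * rng.standard_normal((k, d))
W_dec = 0.1 * rng.standard_normal((d, k))

def shrink(z, theta=0.5):
    # sparsifying non-linearity: soft threshold pushes small codes to zero
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

x = rng.standard_normal(d)
code = shrink(W_enc @ x)                 # quasi-binary sparse code vector
x_hat = W_dec @ code                     # linear decoder
energy = 0.5 * np.sum((x - x_hat) ** 2) + np.sum(np.abs(code))
print(f"nonzeros: {np.count_nonzero(code)}/{k}, energy: {energy:.2f}")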
Learning Structured Models for Phone Recognition
TLDR
This work presents a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition using a split-merge EM procedure which makes no assumptions about subphone structure or context-dependent structure, and which uses only a single Gaussian per HMM state.
An empirical evaluation of deep architectures on problems with many factors of variation
TLDR
A series of experiments indicate that these models with deep architectures show promise in solving harder learning problems that exhibit many factors of variation.
Greedy Layer-Wise Training of Deep Networks
TLDR
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps optimization by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input and bring better generalization.