• Corpus ID: 195949107

Convolutional operators in the time-frequency domain

Vincent Lostanlen
This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term… 
Per-Channel Energy Normalization: Why and How
This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints, and describes the asymptotic regimes of PCEN: temporal integration, gain control, and dynamic range compression.
The shape of RemiXXXes to come
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and
Relevance-based quantization of scattering features for unsupervised mining of environmental audio
A two-scale representation is proposed which describes a recording using clusters of scattering coefficients: the scattering coefficients capture short-scale structure while the cluster model captures longer time scales, allowing for more accurate characterization of sparse events.
Extended playing techniques: the next milestone in musical instrument recognition
This work identifies and discusses three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability.
On Time-frequency Scattering and Computer Music
The quest for an adequate representation of auditory textures lies at the foundation of computer music research. Indeed, none of its analog predecessors ever managed a practical compromise between
On Time-frequency Scattering and Computer Music
Time-frequency scattering, a mathematical transformation of sound waves, can also be useful for applications in contemporary music creation.
Hybrid scattering-LSTM networks for automated detection of sleep arousals.
A new automatic detector of non-apnea arousal regions in multichannel PSG recordings that is the first use of a hybrid ST-BiLSTM network with biomedical signals and requires no explicit mechanism to overcome class imbalance in the data.


Deep Convolutional Networks on the Pitch Spiral For Music Instrument Recognition
This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data, and benchmarks three different weight sharing strategies for deep convolutional networks in the time-frequency domain, providing an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds.
Deep Scattering Spectrum with deep neural networks
This paper identifies the normalization, neural network topology, and regularization techniques needed to effectively model higher-order scatter, resulting in a relative improvement of 7% over log-mel features on TIMIT and a phonetic error rate of 17.4%, one of the lowest reported PERs to date on this task.
Discrimination of speech from nonspeech based on multiscale spectro-temporal Modulations
A content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds is described.
Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained
CQT-based Convolutional Neural Networks for Audio Scene Classification
It is shown in this paper that a Constant-Q-transformed input to a Convolutional Neural Network improves results, and a parallel (graph-based) neural network architecture is proposed which captures relevant audio characteristics both in time and in frequency.
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases
The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end classifiers using support vector machines to achieve a generalization of instrument recognition to off-the-shelf, commercially available solo music.
Environmental Sound Recognition With Time–Frequency Audio Features
An empirical feature analysis for audio environment characterization is performed, and the use of a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
Idealized Computational Models for Auditory Receptive Fields
It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus and primary auditory cortex of mammals.
Joint Acoustic and Modulation Frequency
The concept of a two-dimensional joint acoustic and modulation frequency representation is proposed and a simple single sinusoid amplitude modulator of a sinusoidal carrier is used to illustrate properties of an unconstrained and ideal joint representation.