Convolutional operators in the time-frequency domain
@inproceedings{Lostanlen2017ConvolutionalOI,
  title  = {Convolutional operators in the time-frequency domain},
  author = {Vincent Lostanlen},
  year   = {2017}
}
This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates, in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term…
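The representation described above cascades complex bandpass filtering with a pointwise modulus, then averages over time to obtain invariants. A minimal numpy sketch of that two-layer structure, using ad-hoc Gaussian filters designed in the Fourier domain (the filter shapes, bandwidths, and lack of normalization are placeholder assumptions, not the dissertation's actual filterbank):

```python
import numpy as np

def gabor_fourier(n, xi, sigma):
    """Gaussian bandpass filter defined directly in the Fourier domain,
    centred on normalized frequency xi (a simplistic stand-in for a wavelet)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))

def scattering_layer(x, xis, sigma=0.01):
    """One scattering layer: bandpass filtering followed by complex modulus."""
    X = np.fft.fft(x)
    return np.array([np.abs(np.fft.ifft(X * gabor_fourier(len(x), xi, sigma)))
                     for xi in xis])

# An amplitude-modulated tone: carrier at 0.1 cycles/sample,
# modulation at 0.005 cycles/sample.
n = 4000
t = np.arange(n)
x = (1 + 0.5 * np.cos(2 * np.pi * 0.005 * t)) * np.cos(2 * np.pi * 0.1 * t)

U1 = scattering_layer(x, xis=[0.05, 0.1, 0.2])   # first-order modulus: envelopes
U2 = scattering_layer(U1[1], xis=[0.005, 0.02])  # modulations of the carrier band
S1, S2 = U1.mean(axis=1), U2.mean(axis=1)        # time averaging yields invariants
```

Here S1 concentrates energy in the carrier band at 0.1, and S2 picks up the 0.005 modulation rate that a plain time-averaged spectrogram would discard.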
Figures and Tables from this paper
The paper lists figures 1.1–1.2, 2.1–2.7, 3.1–3.11, and 4.1–4.22, and tables 3.1–3.3 and 4.1–4.2.
8 Citations
Per-Channel Energy Normalization: Why and How
- Physics · IEEE Signal Processing Letters
- 2019
This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints, and describes the asymptotic regimes in PCEN: temporal integration, gain control, and dynamic range compression.
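The three regimes named in this summary map directly onto the published PCEN formula: a first-order IIR smoother (temporal integration), division by the smoothed energy raised to a power (gain control), and a root compression (dynamic range compression). A minimal numpy sketch, with the constants below chosen for illustration rather than taken from the letter:

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalization, sketched from the published formula.

    E: non-negative (mel) spectrogram of shape (n_channels, n_frames).
    Temporal integration: first-order IIR smoother M per channel.
    Gain control: divide by (eps + M) ** alpha.
    Dynamic range compression: (. + delta) ** r - delta ** r.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

On a constant input the smoother settles immediately, so the output is constant as well; silence maps to zero.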
The shape of RemiXXXes to come
- Mathematics
- 2019
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and…
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
- Physics · ArXiv
- 2019
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and…
Relevance-based quantization of scattering features for unsupervised mining of environmental audio
- Computer Science · EURASIP J. Audio Speech Music. Process.
- 2018
A two-scale representation is proposed that describes a recording using clusters of scattering coefficients: the scattering coefficients capture short-scale structure, while the cluster model captures longer time scales, allowing for more accurate characterization of sparse events.
Extended playing techniques: the next milestone in musical instrument recognition
- Computer Science · DLfm
- 2018
This work identifies and discusses three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability.
On Time-frequency Scattering and Computer Music
- Art
- 2019
The quest for an adequate representation of auditory textures lies at the foundation of computer music research. Indeed, none of its analog predecessors ever managed a practical compromise between…
On Time-frequency Scattering and Computer Music
- Computer Science · ArXiv
- 2018
Time-frequency scattering, a mathematical transformation of sound waves, can also be useful for applications in contemporary music creation.
Hybrid scattering-LSTM networks for automated detection of sleep arousals.
- Computer Science · Physiological measurement
- 2019
A new automatic detector of non-apnea arousal regions in multichannel PSG recordings is presented; it is the first application of a hybrid ST-BiLSTM network to biomedical signals and requires no explicit mechanism to overcome class imbalance in the data.
References
Showing 1–10 of 270 references
Deep Convolutional Networks on the Pitch Spiral For Music Instrument Recognition
- Computer Science · ISMIR
- 2016
This article investigates the construction of learned convolutional architectures for instrument recognition given a limited amount of annotated training data, and benchmarks three weight-sharing strategies for deep convolutional networks in the time-frequency domain, providing an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds.
Deep Scattering Spectrum with deep neural networks
- Computer Science · 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014
This paper identifies the normalization, neural network topology, and regularization techniques needed to effectively model higher-order scatter, resulting in a relative improvement of 7% compared to log-mel features on TIMIT and a phonetic error rate of 17.4%, one of the lowest PERs reported to date on this task.
Discrimination of speech from nonspeech based on multiscale spectro-temporal Modulations
- Physics, Computer Science · IEEE Transactions on Audio, Speech, and Language Processing
- 2006
A content-based audio classification algorithm is described, based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing, to discriminate speech from nonspeech (animal vocalizations, music, and environmental sounds).
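Multiscale spectro-temporal modulation features are commonly summarized by the two-dimensional Fourier modulus of a spectrogram, which maps spectro-temporal ripples to peaks indexed by a temporal rate and a spectral scale. A hedged numpy sketch of that idea (the cortical model in the paper uses a bank of selective 2-D filters rather than a single FFT):

```python
import numpy as np

def modulation_spectrum(log_spec):
    """Rate-scale summary: 2-D Fourier modulus of a mean-removed log-spectrogram.
    Row index = spectral modulation ("scale", cycles per channel);
    column index = temporal modulation ("rate", cycles per frame)."""
    return np.abs(np.fft.fft2(log_spec - log_spec.mean()))

# A synthetic ripple: 8 cycles across 64 channels, 4 cycles across 128 frames.
F, T = 64, 128
f = np.arange(F)[:, None]
t = np.arange(T)[None, :]
ripple = np.cos(2 * np.pi * (8 * f / F + 4 * t / T))
M = modulation_spectrum(ripple)
```

The ripple shows up as a peak at scale bin 8 and rate bin 4 (plus its conjugate-symmetric mirror, since the input is real).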
Scattering Representation of Modulated Sounds
- Physics
- 2012
Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained…
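The MFSC construction described here, averaging a spectrogram along a mel-frequency scale, can be sketched in a few lines of numpy. The window, filterbank construction, and constants below are textbook defaults, not necessarily those of the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters equally spaced on the mel scale (textbook construction)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfsc(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Mel-frequency spectral coefficients: mel-averaged log power spectrogram."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * np.hanning(n_fft)
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(mel_filterbank(n_mels, n_fft, sr) @ spec.T + 1e-10)
```

One second of a 440 Hz tone at 16 kHz yields a (40, 61) matrix of coefficients.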
CQT-based Convolutional Neural Networks for Audio Scene Classification
- Computer Science · DCASE
- 2016
It is shown in this paper that a constant-Q-transformed input to a convolutional neural network improves results, and a parallel (graph-based) neural network architecture is proposed which captures relevant audio characteristics both in time and in frequency.
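The constant-Q property amounts to geometrically spaced center frequencies: with B bins per octave, f_k = f_min · 2^(k/B), so every filter's bandwidth is proportional to its center frequency and the Q factor is the same for all bins. A small numpy illustration (f_min = 32.70 Hz, the pitch C1, is an arbitrary choice here):

```python
import numpy as np

def cqt_frequencies(f_min, bins_per_octave, n_bins):
    """Constant-Q center frequencies: geometric spacing, constant
    frequency-to-bandwidth ratio across all bins."""
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

# 7 octaves starting at C1, 12 bins per octave.
freqs = cqt_frequencies(32.70, 12, 84)
```

Adjacent bins are always the same ratio apart, and bin 12 sits exactly one octave above bin 0.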
WaveNet: A Generative Model for Raw Audio
- Computer Science · SSW
- 2016
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.
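WaveNet's key architectural ingredient is a stack of dilated causal convolutions, whose receptive field grows as 1 + (k − 1) · Σ dᵢ for kernel size k and dilation factors dᵢ. A hedged numpy sketch of one dilated causal convolution and of that receptive-field arithmetic (no gating, residual connections, or μ-law quantization):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution: y[t] = sum_i w[i] * x[t - i*dilation],
    with x left-padded by zeros so the output never looks into the future."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return sum(w[i] * xp[pad - i * dilation: pad - i * dilation + len(x)]
               for i in range(k))

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

An impulse through a single layer lands only at the current sample and `dilation` samples later, confirming causality; doubling dilations (1, 2, 4, 8, ...) makes the receptive field grow exponentially with depth.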
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases
- Physics · EURASIP J. Audio Speech Music. Process.
- 2015
The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end classifiers using support vector machines to achieve a generalization of instrument recognition to off-the-shelf, commercially available solo music.
Environmental Sound Recognition With Time–Frequency Audio Features
- Computer Science · IEEE Transactions on Audio, Speech, and Language Processing
- 2009
An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
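Matching pursuit greedily decomposes a signal over a redundant dictionary: at each step, select the atom most correlated with the residual, subtract its projection, and record the pair (atom index, coefficient). A textbook numpy sketch (the identity dictionary in the test is a placeholder; the paper uses Gabor time-frequency atoms):

```python
import numpy as np

def matching_pursuit(x, D, n_steps):
    """Greedy matching pursuit over a dictionary D whose rows are
    unit-norm atoms; returns the selected (index, coefficient) pairs
    and the final residual."""
    residual = x.astype(float).copy()
    decomposition = []
    for _ in range(n_steps):
        correlations = D @ residual
        k = int(np.argmax(np.abs(correlations)))
        c = float(correlations[k])
        residual = residual - c * D[k]
        decomposition.append((k, c))
    return decomposition, residual
```

With an orthonormal dictionary the algorithm recovers the exact sparse decomposition in as many steps as there are nonzero coefficients.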
Idealized Computational Models for Auditory Receptive Fields
- Computer Science · PLoS ONE
- 2015
It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus and primary auditory cortex of mammals.
Joint Acoustic and Modulation Frequency
- Engineering · EURASIP J. Adv. Signal Process.
- 2003
The concept of a two-dimensional joint acoustic and modulation frequency representation is proposed and a simple single sinusoid amplitude modulator of a sinusoidal carrier is used to illustrate properties of an unconstrained and ideal joint representation.