Differentiable Time-Frequency Scattering on GPU
@inproceedings{Muradeli2022DifferentiableTS,
  title  = {Differentiable Time-Frequency Scattering on GPU},
  author = {John Muradeli and Cyrus Vahidi and Changhong Wang and Han Han and Vincent Lostanlen and Mathieu Lagrange and Georgy Fazekas},
  year   = {2022}
}
Joint time–frequency scattering (JTFS) is a convolutional operator in the time–frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of…
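For intuition, here is a minimal PyTorch sketch of the kind of operator the abstract describes: a small bank of 2D Morlet-like filters applied to a log-frequency scalogram, followed by a complex modulus. All shapes, filter counts, and parameter values below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def morlet_2d(n_f, n_t, scale, rate, sigma=0.8):
    # Gaussian-windowed plane wave over (log-frequency, time); returns the
    # real (cosine) and imaginary (sine) parts of the filter.
    f = torch.linspace(-1.0, 1.0, n_f).unsqueeze(1)
    t = torch.linspace(-1.0, 1.0, n_t).unsqueeze(0)
    envelope = torch.exp(-(f ** 2 + t ** 2) / (2 * sigma ** 2))
    phase = 2 * torch.pi * (scale * f + rate * t)
    return envelope * torch.cos(phase), envelope * torch.sin(phase)

def jtfs_layer(scalogram, scales=(2.0, 4.0), rates=(1.0, 2.0, 4.0)):
    # Convolve a (batch, 1, n_freq, n_time) scalogram with joint
    # time-frequency filters and take the complex modulus, mimicking one
    # second-order scattering stage. Scales/rates here are toy values.
    pairs = [morlet_2d(15, 15, s, r) for s in scales for r in rates]
    re = torch.stack([p[0] for p in pairs]).unsqueeze(1)  # (K, 1, 15, 15)
    im = torch.stack([p[1] for p in pairs]).unsqueeze(1)
    out_re = F.conv2d(scalogram, re, padding=7)
    out_im = F.conv2d(scalogram, im, padding=7)
    return torch.sqrt(out_re ** 2 + out_im ** 2)  # modulus nonlinearity

x = torch.rand(1, 1, 64, 256)   # toy log-frequency scalogram
print(jtfs_layer(x).shape)      # torch.Size([1, 6, 64, 256])
```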
One Citation
Learnable Front Ends Based on Temporal Modulation for Music Tagging
- Computer Science · ArXiv
- 2022
Experimental results show that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging, and they are also helpful for keyword spotting on speech commands.
References
Showing 1–10 of 23 references
Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre.
- Physics · Nature Human Behaviour
- 2020
A broad overview of former studies on musical timbre is provided to identify its relevant acoustic substrates according to biologically inspired models, and it is observed that timbre has both generic and experiment-specific acoustic correlates.
Multiresolution spectrotemporal analysis of complex sounds.
- Physics · The Journal of the Acoustical Society of America
- 2005
A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a…
Kymatio: Scattering Transforms in Python
- Computer Science · J. Mach. Learn. Res.
- 2020
The Kymatio software package is presented, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks.
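For reference, Kymatio's documented 1D front end can be used along these lines; `Scattering1D` and its `(J, shape, Q)` arguments follow the released package, though details may vary across versions.

```python
import torch
from kymatio.torch import Scattering1D

T = 2 ** 13                                    # signal length in samples
scattering = Scattering1D(J=6, shape=T, Q=8)   # 6 octaves, 8 wavelets per octave
x = torch.randn(4, T)                          # batch of 4 random signals

# The frontend is a torch module, so filters move to the GPU with .cuda().
if torch.cuda.is_available():
    scattering, x = scattering.cuda(), x.cuda()

Sx = scattering(x)   # (batch, n_paths, time) scattering coefficients
print(Sx.shape)
```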
Extended playing techniques: the next milestone in musical instrument recognition
- Computer Science · DLfM
- 2018
This work identifies and discusses three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability.
Joint Time–Frequency Scattering
- Computer Science · IEEE Transactions on Signal Processing
- 2019
The joint time–frequency scattering transform is introduced: a time-shift invariant representation that characterizes the multiscale energy distribution of a signal in time and frequency, and that may be implemented as a deep convolutional neural network whose filters are not learned but calculated from wavelets.
Joint Scattering for Automatic Chick Call Recognition
- Computer Science · 2022 30th European Signal Processing Conference (EUSIPCO)
- 2022
An automatic system for chick call recognition using the joint time–frequency scattering (JTFS) transform improves the frame- and event-based macro F-measures over a mel-frequency cepstral coefficients baseline by 10.2% and 11.7%, respectively.
Parametric Scattering Networks
- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
Focusing on Morlet wavelets, it is proposed to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform, and it is shown that learned versions of the scattering transform yield significant performance gains over the standard scattering transform in small-sample classification settings.
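A minimal sketch of that idea, assuming a 1D analogue for brevity (the paper parameterizes 2D Morlet filters): the filter is rebuilt from trainable parameters on every forward pass, so the wavelet's center frequency and bandwidth become learnable. All names and values here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMorlet1D(nn.Module):
    """Toy 1D Morlet filter whose center frequency and bandwidth are
    trainable, in the spirit of parametric scattering (illustrative only)."""
    def __init__(self, size=129, xi=0.1, sigma=8.0):
        super().__init__()
        self.xi = nn.Parameter(torch.tensor(xi))        # cycles per sample
        self.sigma = nn.Parameter(torch.tensor(sigma))  # envelope width, samples
        self.register_buffer("t", torch.arange(size, dtype=torch.float32) - size // 2)

    def forward(self, x):
        # Rebuild the filter from the current parameters on each call so
        # that gradients flow back into xi and sigma during training.
        env = torch.exp(-self.t ** 2 / (2 * self.sigma ** 2))
        phase = 2 * torch.pi * self.xi * self.t
        re = (env * torch.cos(phase)).view(1, 1, -1)
        im = (env * torch.sin(phase)).view(1, 1, -1)
        pad = self.t.numel() // 2
        out_re = F.conv1d(x, re, padding=pad)
        out_im = F.conv1d(x, im, padding=pad)
        return torch.sqrt(out_re ** 2 + out_im ** 2 + 1e-12)  # modulus

layer = LearnableMorlet1D()
y = layer(torch.randn(2, 1, 1024))   # (batch, 1, time) input
y.mean().backward()                  # xi.grad and sigma.grad are populated
```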
Time–frequency scattering accurately models auditory similarities between instrumental playing techniques
- Computer Science · EURASIP J. Audio Speech Music. Process.
- 2021
A machine listening model is proposed that relies on joint time–frequency scattering to extract spectrotemporal modulations as acoustic features, and that minimizes a triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm.
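As a rough analogue of that metric-learning step, one could train a linear embedding of scattering features with PyTorch's built-in triplet margin loss. The paper itself uses LMNN over a cluster graph of human judgments, so everything below, including the feature dimensions, is a simplified stand-in.

```python
import torch

# Learned linear map W over JTFS feature vectors, trained so that anchors
# sit closer to positives than to negatives (illustrative dimensions).
dim_in, dim_out = 512, 64
W = torch.nn.Linear(dim_in, dim_out, bias=False)
criterion = torch.nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.SGD(W.parameters(), lr=1e-2)

anchor = torch.randn(32, dim_in)    # JTFS features of reference sounds
positive = torch.randn(32, dim_in)  # same perceptual cluster
negative = torch.randn(32, dim_in)  # different cluster

loss = criterion(W(anchor), W(positive), W(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```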
Playing Technique Recognition by Joint Time–Frequency Scattering
- Computer Science · ICASSP 2020 (IEEE International Conference on Acoustics, Speech and Signal Processing)
- 2020
A recognition system based on the joint time–frequency scattering transform (jTFST) for pitch evolution-based playing techniques (PETs), a group of playing techniques with monotonic pitch changes over time, is proposed.
nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks
- Computer Science · IEEE Access
- 2020
A new neural network-based audio processing framework with graphics processing unit (GPU) support is presented; it leverages 1D convolutional neural networks to perform time-domain to frequency-domain conversion, which allows on-the-fly spectrogram extraction without the need to store any spectrograms on disk.
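Typical usage looks like the following; the `MelSpectrogram` layer and its arguments follow nnAudio's released API (exposed under `nnAudio.features` in recent versions, `nnAudio.Spectrogram` in older ones), but should be checked against the installed version.

```python
import torch
from nnAudio import features  # older releases: from nnAudio import Spectrogram

# The extractor is a torch module whose filterbank kernels are registered
# as (optionally trainable) 1D convolution weights.
mel_layer = features.MelSpectrogram(sr=22050, n_fft=2048,
                                    n_mels=128, hop_length=512)
waveform = torch.randn(4, 22050)  # batch of 1-second signals

if torch.cuda.is_available():
    mel_layer, waveform = mel_layer.cuda(), waveform.cuda()

mel = mel_layer(waveform)         # (batch, n_mels, n_frames), computed on GPU
print(mel.shape)
```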