Corpus ID: 245837267

A novel audio representation using space filling curves

Alessandro Mari and Arash Salarian
Since convolutional neural networks (CNNs) revolutionized the image processing field, they have been widely applied in the audio context. A common approach is to convert the one-dimensional audio signal time series to two-dimensional images using a time-frequency decomposition method. It is also common to discard the phase information. In this paper, we propose to map one-dimensional audio waveforms to two-dimensional images using space filling curves (SFCs). These mappings do not compress… 
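The abstract's idea of mapping a 1-D waveform onto a 2-D image can be illustrated with the Hilbert curve, one common space-filling curve. The paper does not specify its exact routine; the sketch below uses the standard iterative distance-to-coordinate conversion, and the function names are illustrative:

```python
def d2xy(n, d):
    """Convert distance d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so consecutive sub-curves connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def waveform_to_image(samples, n):
    """Place n*n consecutive audio samples on an n x n image along the Hilbert curve."""
    image = [[0.0] * n for _ in range(n)]
    for d, sample in enumerate(samples[: n * n]):
        x, y = d2xy(n, d)
        image[y][x] = sample
    return image
```

Because the curve visits every pixel exactly once, no samples are discarded, which matches the abstract's point that the mapping does not compress the waveform; samples that are adjacent in time also land on adjacent pixels.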




Raw Waveform-based Audio Classification Using Sample-level CNN Architectures

Two types of sample-level deep convolutional neural networks that take raw waveforms as input and use filters with small granularity reach state-of-the-art performance levels for the three different categories of sound.

Acoustic Modelling from the Signal Domain Using CNNs

The resulting 'direct-from-signal' network is competitive with state-of-the-art networks based on conventional features with iVector adaptation and, unlike some previous work on learned feature extractors, the objective function converges as fast as for a network based on traditional features.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
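The dropout regularizer mentioned in this summary is simple to sketch. This is the standard "inverted dropout" formulation, not code from the paper:

```python
import random

def dropout(x, p, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]
```

The scaling by 1/(1-p) is what lets the same network be used at test time with no extra correction.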

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.
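The compound coefficient can be sketched in a few lines. The default constants below are the published EfficientNet-B0 values (found by grid search in the paper); the function name is illustrative:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.
    The constants are chosen so that alpha * beta**2 * gamma**2 is roughly 2,
    i.e. total FLOPs approximately double for each unit increase in phi."""
    return alpha ** phi, beta ** phi, gamma ** phi
```

Raising phi scales all three dimensions together instead of tuning depth, width, or input resolution in isolation.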

MixConv: Mixed Depthwise Convolutional Kernels

This paper proposes a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution, and improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection.
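MixConv partitions a layer's channels into groups and convolves each group with a different kernel size. A minimal sketch of the even channel split used for the grouping (function name illustrative, not from the paper's code):

```python
def split_channels(total_channels, kernel_sizes):
    """Split channels evenly across kernel-size groups; any remainder
    goes to the first (smallest-kernel) group, as in MixConv."""
    g = len(kernel_sizes)
    base = total_channels // g
    split = [base] * g
    split[0] += total_channels - base * g
    return list(zip(kernel_sizes, split))
```

For example, 32 channels mixed over 3x3, 5x5, and 7x7 kernels gives groups of 12, 10, and 10 channels, each processed by its own depthwise convolution and concatenated back together.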

Deep Residual Learning for Small-Footprint Keyword Spotting

  • Raphael Tang, Jimmy J. Lin
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
This work explores the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as a benchmark and establishes an open-source state-of-the-art reference to support the development of future speech-based interfaces.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
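The normalization step itself is compact. A minimal sketch of the training-time forward pass (learned scale and shift included; this omits the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis (axis 0), then apply the
    learned scale (gamma) and shift (beta). Training-time forward pass only."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

By keeping each layer's inputs near zero mean and unit variance, the batch statistics reduce the internal covariate shift the title refers to, which is what permits the much higher learning rates.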

Highly Accurate Mandarin Tone Classification In The Absence of Pitch Information

A deep neural network based only on 40 mel-frequency cepstral coefficients (MFCCs) achieved substantially better results than the best previously reported results on broadcast news tone classification, and also outperformed a human listener categorizing test stimuli created by amplitude- and frequency-modulating complex tones.

Word-Level Embeddings for Cross-Task Transfer Learning in Speech Processing

This work introduces an encoder capturing word-level representations of speech for cross-task transfer learning and shows that the speech representation captured by the encoder through the pre-training is transferable across distinct speech processing tasks and datasets.

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.