Corpus ID: 235458522

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

@article{Yang2021Voice2SeriesRA,
  title={Voice2Series: Reprogramming Acoustic Models for Time Series Classification},
  author={Chao-Han Huck Yang and Yun-Yun Tsai and Pin-Yu Chen},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09296}
}
Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation…
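The abstract describes the method only at a high level. As a rough illustration of the reprogramming idea it names (zero-pad the series to the acoustic model's input length, add a trainable perturbation on the padding, keep the pretrained acoustic model frozen, and map its source classes onto target classes), here is a minimal sketch; the class, argument names, and round-robin label mapping below are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the V2S reprogramming idea, assuming a frozen PyTorch
# acoustic model `acoustic_model` that maps a length-`input_len` waveform to
# logits over `source_classes` speech-command classes. All names and the
# round-robin label mapping are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Voice2SeriesReprogrammer(nn.Module):
    def __init__(self, acoustic_model, source_classes, target_classes,
                 series_len, input_len):
        super().__init__()
        self.acoustic_model = acoustic_model
        for p in self.acoustic_model.parameters():
            p.requires_grad = False                # the pretrained model stays frozen

        # Trainable universal perturbation, applied only outside the signal segment.
        self.delta = nn.Parameter(torch.zeros(input_len))
        mask = torch.ones(input_len)
        mask[:series_len] = 0.0                    # leave the original series untouched
        self.register_buffer("mask", mask)

        # Many-to-one label mapping: assign each source class to a target class.
        self.register_buffer("label_map",
                             torch.arange(source_classes) % target_classes)
        self.series_len = series_len
        self.input_len = input_len
        self.target_classes = target_classes

    def forward(self, x):
        # x: (batch, series_len) univariate time series
        x = F.pad(x, (0, self.input_len - self.series_len))   # zero-pad to input_len
        x = x + self.mask * self.delta                         # add reprogramming pattern
        source_logits = self.acoustic_model(x)                 # (batch, source_classes)

        # Aggregate source-class probabilities into target-class probabilities.
        probs = source_logits.softmax(dim=-1)
        target_probs = torch.zeros(x.size(0), self.target_classes, device=x.device)
        target_probs.index_add_(1, self.label_map, probs)
        return (target_probs + 1e-12).log()                    # pair with nn.NLLLoss
```

Only delta receives gradients, so training under these assumptions reduces to optimizing a single input transformation, with the fixed label mapping, using an ordinary classification loss on the target dataset.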
1 Citation

Multi-task Language Modeling for Improving Speech Recognition of Rare Words
TLDR
This paper proposes a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance, and shows that the rescoring model trained with these additional tasks outperforms the baseline rescoring model.

References

SHOWING 1-10 OF 64 REFERENCES
Deep learning for time series classification: a review
TLDR
This article presents the most exhaustive study of DNNs for TSC, training 8,730 deep learning models on 97 time series datasets, and provides an open-source deep learning framework to the TSC community.
ConvTimeNet: A Pre-trained Deep Convolutional Neural Network for Time Series Classification
TLDR
Significant gains in classification accuracy and computational efficiency are observed when the pre-trained CTN is used as a starting point for subsequent task-specific fine-tuning, compared to existing state-of-the-art TSC approaches.
Transfer learning for time series classification
TLDR
In an effort to predict the best source dataset for a given target dataset, a new method relying on Dynamic Time Warping to measure inter-dataset similarity is proposed, leading to an improvement in accuracy on 71 out of 85 datasets.
Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings
TLDR
This paper investigates how L3-Net design choices impact the performance of downstream audio classifiers trained with these embeddings, showing that audio-informed choices of input representation are important and that using sufficient data to train the embedding is key.
Time series classification from scratch with deep neural networks: A strong baseline
TLDR
The proposed Fully Convolutional Network (FCN) achieves performance competitive with or better than other state-of-the-art approaches, and the explored very deep neural networks with the ResNet structure are also competitive.
A Regression Approach to Speech Enhancement Based on Deep Neural Networks
TLDR
The proposed DNN approach can effectively suppress highly nonstationary noise, which is tough to handle in general, and deals well with noisy speech data recorded in real-world scenarios, without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
WARP: Word-level Adversarial ReProgramming
TLDR
An alternative approach based on adversarial reprogramming is presented, which attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
A neural attention model for speech command recognition
TLDR
A convolutional recurrent network with attention is proposed for speech command recognition; it establishes a new state-of-the-art accuracy of 94.1% and allows inspecting which regions of the audio the network took into consideration when outputting a given category.
Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement
TLDR
A U-Net-based attention model, UNetAt, is presented to enhance adversarial speech signals, and it is found that temporal features learned by the attention network are capable of enhancing the robustness of DNN-based ASR models.
English Conversational Telephone Speech Recognition by Humans and Machines
TLDR
An independent set of human performance measurements on two conversational tasks is performed, and it is found that human performance may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve.