Towards Learning a Universal Non-Semantic Representation of Speech

@article{Shor2020TowardsLA,
  title={Towards Learning a Universal Non-Semantic Representation of Speech},
  author={Joel Shor and A. Jansen and R. Maor and Oran Lang and Omry Tuval and F. D. C. Quitry and M. Tagliasacchi and Ira Shavitt and D. Emanuel and Yinnon A. Haviv},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.12764}
}
The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained for different datasets or tasks. The visual and language communities have established benchmarks to compare embeddings, but the speech community has yet to do so. This paper proposes a benchmark for comparing speech representations on non-semantic tasks, and proposes a representation based on an unsupervised triplet-loss objective. The proposed representation… Expand
FUN! Fast, Universal, Non-Semantic Speech Embeddings
Contrastive Learning of General-Purpose Audio Representations
C L ] 3 M ay 2 02 1 SUPERB : Speech processing Universal PERformance Benchmark
Multimodal Self-Supervised Learning of General Audio Representations
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
...
1
2
...

References

SHOWING 1-10 OF 59 REFERENCES
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
Unsupervised Learning of Semantic Audio Representations
  • A. Jansen, M. Plakal, +5 authors R. Saurous
  • Computer Science, Engineering
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
The Visual Task Adaptation Benchmark
Generative Pre-Training for Speech with Autoregressive Predictive Coding
  • Yu-An Chung, James R. Glass
  • Computer Science, Engineering
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
Attention-augmented End-to-end Multi-task Learning for Emotion Prediction from Speech
Look, Listen and Learn
An Unsupervised Autoregressive Model for Speech Representation Learning
Variational Autoencoders for Learning Latent Representations of Speech Emotion
...
1
2
3
4
5
...