Corpus ID: 256416560

The Efficacy of Self-Supervised Speech Models for Audio Representations

Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsung-Yuan Hsu, Hung-yi Lee
Self-supervised learning (SSL) speech models, which can serve as powerful upstream models for extracting meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on non-speech datasets is relatively unexplored. In this work, we propose an ensemble framework that combines several ensemble techniques to fuse SSL speech models’ embeddings. Extensive experiments on speech and non-speech audio datasets are conducted to…

