Separate But Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

We propose FedEnhance, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited and disjoint number of speakers (hence non-IID). Each client trains their model in isolation using mixture invariant training while periodically providing updates to a central server. Our experiments show that our approach… 
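
For readers unfamiliar with the mixture invariant training mentioned in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a model has already produced several estimated sources from the sum of two noisy recordings, and it scores them by trying every assignment of sources to recordings and keeping the best one. The function names `mse` and `mixit_loss` are hypothetical, and mean squared error is used here as a simple stand-in for the SNR-based losses typically used in practice.

```python
from itertools import product
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(
    mixtures: Sequence[Sequence[float]],
    est_sources: Sequence[Sequence[float]],
) -> float:
    """Best-assignment reconstruction loss (illustrative stand-in).

    Each estimated source is assigned to exactly one reference mixture.
    The assigned sources are summed to rebuild each mixture, and the
    lowest total error over all assignments is returned.
    """
    n_mix = len(mixtures)
    n_samples = len(mixtures[0])
    best = float("inf")
    # Enumerate every mapping from source index to mixture index.
    for assignment in product(range(n_mix), repeat=len(est_sources)):
        remixed = [[0.0] * n_samples for _ in range(n_mix)]
        for src, mix_idx in zip(est_sources, assignment):
            for t in range(n_samples):
                remixed[mix_idx][t] += src[t]
        loss = sum(mse(m, r) for m, r in zip(mixtures, remixed))
        best = min(best, loss)
    return best
```

Because the loss only needs the noisy recordings themselves, no clean reference speech is required, which is what makes this kind of objective usable on clients that hold only noisy data.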

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement

The proposed RemixIT method provides a seamless alternative for semi-supervised and unsupervised domain adaptation for speech enhancement tasks, while being general enough to be applied to any separation task and paired with any separation model.

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of the method over prior approaches but also showcase that RemixIT can be combined with any separation model as well as be applied towards any semi-supervised and unsupervised domain adaptation task.

Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement

A novel explanation from the perspective of the low-distortion nature of such algorithms is provided, and it is found that they can consistently improve phase estimation.

CodeFed: Federated Speech Recognition for Low-Resource Code-Switching Detection

This work presents CodeFed: a federated learning-based code-switching detection model that can be deployed and collaboratively trained by leveraging private data from multiple users, without compromising their privacy.

Anomaly Detection through Unsupervised Federated Learning

This paper proposes a novel method in which, through a preprocessing phase, clients are grouped into communities, each having similar majority patterns, and it can detect communities consistent with the ideal partitioning in which groups of clients having the same inlier patterns are known.

STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency

Compared with Conv-TasNet, the STFT-domain system can achieve better enhancement performance for a comparable amount of computation, or comparable performance with less computation, maintaining strong performance at an algorithmic latency as low as 2 ms.

Applications of Federated Learning; Taxonomy, Challenges, and Research Trends

The areas of medical AI, IoT, edge systems, and the autonomous industry can adopt FL in many of their sub-domains; however, the challenges these domains can encounter include statistical heterogeneity, system heterogeneity, data imbalance, resource allocation, and privacy.

TinyMLOps: Operational Challenges for Widespread Edge AI Adoption

Tasks such as monitoring and managing the application, which are common functionality for an MLOps platform, are shown to be complicated by the distributed nature of edge deployment.



Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework

A framework by which the degree of non-IID-ness can be varied is proposed, consequently illustrating a trade-off between model quality and the computational cost of federated training, which is captured through a novel metric.

Personalized Speech Enhancement through Self-Supervised Data Augmentation and Purification

The proposed data purification step improves the usability of the speaker-specific noisy data in the context of personalized speech enhancement and may be seen as privacy-preserving as it does not rely on any clean speech recordings or speaker embeddings.

Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision

This work proposes objective functions and network architectures that enable training a source separation system with weak labels and benchmarks the performance of the algorithm using synthetic mixtures of overlapping events created from a database of sounds recorded in urban environments.

WHAM!: Extending Speech Separation to Noisy Environments

The WSJ0 Hipster Ambient Mixtures dataset is created, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples, to benchmark various speech separation architectures and objective functions to evaluate their robustness to noise.

Speech enhancement with weakly labelled data from AudioSet

This paper proposes a speech enhancement framework trained on weakly labelled data that achieves a PESQ of 2.28 and an SSNR of 8.75 dB on the VoiceBank-DEMAND dataset, outperforming the previous SEGAN system.

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

The novel PoCoNet architecture is a convolutional neural network that is able to more efficiently build frequency-dependent features in the early layers, and a new loss function biased towards preserving speech quality helps the optimization better match human perceptual opinions on speech quality.

Sudo RM -RF: Efficient Networks for Universal Audio Source Separation

The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF) as well as their aggregation which is performed through simple one-dimensional convolutions.

Unsupervised Training of a Deep Clustering Model for Multichannel Blind Source Separation

We propose a training scheme to train neural network-based source separation algorithms from scratch when parallel clean data is unavailable. In particular, we demonstrate that an unsupervised

Federated Learning for Keyword Spotting

An extensive empirical study of the federated averaging algorithm for the "Hey Snips" wake word, based on a crowdsourced dataset that mimics a federation of wake word users, shows that using an adaptive averaging strategy inspired by Adam greatly reduces the number of communication rounds required to reach the target performance.

Communication-Efficient Learning of Deep Networks from Decentralized Data

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
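
As a rough illustration of the iterative model averaging described above, the sketch below shows one server-side aggregation step: each client's parameters are weighted by its number of local training examples and averaged. It is a simplified sketch under stated assumptions, not the paper's code. The function name `federated_average` and the dictionary-of-lists parameter format are hypothetical choices made to keep the example dependency-free.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def federated_average(
    client_weights: Sequence[Weights],
    client_sizes: Sequence[int],
) -> Weights:
    """Average client parameters, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    averaged: Weights = {}
    # Assumes every client reports the same parameter names and shapes.
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 1 and 3 local examples respectively.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_weights = federated_average(clients, [1, 3])
```

In a full training loop the server would send `global_weights` back to the clients, each client would run a few local training steps, and the aggregation would repeat for the next communication round.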