Federated Self-Training for Semi-Supervised Audio Recognition

Vasileios Tsouvalas, Aaqib Saeed, and Tanir Ozcelebi. ACM Transactions on Embedded Computing Systems (TECS).

Federated Learning is a distributed machine learning paradigm dealing with decentralized and personal datasets. Since data reside on devices such as smartphones and virtual assistants, labeling is entrusted to the clients, or labels are extracted automatically. For audio data in particular, acquiring semantic annotations can be prohibitively expensive and time-consuming. As a result, an abundance of audio data remains unlabeled and unexploited on users’ devices. Most existing… 

Privacy-preserving Speech Emotion Recognition through Semi-Supervised Federated Learning

This is the first federated SER approach. It combines self-training with federated learning to exploit both labeled and unlabeled on-device data, and shows that the federated approach can learn generalizable SER models even when few labels are available and data distributions are highly non-i.i.d.
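A common form of self-training pseudo-labels unlabeled on-device data with the current model and keeps only confident predictions for further training. The sketch below shows that filtering step under stated assumptions: a fixed confidence threshold (0.9 here, an illustrative value not taken from the paper) and plain-Python logits. The helper names `softmax` and `pseudo_label` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Return (example index, predicted class, confidence) for every unlabeled
    example whose top softmax probability reaches the threshold."""
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        cls = max(range(len(probs)), key=probs.__getitem__)
        if probs[cls] >= threshold:
            selected.append((i, cls, probs[cls]))
    return selected
```

For example, `pseudo_label([[4.0, 0.0, 0.0], [1.0, 1.0, 1.0]])` keeps only the first clip (class 0, confidence about 0.965); the second, uniformly uncertain clip is discarded rather than trained on with a likely wrong label.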

FedLN: Federated Learning with Label Noise

FedLN is proposed, a framework to deal with label noise across different FL training stages, namely FL initialization, on-device model training, and server-side model aggregation.

Federated Learning with Noisy Labels

FedLN is proposed, a framework to deal with label noise across different FL training stages, namely FL initialization, on-device model training, and server model aggregation. It estimates each client's noise level in a single federated round and improves the models’ performance by correcting (or limiting the effect of) noisy samples.

Federated Domain Adaptation for ASR with Full Self-Supervision

An FL system for on-device ASR domain adaptation with full self-supervision. It uses self-labeling together with data augmentation and filtering techniques, and can improve a strong Emformer-Transducer-based ASR model pretrained on out-of-domain data.

Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases

FedCy is proposed, a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos, thereby improving performance on the task of surgical phase recognition.



FSD50K: An Open Dataset of Human-Labeled Sound Events

FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 hours of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.

Augmenting Conversational Agents with Ambient Acoustic Contexts

This work proposes a solution that intelligently redesigns the input segment for ambient context recognition. It uses a two-step inference pipeline: first separate the non-speech segments from the acoustic signal, then use a neural network to infer diverse ambient contexts.

Advances and Open Problems in Federated Learning

Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Federated Optimization in Heterogeneous Networks

This work introduces FedProx, a framework to tackle heterogeneity in federated networks. It provides convergence guarantees when learning over data from non-identical distributions (statistical heterogeneity) while adhering to device-level systems constraints, by allowing each participating device to perform a variable amount of work.
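FedProx modifies each client's local objective by adding a proximal term, so that device k approximately minimizes F_k(w) + (mu / 2) * ||w - w_global||^2. A minimal sketch of that local solver, assuming plain gradient descent on a parameter list and a caller-supplied gradient function for F_k; the function and parameter names are illustrative, not the paper's code.

```python
from typing import Callable, List

Vector = List[float]


def fedprox_local_update(
    w_global: Vector,
    grad_fn: Callable[[Vector], Vector],
    mu: float,
    lr: float,
    steps: int,
) -> Vector:
    """Run `steps` gradient-descent steps on the FedProx local objective
    F_k(w) + (mu / 2) * ||w - w_global||^2, starting from w_global."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)  # gradient of the client's own loss F_k
        # The proximal gradient mu * (w - w_global) pulls w back toward the global model.
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w
```

With a local loss F(w) = 0.5 * (w - 3)^2, starting from a global model at 0, the local solution settles at 3 / (1 + mu): 1.5 for `mu=1`, and the unconstrained optimum 3.0 for `mu=0`. Letting `steps` differ per client mirrors the variable amount of local work the summary mentions.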

Federated Learning with Non-IID Data

This work presents a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices, and shows that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
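The shared-data strategy can be sketched as follows, assuming each client trains on its local data plus a random fraction `alpha` of the globally shared subset. The helper name, `alpha`, and the fixed seed are illustrative assumptions; other details of the original scheme (for example, how the shared subset is first selected) are not modeled here.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_client_sets(
    client_data: Sequence[Sequence[T]],
    shared: Sequence[T],
    alpha: float,
    seed: int = 0,
) -> List[List[T]]:
    """Append a random alpha-fraction of the globally shared subset to each
    client's (possibly highly non-IID) local data."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    k = int(round(alpha * len(shared)))
    return [list(local) + rng.sample(list(shared), k) for local in client_data]
```

Because every client sees some examples drawn from the same shared distribution, the local data distributions move closer to each other, which is the intuition behind the reported accuracy gain.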

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems, and suggests a methodology for reproducible and comparable accuracy metrics for this task.
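Comparable accuracy numbers require every user to evaluate on the same split. One way to get that is a deterministic, hash-based assignment of clips to partitions that keeps all recordings of a speaker together. The sketch below assumes file names of the form `<speaker>_nohash_<n>.wav` and a normalizing constant of 2**27 - 1. Both follow the approach described in the dataset's release documentation, but should be checked against the official README before relying on exact splits.

```python
import hashlib
import os
import re

MAX_NUM_WAVS_PER_CLASS = 2**27 - 1  # upper bound used to map the hash into [0, 100]


def which_set(filename: str, validation_percentage: float, testing_percentage: float) -> str:
    """Deterministically assign a clip to 'training', 'validation', or 'testing'.

    Everything from '_nohash_' onward is ignored, so all recordings by the same
    speaker hash identically and land in the same partition."""
    base_name = os.path.basename(filename)
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    digest = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = (int(digest, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"
```

Hashing the speaker identifier, rather than shuffling with a random seed, keeps the split stable when new clips are added and prevents the same voice from appearing in both training and test data.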

Communication-Efficient Learning of Deep Networks from Decentralized Data

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
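Iterative model averaging (FedAvg) has each selected client train locally, starting from the current global model. The server then replaces the global model with the average of the returned models, weighting each client by its number of local examples, n_k / n. A minimal sketch of that server-side averaging step on flat parameter lists; the function name is illustrative.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]
) -> List[float]:
    """Average client parameter vectors, weighting client k by n_k / n,
    where n_k is its local example count and n is the total over clients."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n_k in zip(client_weights, client_sizes):
        for j in range(dim):
            avg[j] += (n_k / total) * w[j]
    return avg
```

For instance, averaging `[1.0, 2.0]` (1 example) with `[3.0, 6.0]` (3 examples) gives `[2.5, 5.0]`: the larger client pulls the global model three times as hard.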

Contrastive Learning of General-Purpose Audio Representations

This work builds on recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio, and shows that despite its simplicity, this method significantly outperforms previous self-supervised systems.
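A contrastive objective of this kind can be sketched as follows. Assume two segments cut from the same clip form a positive pair, segments from the other clips in the batch act as negatives, and similarity is bilinear (u^T W v). Each anchor must then identify its own positive via a softmax cross-entropy. The embeddings are plain lists here, and the function names are illustrative; a real model would produce them with a learned encoder and learn `W` jointly.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def bilinear(u: Sequence[float], W: Matrix, v: Sequence[float]) -> float:
    """Bilinear similarity u^T W v."""
    return sum(u[i] * W[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def contrastive_loss(anchors: Matrix, positives: Matrix, W: Matrix) -> float:
    """Mean cross-entropy where anchor i must pick positives[i] among all
    positives in the batch (the other clips serve as negatives)."""
    n = len(anchors)
    loss = 0.0
    for i in range(n):
        scores = [bilinear(anchors[i], W, positives[j]) for j in range(n)]
        m = max(scores)  # log-sum-exp trick for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        loss += log_norm - scores[i]
    return loss / n
```

Using in-batch negatives avoids any explicit negative mining: with a batch of N clips, each anchor is contrasted against N - 1 negatives at no extra cost.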

Federated Learning for Keyword Spotting

An extensive empirical study of the federated averaging algorithm for the "Hey Snips" wake word, based on a crowdsourced dataset that mimics a federation of wake-word users, shows that an adaptive averaging strategy inspired by Adam greatly reduces the number of communication rounds required to reach the target performance.
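An Adam-inspired averaging strategy can be sketched as follows. The server treats the difference between the current global model and the mean of the client models as a pseudo-gradient, then applies a standard Adam update to it instead of simply adopting the mean. The class name, the unweighted mean, and the hyperparameter defaults are assumptions for illustration, not the study's exact settings.

```python
import math
from typing import List, Sequence


class AdamServer:
    """Server optimizer that applies Adam to the averaged client update."""

    def __init__(self, weights: Sequence[float], lr: float = 0.01,
                 beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.w = list(weights)
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.w)  # first-moment estimate
        self.v = [0.0] * len(self.w)  # second-moment estimate
        self.t = 0

    def step(self, client_weights: Sequence[Sequence[float]]) -> List[float]:
        n = len(client_weights)
        # Pseudo-gradient: global weights minus the (unweighted) mean of client weights.
        g = [wi - sum(cw[j] for cw in client_weights) / n for j, wi in enumerate(self.w)]
        self.t += 1
        for j, gj in enumerate(g):
            self.m[j] = self.b1 * self.m[j] + (1 - self.b1) * gj
            self.v[j] = self.b2 * self.v[j] + (1 - self.b2) * gj * gj
            m_hat = self.m[j] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[j] / (1 - self.b2 ** self.t)
            self.w[j] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return list(self.w)
```

Starting from a global weight of 0 with `lr=0.1`, one round with client models at 1.0 and 3.0 moves the global weight to about 0.1, toward the client mean. The per-coordinate normalization by sqrt(v_hat) is what lets such a server take consistently sized steps even when client updates are small or noisy.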

End-to-End Speech Recognition from Federated Acoustic Models

This paper constructs a challenging and realistic federated ASR experimental setup consisting of clients with heterogeneous data distributions, using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises.