Corpus ID: 235790668

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

@article{Donley2021EasyComAA,
  title={EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments},
  author={Jacob Donley and Vladimir Tourbabin and Jung-Suk Lee and Mark Broyles and Hao Jiang and Jie Shen and Maja Pantic and Vamsi K. Ithapu and Ravish Mehra},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.04174}
}
Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high-quality, representative data. To the best of the authors' knowledge, as of publication there are no available datasets that contain…

References

SHOWING 1-10 OF 33 REFERENCES
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
TLDR
The 5th CHiME Challenge is introduced, which considers the task of distant multi-microphone conversational ASR in real home environments and describes the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset.
TLDR
The Egocentric Communications (EgoCom) dataset is introduced to advance the state-of-the-art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning, and synchronous multi-perspective data to augment performance of embodied AI tasks.
SDR – Half-baked or Well Done?
TLDR
It is argued here that the signal-to-distortion ratio (SDR) implemented in the BSS_eval toolkit has generally been improperly used and abused, especially in the case of single-channel separation, resulting in misleading results.
An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers
  • J. Jensen, C. Taal
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
TLDR
It is shown that ESTOI can be interpreted in terms of an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, i.e., a ranking of spectrogram features according to their importance to intelligibility.
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
TLDR
Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
TLDR
This paper introduces EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments, and had the participants narrate their own videos (after recording), thus reflecting true intention, and crowd-sourced ground-truths based on these.
An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech
TLDR
A short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments and shows better correlation with speech intelligibility than five other reference objective intelligibility models.
Restaurant acoustics – Verbal communication in eating establishments
A well-known but also very complicated problem in room acoustics is the ambient noise when many people are gathered for a reception or in a restaurant, a bar, a canteen or a similar place. In such…
An Evaluation of Intrusive Instrumental Intelligibility Metrics
TLDR
The results show that intelligibility metrics tend to perform poorly on datasets that were not used during their development, and, by modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated.
Performance measurement in blind audio source separation
TLDR
This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.