Corpus ID: 235790668

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

@article{Donley2021EasyComAA,
  title={EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments},
  author={Jacob Donley and Vladimir Tourbabin and Jung-Suk Lee and Mark Broyles and Hao Jiang and Jie Shen and Maja Pantic and Vamsi K. Ithapu and Ravish Mehra},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.04174}
}
Augmented Reality (AR) as a platform has the potential to help mitigate the cocktail party effect. Future AR headsets could leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beamforming and speech enhancement require high-quality, representative data. To the best of the authors' knowledge, as of publication there are no available datasets that contain…

References

Showing 1–10 of 30 references
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset
TLDR
The Egocentric Communications (EgoCom) dataset is introduced to advance the state of the art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning, providing synchronous multi-perspective data to augment the performance of embodied AI tasks.
SDR – Half-baked or Well Done?
TLDR
It is argued here that the signal-to-distortion ratio (SDR) implemented in the BSS_eval toolkit has generally been improperly used and abused, especially in the case of single-channel separation, resulting in misleading results.
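For context, the fix that paper advocates, the scale-invariant SDR (SI-SDR), is simple to compute directly. The following is a minimal NumPy sketch, assuming two zero-mean, single-channel signals of equal length; the function and variable names are illustrative, not taken from the paper's code:

import numpy as np

def si_sdr(reference, estimate):
    # Scale-invariant SDR in dB for 1-D, equal-length signals.
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Orthogonally project the estimate onto the reference to get the target component.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    error = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(error, error))

Because the projection removes any global gain, rescaling the estimate leaves the score unchanged, which is the property the paper argues plain SDR lacks in the single-channel case.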
An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers
  • J. Jensen, C. Taal
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
TLDR
It is shown that ESTOI can be interpreted in terms of an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, i.e., a ranking of spectrogram features according to their importance to intelligibility.
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
TLDR
Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open-source baselines providing speech enhancement, speaker diarization, and speech recognition modules.
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
TLDR
This paper introduces EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments; participants narrated their own videos after recording, reflecting true intention, and ground truths were crowd-sourced based on these narrations.
An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech
TLDR
A short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments, and which correlates better with speech intelligibility than five other reference objective intelligibility models.
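Both STOI and the ESTOI variant listed above are implemented in the third-party pystoi package; the snippet below is a usage sketch, assuming 16 kHz single-channel signals. The signal names and noise level are placeholders, and pystoi is an assumption of this note rather than the authors' reference implementation:

import numpy as np
from pystoi import stoi  # pip install pystoi

fs = 16000  # sample rate in Hz
clean = np.random.randn(3 * fs)                   # placeholder clean speech
degraded = clean + 0.5 * np.random.randn(3 * fs)  # placeholder degraded speech

score_stoi = stoi(clean, degraded, fs, extended=False)   # STOI
score_estoi = stoi(clean, degraded, fs, extended=True)   # ESTOI (Jensen & Taal)
print(score_stoi, score_estoi)

Scores closer to 1 indicate higher predicted intelligibility.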
Restaurant acoustics – Verbal communication in eating establishments
A well-known but also very complicated problem in room acoustics is the ambient noise when many people are gathered for a reception or in a restaurant, a bar, a canteen or a similar place. In such…
An Evaluation of Intrusive Instrumental Intelligibility Metrics
TLDR
The results show that intelligibility metrics tend to perform poorly on datasets that were not used during their development, and by modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated.
Performance measurement in blind audio source separation
TLDR
This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
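Concretely, the BSS_eval framework decomposes an estimated source into a target component and three error terms, and each metric is an energy ratio over that decomposition. The LaTeX below summarizes the idea in the usual notation; it restates the standard presentation rather than quoting the paper:

\hat{s} = s_{\mathrm{target}} + e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}}

\mathrm{SDR} = 10\log_{10}\frac{\|s_{\mathrm{target}}\|^2}{\|e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}}\|^2},\quad
\mathrm{SIR} = 10\log_{10}\frac{\|s_{\mathrm{target}}\|^2}{\|e_{\mathrm{interf}}\|^2},\quad
\mathrm{SAR} = 10\log_{10}\frac{\|s_{\mathrm{target}} + e_{\mathrm{interf}} + e_{\mathrm{noise}}\|^2}{\|e_{\mathrm{artif}}\|^2}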
The Hearing-Aid Speech Perception Index (HASPI)
TLDR
HASPI is found to give accurate intelligibility predictions for a wide range of signal degradations, including speech degraded by noise and nonlinear distortion, speech processed using frequency compression, noisy speech processed through a noise-suppression algorithm, and speech where the high frequencies are replaced by the output of a noise vocoder.