Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset
@article{Yang2022OpenSM,
  title={Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset},
  author={Zehui Yang and Yifan Chen and Lei Luo and Runyan Yang and Lingxuan Ye and Gaofeng Cheng and Ji Xu and Yaohui Jin and Qingqing Zhang and Pengyuan Zhang and Lei Xie and Yonghong Yan},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.16844}
}
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC. The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones at a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diversified domains and tagged with topic labels, ranging from science and technology to everyday life. Accurate transcription and precise speaker…
References
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
- ISCSLP, 2006
The paper describes the design, collection, transcription, and analysis of the 200-hour HKUST Mandarin Telephone Speech Corpus (HKUST/MTS), the first and largest corpus of its kind for Mandarin conversational telephone speech, providing abundant and diversified samples for Mandarin speech recognition and other application-dependent tasks.
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
- ICASSP, 2022
WenetSpeech is currently the largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition; it is also provided for cross-validation purposes in training and evaluation.
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
- Interspeech, 2021
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 33,000 hours of…
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
- Interspeech, 2021
AISHELL-4, a sizable real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in conference scenarios, is presented; it is the only Mandarin dataset for conversational speech of this kind, providing additional value for data diversity in the speech community.
AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline
- 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition…
Librispeech: An ASR corpus based on public domain audio books
- ICASSP, 2015
It is shown that acoustic models trained on LibriSpeech give lower error rates on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.
CN-Celeb: A Challenging Chinese Speaker Recognition Dataset
- ICASSP, 2020
CN-Celeb is presented, a large-scale speaker recognition dataset collected "in the wild" that contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different real-world genres.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- ICASSP, 2016
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional…
The Design for the Wall Street Journal-based CSR Corpus
- HLT, 1992
This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus, a corpus containing significant quantities of both speech data and text data.
History Utterance Embedding Transformer LM for Speech Recognition
- ICASSP, 2021
The history utterance embedding Transformer LM (HTLM) is proposed, comprising an embedding generation network that extracts contextual information contained in the history utterances and a main Transformer LM for current prediction; two-stage attention (TSA) is further proposed to encode richer contextual information into the embedding of history utterances while supporting GPU-parallel training.