Corpus ID: 236772109

The History of Speech Recognition to the Year 2030

Awni Y. Hannun
The decade from 2010 to 2020 saw remarkable improvements in automatic speech recognition. Many people now use speech recognition on a daily basis, for example to perform voice search queries, send text messages, and interact with voice assistants like Amazon Alexa and Siri by Apple. Before 2010 most people rarely used speech recognition. Given the remarkable changes in the state of speech recognition over the previous decade, what can we expect over the coming decade? I attempt to forecast the…
1 Citation

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
  • Heng-Jui Chang, Shu-Wen Yang, Hung-yi Lee
  • Computer Science, Engineering
  • ArXiv
  • 2021
This paper introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations directly from a HuBERT model. It reduces HuBERT's size by 75% and makes it 73% faster while retaining most of its performance across ten different tasks.


References

English Conversational Telephone Speech Recognition by Humans and Machines
An independent set of human performance measurements on two conversational tasks is performed, and it is found that human performance may be considerably better than was earlier reported, giving the community a significantly harder goal to achieve.
Achieving Human Parity in Conversational Speech Recognition
The human error rate on the widely used NIST 2000 test set is measured, and the latest automated speech recognition system is found to have reached human parity, establishing a new state of the art and edging past the human benchmark.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities
This work evaluates the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user, and proposes using keyword-dependent precision and recall metrics to measure vocabulary acquisition performance.
Self-Training for End-to-End Speech Recognition
  • Jacob Kahn, Ann Lee, Awni Y. Hannun
  • Computer Science, Engineering
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
Self-training is revisited in the context of end-to-end speech recognition, and it is demonstrated that training with pseudo-labels can substantially improve the accuracy of a baseline model.
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the…
The People’s Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage
The People’s Speech is a free-to-download 31,400-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA. The data is…
Deep Speech: Scaling up end-to-end speech recognition
Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.
wav2vec: Unsupervised Pre-training for Speech Recognition
Wav2vec is trained on large amounts of unlabeled audio data, and the resulting representations are then used to improve acoustic model training. It outperforms Deep Speech 2, the best reported character-based system in the literature, while using two orders of magnitude less labeled training data.