Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System

Gerardo Roa Dabike and J. Barker
Automatic sung speech recognition is a relatively understudied topic that has been held back by a lack of large, freely available datasets. This has recently changed with the release of the DAMP Sing! dataset, a 1,100-hour karaoke dataset originating from the social music-making company Smule. This paper presents work undertaken to define an easily replicable automatic speech recognition benchmark for this data. In particular, we describe how transcripts and alignments have been…