Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment

@article{Mauch2012IntegratingAC,
  title={Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment},
  author={Matthias Mauch and Hiromasa Fujihara and Masataka Goto},
  journal={IEEE Transactions on Audio, Speech, and Language Processing},
  year={2012},
  volume={20},
  pages={200--210}
}
Aligning lyrics to audio has a wide range of applications, such as the automatic generation of karaoke scores, song browsing by lyrics, and the generation of audio thumbnails. […] We propose two novel methods that implement this idea: first, assuming that all chords of a song are known, we extend a hidden Markov model (HMM) framework by including chord changes in the Markov chain and an additional audio feature (chroma) in the emission vector; second, for the more realistic case in which some chord…
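The alignment machinery behind such an HMM can be sketched in a few lines. The code below is not the paper's implementation; it is a minimal illustration, under assumed names and toy probabilities, of the core idea of the extended emission vector: per-frame phoneme log-likelihoods and chroma (chord) log-likelihoods are summed as independent observation streams, and a left-to-right Viterbi pass produces a monotone frame-to-state alignment.

```python
# Minimal sketch (not the paper's code): left-to-right Viterbi alignment where
# each state's emission score combines a phoneme log-likelihood with a chroma
# log-likelihood, mirroring the idea of adding chord information to the
# emission vector. All names, shapes, and probabilities are illustrative.
import numpy as np

def viterbi_align(phoneme_ll, chroma_ll, self_loop=0.7):
    """phoneme_ll, chroma_ll: (n_frames, n_states) log-likelihood matrices.
    Returns the best monotone state path as one state index per frame."""
    emit = phoneme_ll + chroma_ll          # independent streams: logs add
    n_frames, n_states = emit.shape
    stay, move = np.log(self_loop), np.log(1.0 - self_loop)
    score = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    score[0, 0] = emit[0, 0]               # alignment starts in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            best = score[t - 1, s] + stay  # stay in the same state
            back[t, s] = s
            if s > 0 and score[t - 1, s - 1] + move > best:
                best = score[t - 1, s - 1] + move  # advance one state
                back[t, s] = s - 1
            score[t, s] = best + emit[t, s]
    path = [n_states - 1]                  # alignment must end in the last state
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

In a real system the states would be phoneme (or phoneme-plus-chord) HMM states for the whole lyric transcript, and the log-likelihoods would come from trained acoustic models rather than being precomputed matrices.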
Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models
TLDR
It is demonstrated through experiments that the use of an audio source separation method and effective end-pointing of the songs has a high impact on alignment performance, which is comparable with that of a state-of-the-art lyrics-to-audio alignment system trained on a large polyphonic music database.
Improving Lyrics Alignment through Joint Pitch Detection
TLDR
This paper proposes a multi-task learning approach for lyrics alignment that incorporates pitch and thus can make use of a new source of highly accurate temporal information and shows that the accuracy of the alignment result is improved by this approach.
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription
TLDR
A novel variant of the Multistreaming Time-Delay Neural Network (MTDNN) architecture, called MSTRE-Net, processes the temporal information in multiple parallel streams with varying resolutions, keeping the network more compact and thus yielding faster inference and a higher recognition rate than identical TDNN streams.
Word level lyrics-audio synchronization using separated vocals
TLDR
This paper presents an approach to lyrics-audio alignment that compares synthesized speech with a vocal track separated from the instrumental mixture via source separation, taking a hierarchical approach to the problem.
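The comparison step that such synthesis-based approaches rely on is typically a dynamic time warping (DTW) pass over a frame-wise cost matrix between the two feature sequences. The sketch below is an illustrative, self-contained DTW implementation, not code from the cited paper; the cost matrix is a stand-in for the synthesized-speech-versus-separated-vocals distances.

```python
# Illustrative DTW sketch: find the minimum-cost monotone warping path
# through a pairwise cost matrix between two feature sequences
# (e.g. synthesized speech vs. separated vocals). Toy input assumed.
import numpy as np

def dtw_path(cost):
    """cost: (n, m) pairwise distance matrix. Returns the minimum-cost
    monotone path from (0, 0) to (n-1, m-1) as a list of index pairs."""
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # match
                acc[i - 1, j] if i > 0 else np.inf,                # step in seq 1
                acc[i, j - 1] if j > 0 else np.inf,                # step in seq 2
            )
            acc[i, j] = cost[i, j] + prev
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j, path = n - 1, m - 1, [(n - 1, m - 1)]
    while (i, j) != (0, 0):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        steps = [(a, b) for a, b in steps if a >= 0 and b >= 0]
        i, j = min(steps, key=lambda ij: acc[ij])
        path.append((i, j))
    return path[::-1]
```

The hierarchical aspect of the cited work (aligning at coarser units first, then refining) would wrap repeated calls to a routine like this at different granularities.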
Real-time audio-to-score alignment of singing voice based on melody and lyric information
TLDR
This paper aims at exploiting the advantages of melody and lyric information for real-time audio-to-score alignment of singing voice and suggests that lyric information can be efficiently used for any singer.
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment
TLDR
This work proposes using additional speech and music-informed features and adapting the acoustic models trained on a large amount of solo singing vocals towards polyphonic music using a small amount of in-domain data to reduce the domain mismatch between training and testing data.
Low Resource Audio-To-Lyrics Alignment from Polyphonic Music Recordings
TLDR
This study presents a novel method that performs audio-to-lyrics alignment with a low memory footprint regardless of the duration of the music recording, and utilizes the lyrics alignment system to segment the recordings into sentence-level chunks.
DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations
TLDR
DECIBEL is proposed, a new ACE system that exploits widely available MIDI and tab representations to improve ACE from audio only and shows that the integration of musical knowledge from heterogeneous symbolic music representations is a suitable strategy for addressing challenging MIR tasks such as ACE.
Automatic Lyrics Transcription in Polyphonic Music: Does Background Music Help?
TLDR
This work proposes to learn music genre-specific characteristics to train polyphonic acoustic models, and to explicitly model the characteristics of the music instead of trying to remove the background music as noise.
...

References

Showing 1–10 of 29 references
Popular song and lyrics synchronization and its application to music information retrieval
TLDR
This is the first automatic synchronization system based only on low-level acoustic features such as MFCCs; it is evaluated on a Chinese song dataset collected from three popular singers, and it opens up the discussion of challenging problems in developing a robust synchronization system for a large-scale database.
Automatic chord transcription from audio using computational models of musical context
TLDR
A novel dynamic Bayesian network (DBN) is presented which integrates models of metric position, key, chord, bass note and two beat-synchronous audio features into a single high-level musical context model.
Simultaneous Estimation of Chords and Musical Context From Audio
TLDR
This work devise a fully automatic method to simultaneously estimate from an audio waveform the chord sequence including bass notes, the metric positions of chords, and the key, and introduces a measure of segmentation quality and shows that bass and meter modeling are especially beneficial for obtaining the correct level of granularity.
LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals
TLDR
LyricAlly is presented, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke, using an appropriate pairing of audio and text processing.
LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics
TLDR
A prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke, is presented, using a multimodal approach.
Word level automatic alignment of music and lyrics using vocal synthesis
TLDR
This approach uses a text-to-speech system to synthesize the singing voice according to the lyrics, presenting an alternative and effective solution to music-lyrics alignment that may require less training data.
Chord segmentation and recognition using EM-trained hidden markov models
TLDR
This work builds a system for automatic chord transcription using speech recognition tools, and uses “pitch class profile” vectors to emphasize the tonal content of the signal, and shows that these features far outperform cepstral coefficients for this task.
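A pitch class profile (chroma) feature of this kind can be sketched very compactly: spectral energy is folded onto the 12 pitch classes. The code below is an illustrative simplification, not the cited system; the bin-to-pitch-class mapping, frequency range, and normalisation are all assumed choices.

```python
# Illustrative "pitch class profile" (chroma) sketch, as used as the emission
# feature in HMM-based chord recognition: fold the energy of each spectral
# bin onto one of the 12 pitch classes. Parameter choices are assumptions.
import numpy as np

def pitch_class_profile(magnitude, sr, n_fft, fmin=55.0, fmax=2000.0):
    """magnitude: |FFT| of one frame (length n_fft // 2 + 1).
    Returns a 12-dimensional, energy-normalised chroma vector (C = index 0)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    pcp = np.zeros(12)
    for f, m in zip(freqs, magnitude):
        if fmin <= f <= fmax:
            # MIDI-style pitch number, folded to a pitch class
            pitch = 69 + 12 * np.log2(f / 440.0)
            pcp[int(round(pitch)) % 12] += m ** 2
    total = pcp.sum()
    return pcp / total if total > 0 else pcp
```

For example, a frame containing a pure 440 Hz tone concentrates its energy in pitch class A (index 9); a C major chord would light up the C, E, and G bins.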
Approximate Note Transcription for the Improved Identification of Difficult Chords
TLDR
This paper seeks to find chroma features that are more suitable for usage in a musically-motivated model by performing a prior approximate transcription using an existing technique to solve non-negative least squares problems (NNLS).
A Discrete Mixture Model for Chord Labelling
TLDR
A new approach uses a relatively simple chroma model as its basis to represent short-time sonorities derived from melody-range and bass-range chromagrams, and proves the practicability of the model by implementing a hidden Markov model (HMM) for chord labelling.
Automatic Recognition of Lyrics in Singing
TLDR
The paper considers the task of recognizing phonemes and words from a singing input by using a phonetic hidden Markov model recognizer and finds global adaptation to singing to improve singing recognition performance.
...