• Corpus ID: 6573918

Evaluation Framework for Automatic Singing Transcription

Authors: Emilio Molina, Ana M. Barbancho, Lorenzo J. Tardón, Isabel Barbancho
Published in: International Society for Music Information Retrieval Conference
In this paper, we analyse the evaluation strategies used in previous works on automatic singing transcription, and we present a novel, comprehensive and freely available evaluation framework for automatic singing transcription. This framework consists of a cross-annotated dataset and a set of extended evaluation measures, which are integrated in a Matlab toolbox. The presented evaluation measures are based on standard MIREX note-tracking measures, but they provide extra information about the… 

VOCANO: A note transcription framework for singing voice in polyphonic music

VOCANO is presented, an open-source VOCAl NOte transcription framework built upon robust neural networks with multi-task and semi-supervised learning that outperforms the state of the art on public benchmarks over a wide variety of evaluation metrics.

Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency

Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, is presented, and it is shown that Tony's built-in automatic note transcription method compares favourably with existing tools.

MusicYOLO: A Vision-Based Framework for Automatic Singing Transcription

A new AST framework called MusicYOLO is proposed, which obtains the note-level transcription results directly and detects note objects rather than isolated onset/offset moments, thus greatly enhancing the transcription performance.

vocadito: A dataset of solo vocals with f0, note, and lyric annotations

This work presents a small dataset entitled vocadito, consisting of 40 short excerpts of monophonic singing, sung in 7 different languages by singers with varying levels of training, and recorded on a variety of devices.

Automatic Solfège Assessment

Experimental results indicate that the classification scheme, implemented using a Bayesian classifier, is suitable for use as an assessment tool and provides useful feedback to the student.

Improving Lyrics Alignment Through Joint Pitch Detection

This paper proposes a multi-task learning approach for lyrics alignment that incorporates pitch and can thus make use of a new source of highly accurate temporal information; it shows that the accuracy of the alignment result is indeed improved by this approach.

HSD: A hierarchical singing annotation dataset

A hierarchical singing annotation dataset is presented, consisting of 68 pop songs from YouTube; it records the onset/offset time, pitch, duration, and lyric of each musical note in an enhanced LyRiCs (LRC) format to present the hierarchical structure of music.

Omnizart: A General Toolbox for Automatic Music Transcription

Omnizart is the first transcription toolkit which offers models covering a wide class of instruments, ranging from solo instruments and instrument ensembles to percussion instruments and vocals, as well as models for chord recognition and beat/downbeat tracking, two music information retrieval tasks highly related to AMT.


Automatic music transcription from audio has long been one of the most intriguing problems and a challenge in the field of music information retrieval, because it requires a series of low-level tasks

Toward Expressive Singing Voice Correction: On Perceptual Validity of Evaluation Metrics for Vocal Melody Extraction

A streamlined system to automate expressive SVC for both pitch and rhythmic errors is presented, and perceptual validity of the standard metrics through the lens of SVC is investigated, suggesting that the high pitch accuracy obtained by the metrics does not signify good perceptual scores.



Towards Computer-Assisted Flamenco Transcription: An Experimental Comparison of Automatic Transcription Algorithms as Applied to A Cappella Singing

A transcription system based on fundamental frequency and energy estimation, incorporating an iterative strategy for note segmentation and labeling, is proposed; it outperforms a state-of-the-art approach designed for other singing styles.

Explicit Transition Modelling for Automatic Singing Transcription

A system for the automatic transcription of solo human singing into note sequences is presented; hidden Markov models are used to represent both individual notes and the transitions between them in order to capture the variability of the estimated pitch within a statistical framework.

An Auditory Model Based Transcriber of Singing Sequences

A new system for the automatic transcription of singing sequences into a sequence of pitch and duration pairs is presented, and it is shown that the accuracy of the newly proposed transcription system is not very sensitive to the choice of the free parameters, at least as long as they remain in the vicinity of the values one could forecast on the basis of their meaning.

Recent improvements of an auditory model based front-end for the transcription of vocal queries

Experiments have shown that the new system can transcribe vocal queries with an accuracy ranging from 76% (whistling) to 85% (humming), and that it clearly outperforms other state-of-the-art systems on all three query types.

Fundamental frequency alignment vs. note-based melodic similarity for singing voice assessment

The results show that the proposed system is suitable for automatic singing voice rating and that DTW-based measures are especially simple and effective for intonation and rhythm assessment.

Modelling of note events for singing transcription

The method produces symbolic notations from acoustic inputs based on two probabilistic models, a note event model and a musicological model. Together these form a melody transcription system with a modular architecture that can be extended with desired front-end feature extractors and musicological rules.

Sung Note Segmentation for a Query-by-Humming System

New acoustic features based on the signal energy distribution, as obtained from the singing perception and production points of view, are investigated, and a specific mid-band energy combined with a biphasic detection function achieves high correct detection and low false alarm rates on the sonorant consonant syllables.

An Audio Front End for Query-by-Humming Systems

A front end dedicated to the symbolic translation of voice into a sequence of pitch and duration pairs is developed, crucial for the effectiveness of searching for music by melodic similarity.

Probabilistic models for the transcription of single-voice melodies

A method is proposed for the automatic transcription of single-voice melodies from an acoustic waveform into a symbolic musical notation (a MIDI file) using a probabilistic model that handles imperfections in the performed/estimated pitch values using a hidden Markov model.

Automatic Transcription of Melody, Bass Line, and Chords in Polyphonic Music

The method is computationally efficient and allows causal implementation, so it can process streaming audio, and may be used in music analysis, music information retrieval from large music databases, content-based audio processing, and interactive music systems.