Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

  title={Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals},
  author={Jean-Louis Durrieu and Ga{\"e}l Richard and Bertrand David and C{\'e}dric F{\'e}votte},
  journal={IEEE Transactions on Audio, Speech, and Language Processing},
Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that end, we propose a new signal model where the… 


A convolutional recurrent neural network architecture that relies on a particular form of pretraining by source-filter nonnegative matrix factorisation to estimate the dominant melody of a polyphonic audio recording achieves state-of-the-art performance on the MedleyDB dataset without any augmentation methods or large training sets.
Main Melody Estimation with Source-Filter NMF and CRNN
This work proposes to enhance the NMF-based salience representations with CNN layers, then to model the temporal structure with an RNN and to estimate the dominant melody with a final classification layer, and shows that such a system achieves state-of-the-art performance on the MedleyDB dataset without any augmentation methods or large training sets.
Automatic transcription of the melody from polyphonic music
An efficient computational method for auditory stream segregation that handles a variable number of simultaneous voices and allows the melody to be computed efficiently.
Improving melody extraction using Probabilistic Latent Component Analysis
  • Jinyu Han, Ching-Wei Chen
  • Computer Science
    2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2011
Quantitative evaluation shows that the new PLCA-based melody extraction algorithm performs significantly better than two existing melody extraction algorithms for polyphonic single-channel mixtures.
On-Line Melody Extraction From Polyphonic Audio Using Harmonic Cluster Tracking
  • Vipul Arora, L. Behera
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2013
A novel framework that estimates the predominant vocal melody in real time by tracking various sources with the help of harmonic clusters (combs) and then identifying the predominant vocal source by its harmonic strength.
From Heuristics-Based to Data-Driven Audio Melody Extraction
The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications.
Robust Singer Identification in Polyphonic Music using Melody Enhancement and Uncertainty-based Learning
New methods to estimate the uncertainty from the signal in a fully automatic manner and to learn the classifier directly from polyphonic data are introduced.
Towards Computational Auditory Scene Analysis: Melody Extraction from Polyphonic Music
The method is a further development of an algorithm which was successfully evaluated as part of a melody extraction system, and it shows superior performance on audio examples assembled to demonstrate the importance of auditory streaming in human perception.
Melody Extraction from Polyphonic Music Signals Using Tandem Filter System
  • Chen Jia, Gang Liu
  • Engineering
    2018 International Computers, Signals and Systems Conference (ICOMSSC)
  • 2018
Robust principal component analysis (RPCA) is used to roughly extract the human voice from the polyphonic music signal, and a tandem filter system including a transverse stripe filter then eliminates implausible fundamental frequency positions.
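The RPCA step in the summary above rests on splitting a magnitude spectrogram into a low-rank part (the repetitive accompaniment) and a sparse part (the voice). A minimal sketch, assuming the generic inexact augmented Lagrange multiplier solver for principal component pursuit with the common default λ = 1/sqrt(max(m, n)); the function names and parameters are hypothetical, and the tandem filter stage of the cited paper is not shown.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage operator: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca_inexact_alm(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S (generic inexact ALM, hypothetical helper).

    In a voice-separation setting M would be a magnitude spectrogram,
    L the accompaniment estimate and S the rough vocal estimate.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(M, 2)          # largest singular value
    norm_inf = np.abs(M).max() / lam
    Y = M / max(norm_two, norm_inf)          # dual variable initialisation
    mu = 1.25 / norm_two                     # penalty parameter
    mu_max = mu * 1e7
    rho = 1.5                                # penalty growth factor
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: elementwise shrinkage.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint M = L + S.
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_M < tol:
            break
    return L, S
```

Applied to an STFT magnitude, `S` (or a soft mask derived from it) would serve as the rough vocal estimate that later stages refine.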
Vocal Melody Extraction via DNN-based Pitch Estimation and Salience-based Pitch Refinement
Experimental results on three public datasets indicate that the proposed method, which uses melody MIDI files as the source of labels to train a deep neural network (DNN) model for melody extraction, outperforms four state-of-the-art melody extraction methods in most cases.


Singer melody extraction in polyphonic signals using source separation methods
A new approach for singer melody extraction based on blind source separation techniques, in which a simplification of the general GMM is used and the STFT of the music signal is approximated with Non-negative Matrix Factorization (NMF) techniques.
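The NMF approximation mentioned above factors a non-negative magnitude (or power) spectrogram V into a dictionary W and activations H, with V ≈ WH. A minimal sketch, assuming the Lee and Seung multiplicative updates for the squared Euclidean cost; the cited work may use a different divergence, and `nmf_multiplicative` is a hypothetical name.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V (F x N) as W @ H (hypothetical helper).

    Lee and Seung multiplicative updates for the squared Euclidean cost.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Strictly positive random initialisation of the factors.
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs are preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Multiplicative updates keep W and H non-negative without any projection step, which is one reason they are a common default for spectrogram factorisation.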
An iterative approach to monaural musical mixture de-soloing
This article proposes to model the power spectral densities of both contributions with a source/filter model for the main instrument while retaining a model emphasizing temporal repetitions of the musical background, and shows that improved source separation performances can be obtained by a two-step estimation strategy.
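A source/filter model writes the power spectral density of the main instrument as the product of a source spectrum (a harmonic comb at the fundamental frequency) and a smooth filter envelope. A minimal sketch of that product, assuming Gaussian-shaped harmonic peaks and a formant-like envelope built from Gaussian bumps; these shapes, the function names and all parameter values are illustrative assumptions, not the parameterisation of the cited article.

```python
import numpy as np


def harmonic_comb(freqs, f0, n_harmonics=20, width=10.0):
    """Source part (assumed shape): Gaussian peaks of width `width` Hz at k * f0."""
    comb = np.zeros_like(freqs, dtype=float)
    for k in range(1, n_harmonics + 1):
        comb += np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)
    return comb


def smooth_envelope(freqs, centers, bandwidths, gains):
    """Filter part (assumed shape): sum of Gaussian bumps, formant-like."""
    env = np.zeros_like(freqs, dtype=float)
    for c, b, g in zip(centers, bandwidths, gains):
        env += g * np.exp(-0.5 * ((freqs - c) / b) ** 2)
    return env + 1e-3  # small floor so the envelope is never exactly zero


def source_filter_psd(freqs, f0, centers, bandwidths, gains):
    """Source/filter PSD: elementwise product of source comb and filter envelope."""
    return harmonic_comb(freqs, f0) * smooth_envelope(freqs, centers, bandwidths, gains)
```

Changing `f0` moves the harmonic peaks while the envelope stays fixed; this separation of pitch (source) from timbre (filter) is what source/filter models exploit.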
This document describes a submission to the MIREX audio melody extraction contest, which addresses the task of identifying the melody pitch contour in polyphonic musical audio, and shows that the algorithm performs best with respect to runtime and overall accuracy.
Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music
A novel algorithm based on pitch estimation and nonnegative matrix factorization (NMF) is proposed; it predicts the amount of noise in the vocal segments, which allows vocals and noise to be separated even when they overlap in time and frequency.
Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs
A general formalism for source model adaptation, expressed in the framework of Bayesian models, is introduced; results show that an adaptation scheme can consistently and significantly improve separation performance compared with non-adapted models.
A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings
  • Masataka Goto
  • Computer Science
    2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
  • 2000
A predominant-F0 estimation method called PreFEst is proposed that does not rely on the F0's unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range.
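The idea of choosing the F0 best supported by its harmonics within a restricted frequency band can be illustrated with plain harmonic summation. This is a simplified stand-in, not PreFEst itself (PreFEst estimates a probability density of the F0 by fitting weighted harmonic tone models with EM); the function name, the F0 search range, the band limits and the harmonic weights are hypothetical.

```python
import numpy as np


def harmonic_sum_f0(mag, freqs, f0_min=80.0, f0_max=800.0, step=1.0,
                    n_harmonics=8, decay=0.8, band=(0.0, 5000.0)):
    """Return the F0 candidate whose harmonics inside `band` carry the most energy.

    mag:   magnitude spectrum of one frame (1-D array)
    freqs: centre frequency in Hz of each bin of `mag`
    """
    candidates = np.arange(f0_min, f0_max + step, step)
    salience = np.zeros_like(candidates)
    upper = min(band[1], freqs[-1])
    for i, f0 in enumerate(candidates):
        for h in range(1, n_harmonics + 1):
            fh = h * f0
            if fh > upper:
                break  # this and higher harmonics lie outside the allowed band
            if fh < band[0]:
                continue  # harmonic below the band is ignored
            # Nearest bin to the h-th harmonic, weighted by decay ** (h - 1).
            nearest = np.argmin(np.abs(freqs - fh))
            salience[i] += decay ** (h - 1) * mag[nearest]
    return candidates[np.argmax(salience)], salience
```

The decaying harmonic weights favour the true F0 over its sub-octave (which also explains every harmonic but gets no energy at its own odd multiples); narrowing `band` mimics restricting the analysis to a chosen frequency region.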
Accompaniment separation and karaoke application based on automatic melody transcription
A method for separating the accompaniment from polyphonic music, and a karaoke application of it, both based on automatic melody transcription, intended to help non-professional singers produce more appealing karaoke performances.
Tracking melody in polyphonic audio. MIREX 2008
In this work, a melody extraction technique is submitted to the MIREX 2008 campaign. The task's objective is to estimate the pitch of the main melody in polyphonic audio. The proposed method
A Classification Approach to Melody Transcription
This work presents a classification-based system for performing automatic melody transcription that makes no assumptions beyond what is learned from its training data, and shows that a Support Vector Machine melodic classifier produces results comparable to state of the art model-based transcription systems.
Transcription of the Singing Melody in Polyphonic Music
The method is based on multiple-F0 estimation followed by acoustic and musicological modeling, which produces a sequence of notes and rests as a transcription of the singing melody.