Polyphonic transcription by non-negative sparse coding of power spectra
Making machines that can understand musical structure has long been one of the holy grails of audio processing; separating overlapping sounds has been another. Here we present a simple framework, initially used for the first task, that has since proved very useful for source separation. We show that the same type of reasoning that allows one to find the building elements of a musical audio stream can also be used to find and extract elements contained in auditory scenes. We relate this work to recent developments in sparse representations and dimensionality reduction, and show its application in a variety of situations.
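
As an illustration of the kind of decomposition named in the title, the following is a minimal sketch and not the authors' implementation. It assumes the power spectrogram is a non-negative matrix V (frequency bins by time frames) approximated as the product W H, with an L1 penalty encouraging sparse activations H. The multiplicative update rules, the function name nn_sparse_coding, the sparsity weight, and the toy two-note data are all assumptions made for the example.

```python
import numpy as np


def nn_sparse_coding(V, n_components, sparsity=0.01, n_iter=300, seed=0):
    """Approximate V ~= W @ H with W, H >= 0 and an L1 penalty on H.

    Hypothetical helper for illustration only: it uses multiplicative
    updates for a squared-error cost plus sparsity * sum(H).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    eps = 1e-12  # guards against division by zero

    # Random non-negative initialisation of spectra (W) and activations (H).
    W = rng.random((n_bins, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps

    for _ in range(n_iter):
        # Activation update; the sparsity term in the denominator shrinks H.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        # Dictionary update (no penalty on W).
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each spectrum has unit L2 norm, moving the scale into H
        # so that the product W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


# Toy "power spectrogram": two made-up note spectra that overlap in time.
n_bins, n_frames = 16, 40
W_true = np.zeros((n_bins, 2))
W_true[[2, 4, 6], 0] = [1.0, 0.5, 0.25]   # harmonics of hypothetical note A
W_true[[3, 9, 12], 1] = [1.0, 0.5, 0.25]  # harmonics of hypothetical note B
H_true = np.zeros((2, n_frames))
H_true[0, 0:20] = 1.0    # note A sounds in frames 0..19
H_true[1, 10:30] = 1.0   # note B sounds in frames 10..29
V = W_true @ H_true

W, H = nn_sparse_coding(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

On this toy input, each column of W should come to resemble one of the two note spectra and each row of H its activation over time, up to ordering and scale. Reading the rows of H as note activity corresponds to the transcription task, while the product of a single column of W with its row of H gives one component's contribution to the mixture, which corresponds to the separation task.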