Handling Asynchrony in Audio-Score Alignment


Aligning a canonical score to an audio recording of a musical performance can provide very good information about the timing of individual notes. However, a score representation frequently treats multiple note events as simultaneous, whereas in reality different performers will start notes at slightly different times, and these timing details may be significant in the analysis of performance and expression. Using an example of a four-part a cappella vocal piece in which each voice was recorded separately, we compare note onset and offset times obtained by manual annotation to three different types of alignment: forced alignment of each part individually to its corresponding track, simultaneous alignment of the polyphonic score to the full audio, and independent alignment of single parts to the polyphonic audio. In each case, we examine the kinds of errors that occur. We discuss how standard dynamic time warping may be extended so that it retains the advantages of polyphonic alignment while allowing ostensibly simultaneous notes to have different onset and offset times.
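
For readers unfamiliar with the baseline technique, the following is a minimal sketch of standard dynamic time warping only, not the extension the paper discusses. The function name `dtw_path`, the one-dimensional per-frame features, and the absolute-difference distance are all illustrative assumptions; a real audio-score system would compare richer features such as spectral or chroma frames.

```python
def dtw_path(score_feats, audio_feats, dist=lambda a, b: abs(a - b)):
    """Standard DTW between two sequences (hypothetical helper).

    Returns (total_cost, path), where path is a list of
    (score_index, audio_index) pairs from start to end.
    """
    n, m = len(score_feats), len(audio_feats)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning the first i
    # score frames with the first j audio frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_feats[i - 1], audio_feats[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance score only
                cost[i][j - 1],      # advance audio only
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


# Toy usage: three score "notes" against an audio track that holds
# the first and last notes for two frames each.
total, path = dtw_path([1, 2, 3], [1, 1, 2, 3, 3])
print(total, path)
```

Because a single path through one cost matrix assigns every score event at a given position to the same audio frame, notes written as simultaneous necessarily receive identical onset times. This is the limitation that motivates allowing per-part timing differences.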

Cite this paper

@inproceedings{Devaney2009HandlingAI, title={Handling Asynchrony in Audio-Score Alignment}, author={Johanna Devaney and Daniel P. W. Ellis}, booktitle={ICMC}, year={2009} }