Assessing Segmentations: Two Methods for Confidence Scoring Automatic HMM-Based Word Segmentations


The Dutch-Flemish project Spoken Dutch Corpus (1998-2003) aims at the development of an annotated corpus of 10 million spoken words. In order to make the speech data easily accessible, a word segmentation couples the orthographic transcription to the speech signal by means of time stamps. Generally, such segmentations are produced manually. Since this…

