Matthew Stephen Seigel

This paper describes some recent results of our collaborative work on developing a speech recognition system for the automatic transcription of media archives from the British Broadcasting Corporation (BBC). The material includes a highly diverse range of shows with their associated transcriptions. The latter are highly diverse in terms of completeness, reliability…
This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses decoding hypotheses derived automatically with a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of…
The task in keyword spotting (KWS) is to hypothesise times at which any of a set of key terms occurs in audio. An important aspect of such systems is the scores assigned to these hypotheses, the accuracy of which has a significant impact on performance. Estimating these scores may be formulated as a confidence estimation problem, where a measure of…
The task of word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from the combination of suitably defined input features from multiple information sources. However, the information sources of interest may not necessarily operate at the same level of granularity as the underlying ASR system. The research…
The estimation of accurate confidence scores for sub-word-level units within automatic speech recognition (ASR) system transcriptions is investigated in this work. This is achieved through the application of linear-chain and hidden-state conditional random field (CRF) models to the task. A method for evaluating the significance of results quoted in terms of…