SPEECH OGLE: Indexing Uncertainty for Spoken Document Search


This paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition (ASR) lattices that lends itself naturally to efficient indexing and subsequent relevance ranking of spoken documents. In experiments on a collection of lecture recordings (MIT iCampus data), spoken document ranking accuracy improved by 20% relative to the commonly used baseline of indexing the 1-best output of an automatic speech recognizer. The inverted index built from PSPL lattices is compact, at about 20% of the size of 3-gram ASR lattices and 3% of the size of the uncompressed speech, and it allows for extremely fast retrieval. Furthermore, pruning PSPL lattices causes little degradation in performance while yielding even smaller indexes, at 5% of the size of 3-gram ASR lattices.
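To make the idea of indexing recognition uncertainty concrete, the following is a minimal sketch and not the authors' implementation. It assumes each spoken document has already been reduced to a list of positions, where each position maps candidate words to posterior probabilities. The function names (`build_index`, `expected_counts`, `prune`), the sum-of-posteriors score, and the fixed pruning threshold are all illustrative assumptions; the abstract does not specify the ranking function or the pruning rule.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index from position-specific word posteriors.

    docs: {doc_id: [ {word: posterior, ...}, ... ]}, one dict per position.
    Returns {word: [(doc_id, position, posterior), ...]}.
    The input layout is an assumption made for this sketch.
    """
    index = defaultdict(list)
    for doc_id, positions in docs.items():
        for pos, candidates in enumerate(positions):
            for word, posterior in candidates.items():
                index[word].append((doc_id, pos, posterior))
    return index


def expected_counts(index, query_words):
    """Score documents by summing posteriors of matching query words.

    This expected-count score is a stand-in chosen for illustration,
    not the relevance ranking described in the paper.
    """
    scores = defaultdict(float)
    for word in query_words:
        for doc_id, _pos, posterior in index.get(word, []):
            scores[doc_id] += posterior
    return dict(scores)


def prune(docs, threshold):
    """Drop candidates whose posterior falls below a hypothetical threshold."""
    return {
        doc_id: [
            {w: p for w, p in candidates.items() if p >= threshold}
            for candidates in positions
        ]
        for doc_id, positions in docs.items()
    }


# Toy data: two "lectures" with competing word hypotheses per position.
example_docs = {
    "lec1": [{"speech": 0.9, "peach": 0.1}, {"search": 0.6, "church": 0.4}],
    "lec2": [{"search": 0.3, "surge": 0.7}],
}
example_index = build_index(example_docs)
example_scores = expected_counts(example_index, ["speech", "search"])
```

Indexing a 1-best transcript corresponds to keeping only the top candidate at each position with posterior 1.0. Keeping the alternatives lets a query match words the recognizer ranked second, and pruning trades some of that recall for a smaller index.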

Cite this paper

@inproceedings{Chelba2005SPEECHOI, title={SPEECH OGLE: Indexing Uncertainty for Spoken Document Search}, author={Ciprian Chelba and Alex Acero}, booktitle={ACL}, year={2005} }