In this paper we address the problem of efficiently aligning very long audio files (often more than one hour) to their corresponding textual transcripts. We present an efficient recursive technique that works well even on noisy speech signals. The key idea of this algorithm is to turn the forced alignment problem into a …
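The abstract breaks off before stating how the problem is made recursive, so the sketch below is not the paper's algorithm. It illustrates one generic recursive idea, assuming the recognizer has already produced a word hypothesis with start times: runs of at least `min_len` words that agree with the transcript are fixed as anchors, and each unaligned gap between anchors is aligned again with a shorter minimum run. Lowering `min_len` is only our stand-in for re-recognizing the gap with a more restricted model; `find_anchors`, `align`, and the example data are hypothetical.

```python
from difflib import SequenceMatcher


def find_anchors(ref, hyp, min_len):
    """Return (ref_start, hyp_start, length) for runs of >= min_len equal words."""
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks() if m.size >= min_len]


def align(ref, hyp, times, min_len=3, ref_off=0, out=None):
    """Recursively map transcript word indices to hypothesis start times.

    ref: transcript words; hyp: recognized words; times: start time of each
    hyp word. Hypothetical stand-in: gaps are re-aligned with a smaller
    min_len instead of being re-recognized.
    """
    if out is None:
        out = {}
    if not ref or not hyp or min_len < 1:
        return out  # nothing left to align, or no anchor length left to try
    anchors = find_anchors(ref, hyp, min_len)
    if not anchors:
        # No island long enough: retry this segment with shorter anchors.
        return align(ref, hyp, times, min_len - 1, ref_off, out)
    prev_r = prev_h = 0
    # The sentinel closes the final gap after the last anchor.
    for r, h, n in anchors + [(len(ref), len(hyp), 0)]:
        # Recurse on the unaligned gap that precedes this anchor.
        align(ref[prev_r:r], hyp[prev_h:h], times[prev_h:h],
              min_len - 1, ref_off + prev_r, out)
        for k in range(n):
            out[ref_off + r + k] = times[h + k]
        prev_r, prev_h = r + n, h + n
    return out
```

For example, aligning the transcript `a b c x y d e f` against the hypothesis `a b c q x d e f` anchors `a b c` and `d e f` first, then recovers `x` inside the remaining gap; `y` has no counterpart and stays unaligned.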
Keywords: spoken document retrieval, speech indexing, out-of-vocabulary (OOV) words. We present several novel approaches to the OOV query problem for spoken audio: indexing based on syllable-like units called particles, and query expansion according to acoustic confusability for a word index. We also examine linear and OOV-based combinations of indexing schemes. We …
We have developed an audio search engine that incorporates speech recognition technology, allowing spoken documents on the World Wide Web to be indexed when no transcription is available. The site indexes several talk and news radio shows, covering a wide range of topics and speaking styles, from a selection of public Web sites with multimedia archives. Our …
A planar map is a figure formed by a set of intersecting lines and curves. Such an object captures both the geometrical and the topological information implicitly defined by the data. In the context of 2D drawing, it provides a new interaction paradigm, map sketching, for editing graphic shapes. To build a planar map, one must compute curve …
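The abstract stops at the point where curve intersections are computed. As a hedged illustration only, the sketch below restricts curves to straight line segments with integer endpoints and builds the edges of a planar map by brute force: every pair of segments is tested for a crossing, and each segment is split at its crossing points. Exact `Fraction` arithmetic avoids rounding errors, and collinear overlaps are ignored for simplicity. The function names are hypothetical; a practical implementation would handle true curves and would typically use a sweep-line algorithm rather than this O(n²) loop.

```python
from fractions import Fraction


def crossing_params(p1, p2, q1, q2):
    """Return (t, u) with p1 + t*(p2-p1) == q1 + u*(q2-q1), or None.

    Endpoints are integer (x, y) tuples; parallel and collinear segments
    return None (overlaps are not handled in this sketch).
    """
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = (q2[0] - q1[0], q2[1] - q1[1])
    denom = r[0] * s[1] - r[1] * s[0]  # 2D cross product r x s
    if denom == 0:
        return None
    qp = (q1[0] - p1[0], q1[1] - p1[1])
    t = Fraction(qp[0] * s[1] - qp[1] * s[0], denom)  # (q1 - p1) x s / (r x s)
    u = Fraction(qp[0] * r[1] - qp[1] * r[0], denom)  # (q1 - p1) x r / (r x s)
    if 0 <= t <= 1 and 0 <= u <= 1:
        return t, u
    return None


def point_at(p1, p2, t):
    """Point at parameter t along segment p1-p2."""
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def split_at_crossings(segments):
    """Split every segment at its crossings with the others (brute force)."""
    edges = []
    for i, (p1, p2) in enumerate(segments):
        ts = {Fraction(0), Fraction(1)}  # always keep both endpoints
        for j, (q1, q2) in enumerate(segments):
            if i != j:
                hit = crossing_params(p1, p2, q1, q2)
                if hit is not None:
                    ts.add(hit[0])
        pts = [point_at(p1, p2, t) for t in sorted(ts)]
        edges.extend(zip(pts, pts[1:]))  # consecutive points form map edges
    return edges
```

Two diagonals of a square, for instance, cross at its center and yield four edges that all meet at that vertex.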
We present a novel approach to the out-of-vocabulary (OOV) query problem for audio indexing. Our technique first builds a word index for the audio using speech recognition. It then expands query words into in-vocabulary phrases according to intrinsic acoustic confusability and language model scores. The aim is to mimic the mistakes the speech recognizer …
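The abstract does not specify how confusability is measured or how the two scores are combined. The sketch below assumes a unit-cost edit distance between phone strings as a crude stand-in for acoustic confusability, combined linearly with a language-model log-probability. The candidate phrases, their phone strings, the log-probabilities, `lm_weight`, and both function names are invented for illustration.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def expand_query(oov_phones, candidates, lm_logprob, lm_weight=0.5, top_k=3):
    """Rank in-vocabulary phrases as substitutes for an OOV query word.

    candidates: phrase -> phone sequence; lm_logprob: phrase -> LM log-prob.
    Hypothetical score = -(phone edit distance) + lm_weight * LM log-prob.
    """
    scored = [
        (-phone_edit_distance(oov_phones, phones)
         + lm_weight * lm_logprob[phrase], phrase)
        for phrase, phones in candidates.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # best score first
    return [phrase for _, phrase in scored[:top_k]]
```

With unit costs every phone substitution counts the same; a real system would more plausibly weight substitutions by how often the recognizer actually confuses each phone pair.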
A method is presented for performing speech recognition that does not depend on a fixed word vocabulary. Particles are used as the recognition units in a speech recognition system, which permits word-vocabulary-independent speech decoding. A particle represents a concatenated phone sequence. Each string of particles that represents a word in the one-best …
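The abstract does not say how a word's phones are broken into particles. Assuming a fixed particle inventory (here a hypothetical set of phone tuples), one simple way to map a phone sequence onto particles is dynamic programming that picks the segmentation with the fewest particles. The inventory and `segment_into_particles` below are illustrative only, not the method described in this work.

```python
def segment_into_particles(phones, inventory):
    """Split a phone sequence into the fewest particles from `inventory`.

    inventory: set of phone tuples (hypothetical particles).
    Returns a list of particle tuples, or None if no segmentation exists.
    """
    n = len(phones)
    best = [None] * (n + 1)  # best[i]: fewest-particle split of phones[:i]
    best[0] = []
    max_len = max((len(p) for p in inventory), default=0)
    for i in range(1, n + 1):
        for length in range(1, min(max_len, i) + 1):
            piece = tuple(phones[i - length:i])
            if piece in inventory and best[i - length] is not None:
                candidate = best[i - length] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]
```

Keeping single phones in the inventory guarantees that any sequence over known phones can be segmented, while longer particles are preferred whenever they fit.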
The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures, known as visemes, the visual counterpart of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered …
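As a hedged illustration of the phoneme-to-viseme step, the sketch below maps time-stamped phonemes to viseme labels through a lookup table and merges consecutive segments that share a viseme, so that /p/ followed by /b/ yields a single closed-lips gesture. The table is a small hypothetical fragment (grouping bilabials and labiodentals, a common choice), not the mapping used by FaceMe!; times are assumed to be in milliseconds.

```python
# Hypothetical, partial phoneme -> viseme table (not FaceMe!'s actual mapping).
PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",  # lips closed
    "f": "FV", "v": "FV",                # lower lip against upper teeth
    "aa": "AA", "iy": "EE", "uw": "OO",
    "sil": "REST",
}


def phonemes_to_visemes(segments, table=PHONEME_TO_VISEME, default="REST"):
    """Convert (phoneme, start_ms, end_ms) segments to merged viseme segments.

    Unknown phonemes map to `default`. Adjacent segments with the same viseme
    and touching boundaries are merged into one.
    """
    visemes = []
    for phoneme, start, end in segments:
        viseme = table.get(phoneme, default)
        if visemes and visemes[-1][0] == viseme and visemes[-1][2] == start:
            visemes[-1] = (viseme, visemes[-1][1], end)  # extend previous gesture
        else:
            visemes.append((viseme, start, end))
    return visemes
```

Merging avoids sending the renderer two identical mouth shapes back to back, which would otherwise look like a stutter.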
As the Web transforms from a text-only medium into a multimedia-rich one, the need arises to search based on multimedia content. In this paper we present an audio and video search engine to tackle this problem. The engine uses speech recognition technology to index spoken audio and video files from the World Wide Web when no …
We have developed a speech-recognition-based audio search engine for indexing spoken documents found on the World Wide Web. Our site (http://www.compaq.com/speechbot) indexes around 20 news and talk radio shows covering a wide range of topics, speaking styles and acoustic conditions from a selection of public Web sites with multimedia archives. In this …
We explore how traditional applications in multimedia indexing can evolve into fully fledged knowledge management systems in which multimedia content (audio, video and images) is a first-class citizen and contributes as much as textual sources. We start by describing a current application for indexing audio and video from the Web, the SpeechBot web …