J. Scott Olsson

Our goal in cross-language text classification (CLTC) is to use English training data to classify Czech documents (although the concepts presented here are applicable to any language pair). CLTC is an off-line problem, and the authors are unaware of any previous work in this area. CLTC is motivated by both the non-availability of Czech training data (the …
This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The …
Well-tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has been shown to generally be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when one or the other approach is used alone. Tuning LVCSR systems to a topic domain can be costly, however, and the experiments in this paper show that …
We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be …
We present a system to index and search conversational speech using a scoring heuristic on the expected posterior counts of phone n-grams in recognition lattices. We report significant improvements in retrieval effectiveness on five human languages over a strong 1-best baseline. The method is shown to improve the utility (mean average precision) of the …
Rapid and inexpensive techniques for automatic transcription of speech have the potential to dramatically expand the types of content to which information retrieval techniques can be productively applied, but limitations in accuracy and robustness must be overcome before that promise can be fully realized. Combining retrieval results from systems built on …
Title of dissertation: Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval. J. Scott Olsson, Doctor of Philosophy, 2008. Dissertation directed by: Associate Professor Douglas W. Oard, College of Information Studies. This dissertation considers the problem of information retrieval in speech. Today’s speech …
This paper introduces a new approach to ranking speech utterances by a system’s confidence that they contain a spoken word. Multiple alternate pronunciations, or degradations, of a query word’s phoneme sequence are hypothesized and incorporated into the ranking function. We consider two methods for hypothesizing these degradations, the best of which is …