Our goal in cross-language text classification (CLTC) is to use English training data to classify Czech documents (although the concepts presented here are applicable to any language pair). CLTC is an off-line problem, and the authors are unaware of any previous work in this area. CLTC is motivated by both the non-availability of Czech training data (the…
This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The…
We introduce several methods of combining feature selectors for text classification. Results from a large investigation of these combinations are summarized. Easily constructed combinations of feature selectors are shown to improve peak R-precision and F1 at statistically significant levels.
Well-tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has been shown to generally be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when one or the other approach is used alone. Tuning LVCSR systems to a topic domain can be costly, however, and the experiments in this paper show that…
We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that the best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be…
We present a system to index and search conversational speech using a scoring heuristic on the expected posterior counts of phone n-grams in recognition lattices. We report significant improvements in retrieval effectiveness on five human languages over a strong 1-best baseline. The method is shown to improve the utility (mean average precision) of the…
We introduce a discriminative approach to vocabulary-independent term frequency estimation. Using two separate corpora and recognition systems, we show that our model can perform significantly better than a previously established generative model at this task.
Rapid and inexpensive techniques for automatic transcription of speech have the potential to dramatically expand the types of content to which information retrieval techniques can be productively applied, but limitations in accuracy and robustness must be overcome before that promise can be fully realized. Combining retrieval results from systems built on…
Title of dissertation: Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval. J. Scott Olsson, Doctor of Philosophy, 2008. Dissertation directed by: Associate Professor Douglas W. Oard, College of Information Studies. This dissertation considers the problem of information retrieval in speech. Today’s speech…
This paper introduces a new approach to ranking speech utterances by a system’s confidence that they contain a spoken word. Multiple alternate pronunciations, or degradations, of a query word’s phoneme sequence are hypothesized and incorporated into the ranking function. We consider two methods for hypothesizing these degradations, the best of which is…