Scott M. Thede

The importance of dealing with unknown words in Natural Language Processing (NLP) is growing as NLP systems are used in more and more applications. One aid in predicting the lexical class of words that do not appear in the lexicon (referred to as unknown words) is the use of syntactic parsing rules. The distinction between closed-class and open-class words …
This paper examines the feasibility of using statistical methods to train a part-of-speech predictor for unknown words. By using statistical methods, without incorporating hand-crafted linguistic information, the predictor could be used with any language for which there is a large tagged training corpus. Encouraging results have been obtained by testing …
Ratings of familiarity and pronounceability were obtained from a random sample of 199 surnames (selected from over 80,000 entries in the Purdue University phone book) and 199 nouns (from the Kucera-Francis, 1967, word database). The distributions of ratings for nouns versus names are substantially different: Nouns were rated as more familiar and easier to …
ACKNOWLEDGMENTS: Writing a Ph.D. thesis is a tremendous amount of work, and it is not accomplished in a vacuum. I would like to thank all the people who were most important to me during the journey. First, I would like to thank my committee for their help. Special thanks go to Mary Harper, my advisor, for her many hours of proofreading, editing and …
Ratings of familiarity and pronounceability were obtained for a sample of 199 names and 199 nouns. Frequency and familiarity were more closely related in the proper name pool than the word pool, although the correlation was modest in both cases. Familiarity and pronounceability were highly related for both names and nouns. Although word-level models of …
This paper examines the feasibility of using statistical methods to train a part-of-speech tagger, particularly with respect to unknown words. Training a part-of-speech tagger on a tagged corpus, without incorporating hand-crafted linguistic information, allows that tagger to be used for any language. The use of statistical methods has given encouraging …