Scott M. Thede

This paper examines the feasibility of using statistical methods to train a part-of-speech predictor for unknown words. By using statistical methods without incorporating hand-crafted linguistic information, the predictor could be used with any language for which there is a large tagged training corpus. Encouraging results have been obtained by testing the...
The importance of dealing with unknown words in natural language processing (NLP) is growing as NLP systems are used in more and more applications. One aid in predicting the lexical class of words that do not appear in the lexicon, referred to as unknown words, is the use of syntactic parsing rules. The distinction between closed-class and open-class words...
Thede, Scott. Ph.D., Purdue University, December. Parsing and Tagging Sentences Containing Lexically Ambiguous and Unknown Tokens. Major Professor: Mary P. Harper. We present a parsing system designed to parse sentences containing unknown words as accurately as possible. Our post-mortem parsing algorithm combines syntactic parsing rules, morphological recognition, and...
This paper examines the feasibility of using statistical methods to train a part-of-speech tagger, particularly with respect to unknown words. Training a part-of-speech tagger on a tagged corpus, without incorporating hand-crafted linguistic information, allows that tagger to be used for any language. The use of statistical methods has given encouraging...
The importance of dealing with unknown words in natural language processing (NLP) is growing as NLP systems are used in more and more applications. The ability to parse sentences containing unknown words will make a parsing system more robust and flexible. The use of syntactic parsing rules provides constraints on the possible lexical categories of unknown words...
Ratings of familiarity and pronounceability were obtained from a random sample of 199 surnames (selected from over 80,000 entries in the Purdue University phone book) and 199 nouns (from the Kucera-Francis, 1967, word database). The distributions of ratings for nouns versus names are substantially different: Nouns were rated as more familiar and easier to...