Learn More
Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates(More)
Vancouver, October 2005. OpinionFinder: A system for subjectivity analysis Theresa Wilson‡, Paul Hoffmann‡, Swapna Somasundaran†, Jason Kessler†, Janyce Wiebe†‡, Yejin Choi§, Claire Cardie§, Ellen Riloff∗, Siddharth Patwardhan∗ ‡Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260 †Department of Computer Science, University of(More)
Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed(More)
This paper presents the results of developing subjectivity classifiers using only unannotated texts for training. The performance rivals that of previous supervised learning approaches. In addition, we advance the state of the art in objective sentence classification by learning extraction patterns associated with objectivity and creating objective(More)
A common form of sarcasm on Twitter consists of a positive sentiment contrasted with a negative situation. For example, many sarcastic tweets include a positive sentiment, such as “love” or “enjoy”, followed by an expression that describes an undesirable activity or state (e.g., “taking exams” or “being ignored”). We have developed a sarcasm recognizer to(More)
This paper describes a bootstrapping algorithm called Basilisk that learns highquality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective(More)
We present a novel approach to weakly supervised semantic class learning from the web, using a single powerful hyponym pattern combined with graph structures, which capture two properties associated with pattern-based extractions: popularity and productivity. Intuitively, a candidate is popular if it was discovered many times by other instances in the(More)
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves speci c types of(More)