Similarity is an important and widely used concept. Previous definitions of similarity are tied to a particular application or a form of knowledge representation. We present an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model. We demonstrate how our definition can be used to measure the similarity in(More)
Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus.(More)
In this paper, we first present a dependency-based method for parser evaluation. We then use the method to evaluate a broad-coverage parser, called MINIPAR, with the SUSANNE corpus. The method allows us to evaluate not only the overall performance of the parser, but also its performance with respect to different grammatical relationships and phenomena. The(More)
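As a rough illustration of what a dependency-based evaluation can look like, the sketch below scores a parser's output against a gold standard by comparing sets of dependency triples, overall and per grammatical relation. The abstract does not give the scoring details, so the triple layout `(head, relation, dependent)`, the precision/recall formulation, and all names here (`Dependency`, `dependency_scores`, `scores_by_relation`) are assumptions made for illustration, not the paper's method.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Dependency(NamedTuple):
    """Hypothetical dependency triple; the field layout is an assumption."""
    head: str
    relation: str
    dependent: str


def dependency_scores(gold: Iterable[Dependency],
                      predicted: Iterable[Dependency]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted dependencies against gold."""
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set & predicted_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


def scores_by_relation(gold: Iterable[Dependency],
                       predicted: Iterable[Dependency]
                       ) -> Dict[str, Tuple[float, float]]:
    """Break the same scores down by grammatical relation."""
    gold_list, predicted_list = list(gold), list(predicted)
    relations = {d.relation for d in gold_list + predicted_list}
    return {
        rel: dependency_scores(
            [d for d in gold_list if d.relation == rel],
            [d for d in predicted_list if d.relation == rel],
        )
        for rel in sorted(relations)
    }


gold = [
    Dependency("saw", "subj", "I"),
    Dependency("saw", "obj", "dog"),
    Dependency("dog", "det", "a"),
]
predicted = [
    Dependency("saw", "subj", "I"),
    Dependency("saw", "obj", "dog"),
    Dependency("saw", "det", "a"),  # determiner attached to the wrong head
]

overall = dependency_scores(gold, predicted)
per_relation = scores_by_relation(gold, predicted)
```

The per-relation breakdown is what lets an evaluation of this kind report performance on individual grammatical relationships rather than a single overall number.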
In this paper, we propose an unsupervised method for discovering inference rules from text, such as “X is author of Y ≈ X wrote Y”, “X solved Y ≈ X found a solution to Y”, and “X caused Y ≈ Y is triggered by X”. Inference rules are extremely important in many fields such as natural language processing, information retrieval, and artificial intelligence in(More)
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called(More)
One of the main challenges in question-answering is the potential mismatch between the expressions in questions and the expressions in texts. While humans appear to use inference rules such as “X writes Y” implies “X is the author of Y” in answering questions, such rules are generally unavailable to question-answering systems due to the inherent difficulty(More)
Non-compositional expressions present a special challenge to NLP applications. We present a method for automatic identification of non-compositional expressions using their statistical properties in a text corpus. Our method is based on the hypothesis that when a phrase is non-compositional, its mutual information differs significantly from the mutual(More)
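For readers unfamiliar with the statistic, the sketch below computes the pointwise mutual information of a two-word phrase from corpus counts, using the standard definition log2(P(x, y) / (P(x) P(y))). The abstract is truncated before it says what the phrase's mutual information is compared against, so this shows only the basic quantity; the function name `pointwise_mutual_information` and the counts are hypothetical.

```python
import math


def pointwise_mutual_information(pair_count: int, x_count: int,
                                 y_count: int, total: int) -> float:
    """PMI in bits, estimated from raw counts.

    pair_count: occurrences of the phrase (x, y)
    x_count, y_count: occurrences of each word on its own
    total: number of observations the counts are drawn from
    """
    if min(pair_count, x_count, y_count, total) <= 0:
        raise ValueError("all counts must be positive")
    # P(x, y) / (P(x) * P(y)) simplifies to pair_count * total / (x_count * y_count)
    return math.log2(pair_count * total / (x_count * y_count))


# Made-up counts: the pair co-occurs ten times more often than chance predicts.
strong = pointwise_mutual_information(pair_count=10, x_count=20,
                                      y_count=50, total=1000)

# Made-up counts: the pair co-occurs exactly as often as chance predicts.
independent = pointwise_mutual_information(pair_count=1, x_count=10,
                                           y_count=100, total=1000)
```

A value near zero means the two words co-occur about as often as independence would predict; larger positive values indicate a stronger association.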
With the emergence of broad-coverage parsers, quantitative evaluation of parsers becomes increasingly important. We propose a dependency-based method for evaluating broad-coverage parsers. The method offers several advantages over previous methods that are based on phrase boundaries. The error count score we propose here is not only(More)
Overgeneration is the main source of computational complexity in previous principle-based parsers. This paper presents a message passing algorithm for principle-based parsing that avoids the overgeneration problem. This algorithm has been implemented in C++ and successfully tested with example sentences from (van Riemsdijk and Williams, 1986).(More)