Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity…
State-of-the-art set expansion algorithms produce expansions of varying quality for different entity types. Even for the highest quality expansions, errors still occur and manual refinements are necessary for most practical uses. In this paper, we propose algorithms to aid this refinement process, greatly reducing the amount of manual labor required. The…
Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion…
In this paper, we present a method for modeling joint information when generating n-best lists. We apply the method to a novel task of characterizing the similarity of a group of terms where only a small set of many possible semantic properties may be displayed to a user. We demonstrate that considering the results jointly, by accounting for the information…
Computing the similarity between entities is a core component of many NLP tasks such as measuring the semantic similarity of terms for generating a distributional thesaurus. In this paper, we study the problem of explaining post-hoc why a set of terms are similar. Given a set of terms, our task is to generate a small set of explanations that best…