Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity...
Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion...
Computing the similarity between entities is a core component of many NLP tasks such as measuring the semantic similarity of terms for generating a distributional thesaurus. In this paper, we study the problem of explaining post-hoc why a set of terms are similar. Given a set of terms, our task is to generate a small set of explanations that best...