Publications
Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis
This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia; this representation yields substantial improvements in the correlation of computed relatedness scores with human judgments.
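The core idea of ESA can be illustrated with a toy sketch: each word is mapped to a weighted vector of Wikipedia concepts, and two texts are compared by the cosine of their concept vectors. The concept "articles", words, and weighting below are hypothetical stand-ins, not the paper's actual Wikipedia index or TF-IDF scheme.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia concept articles.
concepts = {
    "Computer science": "algorithm computer program software data computation",
    "Biology": "cell organism gene protein evolution species",
    "Music": "melody rhythm song instrument harmony note",
}

def tf_idf_index(concepts):
    # Build an inverted index: word -> {concept: tf-idf weight}.
    n = len(concepts)
    df = Counter()
    tfs = {}
    for name, text in concepts.items():
        tf = Counter(text.split())
        tfs[name] = tf
        for w in tf:
            df[w] += 1
    index = {}
    for name, tf in tfs.items():
        for w, count in tf.items():
            idf = math.log(n / df[w]) + 1.0  # smoothed so rare words keep weight
            index.setdefault(w, {})[name] = count * idf
    return index

def esa_vector(text, index):
    # Interpretation vector: sum the concept weights of each word in the text.
    vec = Counter()
    for w in text.lower().split():
        for concept, weight in index.get(w, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

index = tf_idf_index(concepts)
a = esa_vector("software algorithm", index)
b = esa_vector("computer program", index)
c = esa_vector("melody instrument", index)
```

With real Wikipedia articles the concept space has hundreds of thousands of dimensions, but the comparison works the same way: texts about related topics activate overlapping concepts, so `cosine(a, b)` exceeds `cosine(a, c)` here.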
Placing search in context: the concept revisited
A new conceptual paradigm for performing search in context is presented that largely automates the search process, providing even non-professional users with highly relevant results.
Knowledge vault: a web-scale approach to probabilistic knowledge fusion
The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories, and computes calibrated probabilities of fact correctness.
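The fusion step can be sketched in miniature: independent extractor confidences for a candidate fact are combined with a prior in log-odds space (a naive-Bayes-style combination chosen here for illustration; it is not the Knowledge Vault's actual fusion model, which learns calibrated classifiers over extractor features).

```python
import math

def logit(p):
    # Map a probability to log-odds.
    return math.log(p / (1 - p))

def sigmoid(x):
    # Map log-odds back to a probability.
    return 1 / (1 + math.exp(-x))

def fuse(prior, extractor_probs):
    # Start from the prior and add the evidence each extractor
    # contributes over that prior, all in log-odds space.
    total = logit(prior) + sum(logit(p) - logit(prior) for p in extractor_probs)
    return sigmoid(total)

fused = fuse(0.5, [0.8, 0.7])
```

Two extractors that each lean toward the fact being true reinforce each other, so the fused probability exceeds either one alone.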
A Review of Relational Machine Learning for Knowledge Graphs
This paper reviews how statistical models can be trained on large knowledge graphs and then used to predict new facts about the world (equivalent to predicting new edges in the graph), and how such statistical models of graphs can be combined with text-based information extraction methods to automatically construct knowledge graphs from the Web.
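One family of models surveyed there scores candidate edges with latent features of entities and relations. A minimal sketch of DistMult-style scoring, with tiny hand-set embeddings standing in for learned ones (the entities and values are invented for illustration):

```python
# Hand-set 2-dimensional embeddings; a real model would learn these.
entities = {"Alice": [1.0, 0.0], "Bob": [0.9, 0.1], "Paris": [0.0, 1.0]}
relations = {"knows": [1.0, 0.0], "located_in": [0.0, 1.0]}

def score(subj, rel, obj):
    # DistMult score: sum_i e_s[i] * w_r[i] * e_o[i].
    # A higher score means the triple (edge) is predicted as more plausible.
    return sum(a * b * c
               for a, b, c in zip(entities[subj], relations[rel], entities[obj]))

knows_bob = score("Alice", "knows", "Bob")
knows_paris = score("Alice", "knows", "Paris")
```

Because "Alice" and "Bob" share the person-like dimension that the "knows" relation weights, the model ranks the edge (Alice, knows, Bob) above (Alice, knows, Paris), which is exactly the link-prediction task the review formalizes.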
Wikipedia-based Semantic Interpretation for Natural Language Processing
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
A word at a time: computing word relatedness using temporal semantic analysis
This paper proposes a new semantic relatedness model, Temporal Semantic Analysis (TSA), which captures the temporal dynamics of word semantics as a vector of concepts over a corpus of temporally ordered documents.
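The intuition can be sketched with toy data: words whose usage rises and falls together over time tend to be related. The monthly frequencies below are invented, and plain Pearson correlation stands in for TSA's actual concept-based time-series comparison.

```python
import math

# Hypothetical monthly frequencies over a temporally ordered corpus.
series = {
    "snow":  [9, 7, 2, 0, 0, 1, 8],
    "ski":   [8, 6, 1, 0, 0, 2, 7],
    "beach": [0, 1, 5, 9, 8, 4, 0],
}

def pearson(x, y):
    # Pearson correlation between two equal-length time series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

snow_ski = pearson(series["snow"], series["ski"])
snow_beach = pearson(series["snow"], series["beach"])
```

"snow" and "ski" peak in the same months, so their correlation is strongly positive, while "snow" and "beach" move in opposition; the temporal signal alone separates the related pair from the unrelated one.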
Google COVID-19 Community Mobility Reports: Anonymization Process Description (version 1.0)
This document describes the aggregation and anonymization process applied to the initial version of Google COVID-19 Community Mobility Reports (published at this http URL on April 2, 2020).
Large-scale learning of word relatedness with constraints
A large-scale data mining approach to learning word-word relatedness is presented, in which known pairs of related words impose constraints on the learning process; the method learns for each word a low-dimensional representation that strives to maximize the likelihood of a word given the contexts in which it appears.
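The effect of such constraints can be sketched crudely: start from context-count vectors and let each known-related pair pull its two vectors toward each other, so the learned space respects the constraints. The vectors, words, and averaging step below are illustrative only, not the paper's actual objective or optimization.

```python
import math

# Toy context-count vectors (stand-ins for corpus statistics).
vecs = {
    "car":  [3.0, 0.0, 1.0],
    "auto": [1.0, 0.0, 0.5],
    "bird": [0.0, 3.0, 1.0],
}
constraints = [("car", "auto")]  # known related pairs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

before = cosine(vecs["car"], vecs["auto"])

# One constraint step: pull each related pair's vectors toward their mean,
# so known-related words end up closer in the learned space.
for a, b in constraints:
    mean = [(x + y) / 2 for x, y in zip(vecs[a], vecs[b])]
    vecs[a] = [(x + m) / 2 for x, m in zip(vecs[a], mean)]
    vecs[b] = [(y + m) / 2 for y, m in zip(vecs[b], mean)]

after = cosine(vecs["car"], vecs["auto"])
```

After the step, the constrained pair's similarity has increased while unconstrained words are untouched, which is the qualitative behavior the supervision is meant to produce.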
Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge
It is proposed to enrich document representation through automatic use of a vast compendium of human knowledge: an encyclopedia. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets.
Feature Generation for Text Categorization Using World Knowledge
Machine learning algorithms for text categorization are enhanced with features generated from domain-specific and common-sense knowledge, addressing two main problems of natural language processing: synonymy and polysemy.