Keyword search on external memory data graphs
TLDR
We propose a graph representation technique that combines a condensed version of the graph (the "supernode graph"), which is always memory-resident, with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation.
  • 152
  • 16
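To make the multi-granular idea concrete, here is a minimal Python sketch, not the paper's implementation. The supernode graph and the node-to-supernode map stay in memory, while each supernode's detailed adjacency (a "block") is fetched through a hypothetical `load_block` callback into a small LRU cache. The class name, the LRU eviction policy, and the `neighbors` return convention are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, Dict, Set, Tuple

Block = Dict[str, Set[str]]  # node -> neighbours, for the nodes of one supernode


class MultiGranularGraph:
    """Hypothetical sketch: in-memory supernode graph plus a cache of detailed blocks."""

    def __init__(self, node_to_super: Dict[str, str], super_adj: Dict[str, Set[str]],
                 load_block: Callable[[str], Block], capacity: int = 1):
        self.node_to_super = node_to_super  # always memory-resident
        self.super_adj = super_adj          # condensed supernode graph, always memory-resident
        self.load_block = load_block        # stands in for reading a block from disk
        self.capacity = capacity            # max number of detailed blocks held in cache
        self.cache: "OrderedDict[str, Block]" = OrderedDict()  # LRU order: oldest first

    def expand(self, supernode: str) -> None:
        """Bring a supernode's detailed adjacency into the cache, evicting the LRU block."""
        if supernode in self.cache:
            self.cache.move_to_end(supernode)
            return
        self.cache[supernode] = self.load_block(supernode)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # drop the least recently used block

    def neighbors(self, node: str) -> Tuple[str, Set[str]]:
        """Detailed neighbours if the node's block is cached, else adjacent supernodes."""
        sn = self.node_to_super[node]
        if sn in self.cache:
            self.cache.move_to_end(sn)
            return "detailed", set(self.cache[sn][node])
        return "super", set(self.super_adj[sn])
```

When a block is not cached, a search still gets a coarse answer (the adjacent supernodes) from the in-memory graph instead of forcing a disk read; expanding a supernode trades memory for detail.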
WebSets: extracting sets of entities from the web using unsupervised information extraction
TLDR
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus by clustering terms found in HTML tables and assigning concept names to these clusters using Hearst patterns.
  • 91
  • 6
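The concept-naming step can be illustrated with a small Python sketch. It assumes each cluster of table terms has already been formed and uses only two Hearst patterns ("X such as Y", "X including Y") with a simple majority vote; the pattern set, the voting rule, and the helper names `hearst_pairs` and `name_cluster` are illustrative assumptions, not WebSets' actual procedure.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Two classic Hearst patterns: "<concept> such as <list>" and "<concept> including <list>".
PATTERNS = [
    re.compile(r"(\w+),? such as ([^.;]+)", re.IGNORECASE),
    re.compile(r"(\w+),? including ([^.;]+)", re.IGNORECASE),
]
# List items are separated by commas, "and", or "or".
ITEM_SPLIT = re.compile(r",\s*|\s+(?:and|or)\s+")


def hearst_pairs(sentence: str) -> Iterator[Tuple[str, str]]:
    """Yield (concept, instance) candidates matched by the patterns above."""
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            concept = match.group(1).lower()
            for item in ITEM_SPLIT.split(match.group(2)):
                item = item.strip().lower()
                if item:
                    yield concept, item


def name_cluster(cluster: Iterable[str], corpus: Iterable[str]) -> Optional[str]:
    """Pick the concept that co-occurs with the most cluster members in Hearst patterns."""
    members = {term.lower() for term in cluster}
    votes: Counter = Counter()
    for sentence in corpus:
        for concept, item in hearst_pairs(sentence):
            if item in members:
                votes[concept] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Voting across many sentences makes the label robust to any single noisy match, at the cost of needing a corpus where the cluster's members actually appear in such patterns.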
Exploratory Learning
TLDR
In multiclass semi-supervised learning (SSL), it is sometimes the case that the number of classes present in the data is not known.
  • 45
  • 2
Automatic Gloss Finding for a Knowledge Base using Ontological Constraints
TLDR
We propose GLOFIN, a hierarchical semi-supervised learning algorithm which makes effective use of limited amounts of supervision and available ontological constraints.
  • 30
  • 1
Entity List Completion Using Set Expansion Techniques
TLDR
We focus on relation and list extraction techniques to perform the Entity List Completion task through a two-stage retrieval process.
  • 16
  • 1
A language modeling approach to entity recognition and disambiguation for search queries
The Entity Recognition and Disambiguation (ERD) problem refers to the task of recognizing mentions of entities in a given query string, disambiguating them, and mapping them to entities in a given…
  • 12
  • 1
Very Fast Similarity Queries on Semi-Structured Data from the Web
TLDR
We propose PIC-D embeddings, which can represent large D-partite graphs using a small number of dimensions, enabling fast similarity queries.
  • 8
  • 1
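For intuition about power-iteration embeddings, below is a generic PIC-style sketch in Python, not the PIC-D method itself (the summary does not say how PIC-D treats D-partite structure). Each embedding dimension is a random start vector pushed through a few steps of a row-normalized random walk. A lazy walk, (I + D^-1 A) / 2, is used here as an assumption to avoid the oscillation a plain walk shows on bipartite graphs, and similarity queries use Euclidean distance; `pic_embedding` and `most_similar` are hypothetical names.

```python
import math
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def pic_embedding(graph: Graph, dims: int = 2, steps: int = 10, seed: int = 0) -> Dict[str, List[float]]:
    """Stack `dims` truncated power-iteration vectors into per-node embeddings."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    emb: Dict[str, List[float]] = {n: [] for n in nodes}
    for _ in range(dims):
        v = {n: rng.random() for n in nodes}  # random positive start vector
        for _ in range(steps):
            new = {}
            for n in nodes:
                total = sum(graph[n].values())
                avg = sum(w * v[m] for m, w in graph[n].items()) / total if total else v[n]
                new[n] = 0.5 * (v[n] + avg)  # lazy walk step: (I + D^-1 A) v / 2
            norm = sum(abs(x) for x in new.values()) or 1.0
            v = {n: x / norm for n, x in new.items()}  # L1-normalise each iterate
        for n in nodes:
            emb[n].append(v[n])
    return emb


def most_similar(emb: Dict[str, List[float]], query: str, k: int = 2) -> List[str]:
    """Return the k nodes closest to `query` in embedding space."""
    q = emb[query]
    others = [n for n in emb if n != query]
    return sorted(others, key=lambda n: math.dist(q, emb[n]))[:k]
```

Each dimension costs only a few passes over the edges, so the embedding stays cheap to build, and a similarity query reduces to comparing short vectors.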
From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering
TLDR
We present methods to introduce different forms of supervision into mixed-membership latent variable models.
  • 11
Constrained Semi-supervised Learning in the Presence of Unanticipated Classes
TLDR
Traditional semi-supervised learning (SSL) techniques treat the missing labels of unlabeled data points as latent/unobserved variables and estimate these variables, together with the parameters of the model, using Expectation Maximization (EM).
  • 2
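The traditional EM-based setup described in this summary can be sketched in Python for a one-dimensional Gaussian mixture with a shared, fixed variance (both simplifying assumptions; `semi_supervised_em` is a hypothetical name). Labeled points keep their observed class, unlabeled points receive posterior class probabilities in the E-step, and the M-step re-estimates class means and priors from the resulting soft counts.

```python
import math
from typing import List, Optional, Tuple


def semi_supervised_em(xs: List[float], labels: List[Optional[int]], k: int,
                       iters: int = 50, var: float = 1.0
                       ) -> Tuple[List[float], List[float], List[List[float]]]:
    """EM for a 1-D Gaussian mixture; labels[i] is None for unlabeled points."""
    n = len(xs)
    # Initialise each class mean from its labeled points (assumes every class has one).
    means: List[float] = []
    for c in range(k):
        seeds = [x for x, y in zip(xs, labels) if y == c]
        if not seeds:
            raise ValueError(f"class {c} has no labeled points")
        means.append(sum(seeds) / len(seeds))
    priors = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: labeled points keep their label; unlabeled points get posteriors.
        resp = []
        for x, y in zip(xs, labels):
            if y is not None:
                resp.append([1.0 if c == y else 0.0 for c in range(k)])
                continue
            logs = [math.log(priors[c]) - (x - means[c]) ** 2 / (2 * var) for c in range(k)]
            top = max(logs)
            weights = [math.exp(lg - top) for lg in logs]  # subtract max for numerical stability
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate means and priors from the soft class counts.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            priors[c] = nc / n
    return means, priors, resp
```

This baseline only knows the k classes seeded by labeled data, so unlabeled points from a class with no labels are forced into one of the existing classes.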